In the previous lesson we discussed how crawler-based engines work. Typically, special crawler software visits your site and reads the source code of your pages. This process is called "crawling" or "spidering". Your page is then compressed and put into the search engine's repository which is called an "index". This stage is referred to as "indexing". Finally, when someone submits a query to the search engine, it pulls your page out of the index and gives it a rank among the other results it has found for this query. This is called "ranking".
Crawler-based engines usually consider many more factors than those they can find on your pages. Thus, before putting your page into an index, a crawler will look at how many other pages in the index link to yours, the text used in the links that point to you, the PageRank of the linking pages, whether the page is listed in directories under related categories, and so on. These "off-page" factors are a significant consideration when a page is evaluated by a crawler-based engine. While in theory you can artificially increase your page's relevance for certain keywords by adjusting the corresponding areas of your HTML code, you have much less control over the other pages on the Internet that link to you. Thus, off-page relevance prevails in the eyes of a crawler.
In this lesson, we look at the main spider-based search engines and learn how to get each of them to index our site and rank it highly. Although this step does not deal directly with the optimization process itself, we provide information on how each search engine looks at your pages so that you can come back to this section for later reference.
Google is the number one search engine, ahead of such giants of the search market as Yahoo! and Live Search, with a search share of over 60%. Google indexes billions of Web pages so that users can find the information they want. It also builds services and tools, including Web applications, advertising networks, and solutions for businesses, to hold on to its leading position.
You can submit your site to Google here: http://www.google.com/addurl/ and your site will probably be indexed in around 1-2 months.
Alternatively, you can sign in to your Google account, go to Google Webmaster Tools and submit a sitemap of your site. For more info on creating sitemaps, please refer to Lesson "Creating a Search Engine Friendly Site Map".
Please keep in mind that Google may ignore your submission request for a significant period of time. Even if it happens to crawl your site, it may not actually index it if there are no links pointing to it. However, if Google finds your site by following the links from other pages that have already been indexed and are regularly re-spidered, chances are you will be included without any submission. These chances are much higher if Google finds your site by reading a directory listing, such as DMOZ (www.dmoz.org).
So submit your site and it may help, but links are the best way to get indexed.
In the past, Google typically performed monthly updates, known among experts as the "Google Dance". At the beginning of the month, a deep crawl of the Web took place; after a couple of weeks, the PageRank for the retrieved pages was calculated; and at the end of the month, the index database was finally updated. Nowadays, Google has switched to an incremental daily update model (sometimes referred to as "everflux"), so the concept of the Google Dance is quickly becoming historical.
Later, the "Dance" took place only from time to time, when Google needed to make major changes to its algorithm. For example, the dance in November 2003 (known as the Google Florida Update) was its first in about six months. In January 2004, Google started another dance (the Austin Update) in which pages that had disappeared during "Florida" showed up again, while many pages that had survived the first time were gone.
In February 2004 Google updated once more and things settled down. Most people who had lost pages saw them return, and although the results were rather different from those shown before Florida, at least pages didn't seem to disappear for no reason.
Google claims to have 1 trillion (as in 1,000,000,000,000) unique URLs in its index. The engine constantly adds new pages to the index database - usually it takes around two days to list a new page after the Googlebot (Google's spider) has crawled it. The Google team works industriously towards algorithm perfection to keep their leading position amongst search engines.
These days, Google maintains a database which is continuously updated. Matt Cutts (head of Google's Webspam team) reported in his personal blog that: "Google switched to an index that was incrementally updated every day (or faster). Instead of a monolithic monthly event, Google would refresh some of its index pretty much every day, which generated much smaller day-to-day changes that some people called everflux."
Google has lots of so-called "regional" branches, such as "Google Australia", "Google Canada," etc. These branches are modifications of their index database stored on servers located in the corresponding regions. They are meant to further adjust search results to searcher needs: when you're searching, Google detects your IP address (and thus approximate location) and feeds the results from the most appropriate index database.
Submission to the "Main Google" will list your site in all its regional branches - after Google indexes you, of course.
Google uses a number of crawlers to do the spidering. They all share the name "Googlebot" but come from a number of different IP addresses. You can see whether Google has visited your site by looking through your server logs: find an IP address such as 82.110.xxx.xx where the user-agent is reported as Googlebot ("Googlebot/2.1+(+http://www.google.com/bot.html)"). Then verify that IP address with a reverse DNS lookup to find the registered name of the machine: the host names of genuine Google crawlers generally end in 'googlebot.com'.
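As a rough illustration of that check, here is a short Python sketch. The sample log line, regular expression, and helper names are my own (assuming the common "combined" access-log format, not anything specific to the lesson): one helper pulls the IP address and user-agent out of a log line, and a second performs the reverse DNS lookup recommended above.

```python
import re
import socket

# Matches the combined log format: IP, identd, user, [date], "request",
# status, size, "referer", "user-agent".
LOG_PATTERN = re.compile(r'^(\S+) \S+ \S+ \[.*?\] "(.*?)" \d+ \S+ "(.*?)" "(.*?)"$')

def parse_log_line(line):
    """Return (ip, user_agent) from a combined-format access-log line, or None."""
    m = LOG_PATTERN.match(line)
    if not m:
        return None
    return m.group(1), m.group(4)

def is_real_googlebot(ip):
    """Reverse DNS check: genuine Google crawlers resolve to googlebot.com."""
    try:
        host = socket.gethostbyaddr(ip)[0]
    except OSError:
        return False
    return host == "googlebot.com" or host.endswith(".googlebot.com")

line = ('66.249.66.1 - - [10/Oct/2008:13:55:36 +0000] '
        '"GET / HTTP/1.1" 200 2326 "-" '
        '"Googlebot/2.1 (+http://www.google.com/bot.html)"')
ip, agent = parse_log_line(line)
print(ip, "Googlebot" in agent)   # 66.249.66.1 True
```

In practice you would run `is_real_googlebot(ip)` on each suspect address; a spammer can fake the user-agent string, but not the reverse DNS registration.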
Google is by far the most important search engine. Apart from their own site receiving 350 million searches per day, they also provide the search results for AOL Search, Netscape Search, Ask.com, Iwon, ICQ Search and MySpace Search. For this reason, most optimizers first focus on Google. Generally, this makes sense.
How to optimize for Google
Most important for Google are three factors: PageRank, link anchor text and semantics.
PageRank is an absolute value which is regularly calculated by Google for each page it has in its index. Later in this course we will give you a detailed description, but for now it's just important to know that the number of links you've got from other sites outside your domain matters greatly, as well as the link quality. The latter means that in order to give you some weight, the sites linking to yours must themselves have high PageRank, be content-rich and regularly updated.
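The PageRank idea described above can be sketched in a few lines of code. This is a toy illustration only (the example graph, the damping value, and the handling of dangling pages are my own simplifications, not Google's actual implementation): each page's score is repeatedly redistributed along its outgoing links, so pages that many others link to accumulate weight.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping page -> list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}          # start with equal scores
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:                    # dangling page: spread evenly
                for p in pages:
                    new[p] += damping * rank[page] / n
            else:                               # split this page's score
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new[target] += share
        rank = new
    return rank

# Page C is linked to by both A and B, so it ends up with the highest score.
ranks = pagerank({"A": ["C"], "B": ["C"], "C": ["A"]})
print(max(ranks, key=ranks.get))   # C
```

The same intuition carries over to the real thing: a link from a page that itself has a high score passes on more weight than a link from an obscure one.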
MiniRank/Local Rank is a modification of the PageRank based on the link structure of your single site only. Since search engines rank pages, not sites, certain pages of your site will rank higher for given keywords than others. Local Rank has a significant influence on the general PageRank.
Anchor text is the text of the links that point to your pages. For instance, if someone links to you with the words "see this great website", this is a useless link. However, let's say you sell car tyres and a link from another site to yours says "car tyres from leading brands", such a link will boost your rank when someone searches for car tyres on Google.
Semantics is a new factor that appears to have made the biggest difference to the results. This term refers to the meaning of words and their relationships. Google bought a company called Applied Semantics back in 2003 and has been using the technology for their AdSense contextual advertising program. According to the principles of applied semantics, the crawler attempts to define which words mean the same thing and which ones are always used together.
For example, if a certain number of pages in Google's index say that an executive desk is a piece of office furniture, Google associates the two phrases. Even so, a page about executive desks that relies only on the keywords "office furniture" still won't show up in a search for "executive desk". On the other hand, a page that mentions "executive desk" will rank better if it also mentions "office furniture".
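To make the idea concrete, here is a toy sketch (entirely my own, not Google's actual technology) of how such associations could be counted: two phrases that keep appearing in the same documents are treated as related.

```python
from itertools import combinations
from collections import Counter

def cooccurrence(documents, phrases):
    """Count, for each pair of phrases, the documents containing both."""
    counts = Counter()
    for doc in documents:
        text = doc.lower()
        present = [p for p in phrases if p in text]
        # every pair of phrases found in the same document co-occurs once
        for a, b in combinations(sorted(present), 2):
            counts[(a, b)] += 1
    return counts

docs = [
    "An executive desk is a classic piece of office furniture.",
    "Our office furniture range includes every executive desk style.",
    "Garden furniture for summer.",
]
counts = cooccurrence(docs, ["executive desk", "office furniture"])
print(counts[("executive desk", "office furniture")])   # 2
```

A real semantic system works on billions of documents and far subtler statistics, but the principle is the same: frequent co-occurrence is evidence that two phrases belong to one topic.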
Now, there are two other terms related to Google's way of ranking pages: Hilltop and Sandbox.
Hilltop is an algorithm that was created in 1999. Basically, it looks at the relationship between "Expert" and "Authority" pages. An "Expert" is a page that links to lots of other relevant documents. An "Authority" is a page that has links pointing to it from the "Expert" pages.
In theory, Google would find "Expert" pages and then the pages that they link to would rank well. Pages on sites like Yahoo, DMOZ, college sites and library sites would be considered experts.
Sandbox refers to the part of Google's algorithm that looks at how old your page is and how recently it has been updated. Pages with stale content tend to gradually slip down the result list, while newly crawled pages initially get higher positions than their PageRank alone would justify. However, some time after gaining these boosted positions, new websites disappear from the top places in the search results: Google wants to verify that your website will really be maintained and was not created with the sole purpose of benefiting from artificially high rankings in the short term. The period when a website is unable to make it to the top of the search results is referred to as "being in the sandbox". It can last from six months to a year, after which positions usually recover gradually. However, not all owners of brand-new sites observe the sandbox effect, which has led to discussions about whether the sandbox filter really exists.
On-page factors considered by Google
Now that we've examined off-page factors that have primary importance for Google, let's take a look at on-page factors that should be given attention before submitting to Google.
Google does not consider the META keywords tag when counting relevancy. While the contents of your META description tag may be used by Google as the description of your site in the search results, the META description has no influence on the relevancy count. Nowadays META tags have virtually no effect on a site's position in the search results; they are useful only as an additional source of information about the page for searchers.
When targeting optimization for Google, be sure to use your keywords in the following:
- Your domain name - important!
- The first words of the TITLE tag and the HTML heading tags H1 to H6;
- ALT text, as long as you also describe the image;
- Quality content on your index page. Try to make your home page at least 300 words long; however, don't hide anything from visitors' eyes (VERY IMPORTANT!);
- Link text for outgoing links;
- Drop-down form boxes created with the SELECT tag;
- Finally, some keywords in BOLD.
Additionally, try to center your pages around one central theme. Use synonyms of your important keyword phrases. Keep everything on the page on that ONE main topic, and provide good, solid content.
Pages that are optimized for Google score best when there are at least a few links to outside sites related to your topic, because this establishes your page's reputation as an authority. Google also measures how many websites outside your domain link to your site and factors in an "importance rating" for each of those referring sites. The more popular a site appears to a search engine, the higher it will be placed in the search listings.
According to Craig Silverstein of Google,
"External links that you grant from a particular page on your website can become diluted. In other words, if you place 10,000 links to other Web pages from a particular page of your website, each link is less powerful than if you were to link to only five other Web pages. Or, the contribution value to another website of each individual link is weakened the more you grant."
You can submit your URL to Yahoo! Search for free here: http://submit.search.yahoo.com/free/request (note: you need to register at their portal first) and it will be indexed in about 1-2 months.
Yahoo! is still the most popular site on the Web, according to its traffic rank reported by Alexa (www.alexa.com). Nevertheless, in terms of the number of searches performed Google carries the day.
Yahoo! provides results in a number of ways. First, it has one of the most complete directories on the Web. There's also Yahoo! Search, which lists results in a way similar to other crawler-based engines. Here, in this section on crawler-based engines, we deal with the second service.
Sponsored results appear at the top, side, and bottom of the search results pages fed by Yahoo!. Yahoo! now owns the Yahoo! Search Marketing paid search engine and provides search results to AltaVista, AllTheWeb, and Lycos.
The search results at Yahoo! changed in February 2004. For the previous couple of years, Google had been its search-results supplier; nowadays, Yahoo! uses its own database. Yahoo! bought up engines that had pioneered the search world, and Yahoo! Search now officially supplies its own results to all the engines acquired through these acquisitions. Therefore, when you optimize your Web pages for Yahoo!, there is a good chance of appearing in the top results of other popular search engines, such as AllTheWeb and AltaVista.
To find out whether Yahoo's spider has visited your site, look for the following information in your server logs. The crawler is now called Yahoo! Slurp (formerly just Slurp). For each request from the 'Yahoo! Slurp' user-agent, start with the IP address (e.g. 74.6.67.218), then check that the request really comes from Yahoo! Search using a reverse DNS lookup: the host names of all Yahoo! Search crawlers end with 'crawl.yahoo.net'.
To rank well in Yahoo, you need to do the same things that help your rankings in Google. Off-page factors (link popularity, anchor text, etc.) are very important. Some experts consider it easier to rank well on Yahoo because your own internal links are more important and there also appears to be no requirement for link relevancy. Whereas Google claims that the PageRank of the relevant linking sites is worth more than the PageRank of irrelevant sites, links don't need to be relevant to do well on Yahoo.
Like all other search engines, they'll list you for free if you get links to your site. Much like Google, their crawler is very active and updates listings on a daily basis. However, it can take a few weeks for Yahoo to list new pages after they have found and crawled through referring links. The pages that have already been included in the listing are updated much more often, usually every several days.
In March 2004, Yahoo launched its paid inclusion program called Site Match. You can find out details about Yahoo's paid inclusion program here: http://searchmarketing.yahoo.com/srch/index.php and the pricing here - http://searchmarketing.yahoo.com/srchsb/sse_pr.php?mkt=us
Site Match guarantees your site will appear in non-sponsored search results on Yahoo!, and other portals such as AltaVista and AllTheWeb.
Let's review some quick insights into the on-page factors that will help you rank higher in Yahoo!: use keywords in your domain name and in the TITLE tag. Your title must include your most important keyword phrase once, toward the beginning of the tag. Don't split the important keyword phrase in your title!
META Keywords and META Description tags.
The Yahoo! family of search engines DOES NOT consider META tags when estimating relevancy. Use keywords in heading tags H1 through H6, in link anchor text, and in the ALT attributes of your images; the main BODY content and page names (URLs) need to contain keywords too. The recommended keyword weight in the BODY for Yahoo! is 5%, with a maximum of 8%. A catalog page with lots of links, for instance a site map, will help a lot with your indexing and ranking by Yahoo!.
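The "keyword weight" mentioned above is simply the share of the page's words taken up by the keyword phrase. A quick sketch of how you might measure it (my own helper with made-up sample text, not an official Yahoo! tool):

```python
import re

def keyword_density(text, keyword):
    """Percentage of the text's words accounted for by the keyword phrase."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    kw_words = keyword.lower().split()
    if not words or not kw_words:
        return 0.0
    # count non-overlapping-start occurrences of the whole phrase
    hits = sum(
        1 for i in range(len(words) - len(kw_words) + 1)
        if words[i:i + len(kw_words)] == kw_words
    )
    # each occurrence contributes len(kw_words) words to the total
    return 100.0 * hits * len(kw_words) / len(words)

body = ("Car tyres from leading brands. We stock car tyres "
        "for every budget and deliver car tyres nationwide.")
print(round(keyword_density(body, "car tyres"), 1))   # 35.3
```

The sample body is deliberately over-stuffed: three occurrences of "car tyres" in seventeen words is roughly 35%, far above the 5-8% range recommended above, which is exactly the kind of page such a check would flag.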
You can submit just the main page to Yahoo!, and let its spider find, crawl, and index the rest of your pages. If it doesn't find an important page, however, make sure you submit it manually.
Like with all of the other engines, solid and legitimate link popularity is considered important by Yahoo's spider as a ranking factor.
Yahoo! frowns upon having satellite sites that revolve around the theme of a main site. For example, if you sell office furniture and set up a main company site and then plant several satellite sites for each kind of furniture, it may seem suspicious to Yahoo!. Therefore, make sure that each site is a stand-alone site and serves a unique purpose, and that it's valuable to both the search engines and your users.
As with any other search engine it is vitally important for Yahoo! that you create valuable content for your search visitors.
You can submit your site to Live Search (formerly known as MSN Search) for free at http://search.msn.com/docs/submit.aspx, however they are sure to find it without your submission if you have links from sites already listed there.
MSN stands for Microsoft Network and was initially meant to be Microsoft's solution for Web search, among other goals. Nevertheless, it was powered by Inktomi's results and did not have its own crawler.
In February 2005, MSN stopped relying on another engine's result base and introduced its own Web crawler. The next significant step came on September 11, 2006, when the Live Search release replaced MSN Search. At the moment, Live Search is one of the most popular world-class search engines, with around 11% of all search traffic. It definitely makes sense to target placement at the top of Live Search's listings, as the amount of traffic you will receive as a result is considerable. However, with Live Search it's especially important to avoid spam methods, since they claim to use a sophisticated series of technologies to fight even potential spammers. PC Magazine has published an article that states:
"Spammers are increasingly trying to weasel their way into search engine results, and Microsoft hopes that filtering them out can be one area where its tool can outshine Google's."
For more information on this article, visit http://www.pcmag.co.uk/news/1155758.
The information for finding the Live Search spider in your server logs is as follows: the spider's name is MSNBot, with IP addresses such as 65.55.xx.xx and 65.64.xx.xx, host msnbot.msn.com, and user-agent "msnbot/1.1 (+http://search.msn.com/msnbot.htm)".
At this stage, the general rules of optimizing for Live Search are similar to the optimization rules for other search engines: get your site listed in the directories, obtain solid and quality link popularity, and balance your keyword theme. You may also consider purchasing an ad from Live Search to get listed in the sponsored results. Live Search treats both off-page and on-page factors equally when ranking pages.
FAST (Fast Search & Transfer) is a Norwegian company that specializes in search and filter technology. Some time ago it built the AllTheWeb search engine. In 2003, Overture bought both AltaVista and AllTheWeb, the latter from FAST; Overture, in its turn, was purchased by Yahoo!. It is through this route that AllTheWeb joined the Yahoo! family of search engines. As a result, AllTheWeb's spider no longer crawls the Web, and you can no longer submit to the engine. Its XML feed and paid inclusion programs have been moved over to Yahoo! Search Marketing programs. Submitting to Yahoo! Search (and getting indexed there) will get your pages into AltaVista, Yahoo! Web Results, AllTheWeb, Live Search, HotBot and other engines.
AltaVista (http://www.altavista.com) was once a big player in the search industry. Until about seven years ago they could claim to be the most used search engine. Since that time, this engine has lost its independence and much of its popularity. Nowadays, it's sending very little traffic to websites. As with AllTheWeb, it was purchased by Yahoo together with Overture.
Ask.com (formerly Ask Jeeves) is the last crawler-based search engine we are going to speak about. Ask.com supplies its search results to the Iwon search engine and is a member of the Google family of search engines.
It was designed as Ask Jeeves, where "Jeeves" is the name of a "gentleman's personal gentleman", or valet (illustrated by Marcos Sorensen), who fetched answers to any question asked. On September 23, 2005 the company announced plans to phase out Jeeves, and on February 27, 2006 the character was disassociated from Ask.com (according to Wikipedia).
Ask.com acquired DirectHit and Teoma, which were also big players in the search industry. DirectHit was a search engine that provided results based on click popularity: sites that received the most clicks for a particular keyword were listed at the top. Teoma was also unique, thanks to its link popularity algorithm: it claimed to produce well-ordered and relevant search results by first eliminating all the irrelevant sites and then considering the popularity of only those that related to the search subject.
Optimizing for Google will guarantee your appearance on the Ask.com results along with the other large engines of Google's family (AOL, Netscape and Iwon).