How Search Engines Work
It may be useful first to know the difference between the ‘internet’ and the ‘web’, since many people use the terms synonymously (i.e. as if they were the same thing). The internet is the worldwide network of connected computers; the web is the collection of linked pages and websites that you reach over that network with a browser.
If no-one can find your website, it may as well not exist. It must be visible on the web, and there are a number of ways of achieving this. The first, which I will not dwell on, is direct advertising on the internet, i.e. pay-per-click (PPC) advertising, whereby you place an advert, usually on a search engine results page, through an advertising service such as Google or Yahoo, and pay your bid amount every time a visitor clicks on your ad. You pay whether they buy anything or not, so it can either be very successful for you or cost you a great deal of money.
How Search Engines Work
Search engines and web directories used to be distinctly different entities, but from the user's point of view they are now drawing closer together. A search engine works by scanning the whole web with software designed specifically for the purpose, called ‘spiders’ or ‘crawlers’. The engine displays its findings as a list on the screen, and users decide which sites they want to visit. Search engines order their results according to how well they calculate each page matches the search terms used by the person looking for information.
These listings are of individual web pages, not complete websites. Because search engines compare the vocabulary on your page with that used by the searcher, one of your pages can have a very high listing for a particular search term (generally defined as being in the top 10 results) while another page on the same website is listed very low, or not at all. If a search engine judges that the content of your page is not relevant to the words used by the searcher, it will not list your page for that term (often called a ‘keyword’), even though the same page could be listed top for another search term.
There are many factors that affect where in the listings your page appears, and these tend to be changed frequently by the major search engines.
Whenever you change your web page, the crawlers will eventually detect the change and your position in the results pages will be adjusted accordingly. Depending on how relevant the change makes your page to the words used by the person searching, your position could go up or down.
The search engine of a website directory works by searching the sites listed in that directory, rather than all the pages connected to the World Wide Web.
With a web directory you manually submit a description of your website along with its URL. Directories generally deal with complete websites rather than individual pages. An example is dmoz.org, known as the Open Directory. Your site is visited by human editors who decide whether it is unique and relevant enough to be included in the directory. If other websites already contain the same information as yours, you will find it difficult to be included in Dmoz unless you provide something different: you have to offer visitors something unique.
Other directories will include most sites covering a particular subject, as long as they meet specific criteria on content (for example, no gambling or adult sites), and there are many paid directories where you pay a regular subscription to ensure that your website is included.
A directory search is limited to the website descriptions, not the content of the sites themselves, which is why editors are needed to make sure that each site's actual content fits its description. A visitor searching for information on ‘writing articles’ does not want to be sent to a website that has ‘writing articles’ in its description but is actually full of adult content. This kind of deception occurs more often than you might think.
Some engines, such as Google, are pure ‘search engines’: they do not list any directory content, except where a page has enough relevance to be listed in its own right. Yahoo began as a pure directory, then started to show organic search results using Google as a provider, until in 2004 Yahoo launched its own search engine. The main Yahoo Search results now come from crawling the web rather than from the Yahoo directory.
I won't go too far into this topic, because the situation with the various engines and directories changes almost daily. The objective here is to explain how search engines work and how good content can enhance your site's visibility. If no one can find your site, it may as well not exist.
The Components Of A Search Engine
A search engine consists of three major components. These are:
- The Crawler or Spider
Frequently called a spider, the crawler is a piece of software that visits web pages, analyzes their content and follows any links provided on each page, assessing the relevance of those links to the content of the page. The content it collects is later compared with the keywords, or search terms, used by searchers. You can add HTML to stop spiders from following any link you do not want them to follow: place rel="nofollow" inside the link's opening tag, after the URL and just before the closing chevron ‘>’, for example <a href="http://www.example.com/" rel="nofollow">. Spiders that honour this attribute will then ignore the link.
The crawler will visit your site regularly, so it is wise to keep your content fresh and up to date. You are likely to be listed higher if your content is regularly refreshed or added to, so that the spider can see it is not stagnant between visits.
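As a rough illustration of the crawler's job, here is a minimal sketch of its core loop in Python, using only the standard html.parser module. It assumes the "web" is a small in-memory dictionary of pages, and the names crawl and extract_links are hypothetical. It shows the two behaviours described above: following links from page to page, and skipping links marked rel="nofollow". A real crawler also fetches pages over HTTP, resolves relative URLs, respects robots.txt and schedules repeat visits.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags, skipping rel="nofollow" links."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        # rel may hold several space-separated values, e.g. "nofollow noopener"
        rel_values = (attributes.get("rel") or "").lower().split()
        if href and "nofollow" not in rel_values:
            self.links.append(href)


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; fetch(url) returns the page HTML or None."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        for link in extract_links(html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical four-page site; the /ads link is marked nofollow.
pages = {
    "/home": '<a href="/measles">Measles</a> <a href="/ads" rel="nofollow">Ads</a>',
    "/measles": '<a href="/koplik">Koplik\'s spots</a> <a href="/home">Home</a>',
    "/koplik": "<p>Koplik's spots appear inside the mouth.</p>",
    "/ads": "<p>Sponsored links</p>",
}
print(crawl("/home", pages.get))  # ['/home', '/measles', '/koplik']
```

The /ads page is never visited, because the only link to it carries rel="nofollow".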
- The Index
The pages the spider finds are put into the index, a vast catalogue of the web pages the engine knows about and the words they contain. When someone searches, it is this index that is consulted, not the live web. It is possible for your web page to have been visited by the spider but not yet be included in the index. It will not be included if the software does not consider your page sufficiently relevant or useful, and it is unlikely to be included if your site is a copy of one already indexed, or its content is very similar.
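The usual textbook structure behind a search index is an inverted index: for each word, the set of pages that contain it, built in advance so that a search is a quick lookup rather than a fresh read of the web. A minimal sketch in Python, assuming simple lower-case word splitting; the sample pages and the names tokenize and build_index are hypothetical, and a real index also stores word positions, page metadata and much more.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(pages):
    """Map each word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


pages = {
    "/measles": "Measles is caused by a virus spread by coughs and sneezes.",
    "/mumps": "Mumps is a viral disease that causes swollen glands.",
    "/koplik": "Koplik's spots are small white spots seen in early measles.",
}
index = build_index(pages)
print(sorted(index["measles"]))  # ['/koplik', '/measles']
```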
The best chance your page has of being included, especially in a high position, is if you have unique content to offer anyone searching for information on a specific keyword. Your content should also be up to date and regularly refreshed. If you have few pages indexed in a search engine such as Google, you are unlikely to be included in free directories such as Dmoz, since your site will probably not be different enough from all the other sites on the same subject to warrant inclusion.
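Search engines do not publish exactly how they spot copied or near-identical content, but a common textbook technique is shingling: break each page into overlapping runs of words and measure what share of those runs two pages have in common (the Jaccard similarity). A minimal sketch in Python, assuming 3-word shingles; the sample texts are invented, and any threshold for calling a page a duplicate would be your own assumption, not Google's.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping size-word sequences in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Share of shingles the two sets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


original = "Measles is caused by a virus that spreads through coughs and sneezes."
copied = "Measles is caused by a virus that spreads through coughs and sneezing."
unique = "Koplik's spots are small white spots inside the cheek in early measles."

print(round(jaccard(shingles(original), shingles(copied)), 2))  # 0.82
print(round(jaccard(shingles(original), shingles(unique)), 2))  # 0.0
```

Changing a single word leaves most shingles intact, which is why lightly reworded copies still score as highly similar.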
- The Search Engine Software
This is the code that searches all the pages in the index and lists them in order of decreasing relevance to the search term entered by the user. Google, like most other search engines, wants to satisfy its customers, not present them with a crowd of web pages that are irrelevant to what they are seeking or that all carry the same information. Unfortunately, I think Google are failing because of the undue emphasis they currently give to back-links, but more on that later.
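Google's actual ranking formula is secret and combines many signals, so the following is only a classic textbook stand-in: TF-IDF scoring. A page scores well when the search terms make up a large share of its words (term frequency, TF), and rare terms count for more than common ones (inverse document frequency, IDF). A minimal sketch in Python; the sample pages and the name rank are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())


def rank(query, pages):
    """Return page URLs sorted by a simple TF-IDF score for the query."""
    docs = {url: Counter(tokenize(text)) for url, text in pages.items()}
    n_docs = len(docs)
    scores = {}
    for url, counts in docs.items():
        total = sum(counts.values())
        score = 0.0
        for term in tokenize(query):
            df = sum(1 for c in docs.values() if term in c)  # pages with term
            if df == 0 or total == 0:
                continue
            tf = counts[term] / total       # share of this page's words
            idf = math.log(n_docs / df)     # rarer terms weigh more
            score += tf * idf
        scores[url] = score
    # Highest score first; pages with no matching terms score 0.0
    return sorted(scores, key=scores.get, reverse=True)


pages = {
    "/measles": "Measles is caused by the measles virus.",
    "/koplik": "Koplik's spots are an early sign of measles.",
    "/mumps": "Mumps causes swollen glands.",
}
print(rank("measles virus", pages))  # ['/measles', '/koplik', '/mumps']
```

A word such as ‘of’ that appears on almost every page gets an IDF weight at or near zero, so the formula itself, not a person, decides that it matters little.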
The more specific a searcher is with the keywords used, the more relevant the information provided by the search engines will be. One of my websites is dedicated to childhood diseases, and would be difficult to find with a search for simply ‘diseases’. A search for ‘childhood diseases’ currently shows my home page at position 42, and a search for ‘cause of measles’, one of my specialties, shows my site at position 5 in Google and 1 in Yahoo (at the time of writing). An even more specific search for ‘Koplik’s spots’, which are a definitive diagnostic sign of measles, finds my page on this symptom at Number 1 in Google, Yahoo and MSN.
So the more specific the search, the more specific the information returned; and if you have a website, pages that give good information on particular aspects of your subject are likely to be listed higher for those searches than your home page.
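Real engines rank results rather than simply filtering them, but a Boolean AND search shows in miniature why specific queries land on specific pages: every extra term removes the pages that do not contain it. A minimal sketch in Python over a hypothetical four-page site; build_index and search_all_terms are illustrative names only.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(url)
    return index


def search_all_terms(query, index):
    """Pages containing every word of the query (a Boolean AND search)."""
    terms = re.findall(r"[a-z0-9']+", query.lower())
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))  # copy, so the index is untouched
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


pages = {
    "/home": "Childhood diseases such as measles, mumps and chickenpox.",
    "/measles": "The cause of measles is a virus; Koplik's spots confirm it.",
    "/mumps": "Mumps is one of the childhood diseases caused by a virus.",
    "/chickenpox": "Chickenpox is another childhood disease.",
}
index = build_index(pages)
for query in ["diseases", "cause of measles", "koplik's spots"]:
    print(query, "->", sorted(search_all_terms(query, index)))
```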
How Search Engines Rank Your Pages
Search engines use an algorithm, a mathematical set of rules against which every page in their index is tested for the keywords being used in the search.
At one time, the frequency with which the keywords were used was the major consideration. In my example above, if the term ‘Koplik’s spots’ appeared more frequently on your page than on another, you would be listed higher. You could write sentences containing nothing but Koplik’s spots, and this would fool the search engines.
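To see why the old approach was so easy to fool, here is a minimal sketch in Python of ranking purely by how often the search phrase appears. The two sample texts are invented for illustration; no current search engine ranks this way.

```python
import re


def keyword_count(text, phrase):
    """Count non-overlapping occurrences of a phrase, ignoring case."""
    return len(re.findall(re.escape(phrase.lower()), text.lower()))


informative = ("Koplik's spots are small white spots on the inside of the cheek. "
               "They appear a day or two before the measles rash.")
stuffed = "Koplik's spots. Koplik's spots Koplik's spots, best Koplik's spots!"

pages = {"informative": informative, "stuffed": stuffed}
ranking = sorted(pages,
                 key=lambda name: keyword_count(pages[name], "koplik's spots"),
                 reverse=True)
print(ranking)  # ['stuffed', 'informative']
```

The page that says nothing useful wins, simply because it repeats the phrase four times.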
No longer. Google and most other search engines now use semantics to establish the theme of your web page and how relevant it is to the search term used by the visitor. This is popularly called LSI (Latent Semantic Indexing), but LSI is a statistical technique devised by mathematicians; it is believed to form part of Google's search algorithm, but it is not really understood by anyone other than statisticians.
Until quite recently, mid-2006 or even later, the usual advice was that the ideal keyword density (KD) in your text was between 1% and 3% of the total number of words. Towards the end of 2006 it became apparent that sites with the keyword in the title, in the first sentence or two of the text and in the last sentence were starting to rise up the listings, while those with higher KDs were dropping.
My advice, for what it is worth (though it works well for me), is to place the keyword as I have just described and fill the rest of the text with semantically related words. This is explained better on my LSI page, but basically, don’t overdo phrases such as ‘cause of measles’, even if that is the topic of your web page. Instead, sprinkle in phrases like ‘though children can contract measles through coughs and sneezes, these are not the actual cause of the disease. Measles is caused by a virus called . . .’
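Both measurements mentioned above, keyword density and keyword placement, are easy to check on your own text. A minimal sketch in Python; the density formula counts every word of a multi-word phrase (one common convention; some tools count each occurrence as a single word instead), the sentence splitter is deliberately crude, and the names keyword_density and keyword_placement are hypothetical. Neither reflects how any search engine actually measures a page.

```python
import re


def words(text):
    return re.findall(r"[a-z0-9']+", text.lower())


def keyword_density(text, keyword):
    """Percentage of the text's words taken up by the keyword phrase.

    Assumed convention: occurrences x words in phrase / total words x 100.
    """
    text_words = words(text)
    kw_words = words(keyword)
    n = len(kw_words)
    if not text_words or not n:
        return 0.0
    hits = sum(1 for i in range(len(text_words) - n + 1)
               if text_words[i:i + n] == kw_words)
    return 100.0 * hits * n / len(text_words)


def keyword_placement(title, body, keyword):
    """Check the keyword appears in the title, first and last sentence."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", body.strip()) if s]
    kw = keyword.lower()
    return {
        "title": kw in title.lower(),
        "first_sentence": bool(sentences) and kw in sentences[0].lower(),
        "last_sentence": bool(sentences) and kw in sentences[-1].lower(),
    }


title = "The Cause of Measles"
body = ("The cause of measles is a virus. Children catch it through coughs and sneezes, "
        "but those are only how the virus travels. Vaccination prevents infection. "
        "Knowing the cause of measles helps parents protect their children.")
print(keyword_placement(title, body, "cause of measles"))
print(round(keyword_density(body, "cause of measles"), 2))
```

A short sample like this one gives a high density simply because it has so few words.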
Search engines now use character and word analysis to spot words semantically related to ‘cause’, ‘of’ and ‘measles’, and weight them according to how they are used in the text. An actual mathematical equation is used. Obviously the word ‘of’ is not very relevant, but the equation determines this, not a human being.
If you read that the frequency of a keyword in your text is very important, that is correct, but it is not as important as it used to be. What now matters is the frequency of semantically related words. This is how search engines sniff out web pages that contain little content other than the keyword. Such pages tend to be generated by software, with text such as: “Measles is a popular topic and information on measles can easily be found on the internet. In fact measles is so popular that most search engines will throw up many pages on different types of measles. If you want information on measles there is plenty on this website, and the information superhighway is a great way of finding information on measles.”
You can replace the word ‘measles’ with any other keyword you can think of. Page generation software uses a template such as this, substituting thousands upon thousands of other keywords to generate thousands of web pages at the click of a button. These are then dropped into a web page template with AdSense blocks and published on the web. In the past they would have been listed high by the spiders due to their excellent keyword frequency.
However, the information they give on the subject is nil! These sites can now be identified, since the algorithm looks for content related to ‘measles’ other than the keyword itself. If you use the keyword too much you will be penalised. Although most internet users involved in web page optimization call this LSI, the term is not being used correctly.
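The point that a keyword-stuffed template says nothing else about its subject can be illustrated with a deliberately simple score: what share of a list of related words does the page actually use? This is not LSI and not any search engine's method; the RELATED_TERMS list and the scoring rule are hypothetical stand-ins for relationships a real system would learn statistically from a very large collection of pages.

```python
import re

# Hypothetical hand-picked vocabulary related to 'measles'. A real system
# would derive such relationships statistically rather than by hand.
RELATED_TERMS = {"virus", "rash", "fever", "cough", "vaccine", "vaccination",
                 "infection", "koplik's", "spots", "symptoms", "contagious"}


def related_term_score(text, related=RELATED_TERMS):
    """Fraction of the related vocabulary that the text actually uses."""
    used = set(re.findall(r"[a-z0-9']+", text.lower())) & related
    return len(used) / len(related)


template_spam = ("Measles is a popular topic and information on measles can easily "
                 "be found on the internet. In fact measles is so popular that most "
                 "search engines will throw up many pages on different types of measles.")
informative = ("Measles is a highly contagious infection caused by a virus. Early "
               "symptoms include fever and cough, followed by Koplik's spots and a rash. "
               "Vaccination prevents it.")

print(related_term_score(template_spam))
print(round(related_term_score(informative), 2))
```

The template text mentions ‘measles’ four times yet uses none of the related words, while the informative text uses most of them.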
This is not an SEO website, but here are some clues on how to use keywords.
Spiders read the text on your web page in the order it appears in the HTML, which for a simple layout means starting at the top left and working from left to right down to the bottom. The first text a spider sees should be your main keyword as a title, contained in H1 HTML tags.
If you have tables and columns, the spiders will look at the content of the top left column first. If you have a main table with an H1 heading, then two columns, one for a left hand navigation column and the next with your main content, the spider will see your H1 heading first, then the first column, which is your navigation column. It will look at everything in this column before switching back to the top of the next column which contains your content.
This is not good, since the search engine should find your main content first. You could place your navigation column on the right, but if you want it on the left, start with an empty column, aligned left, then your main content column, aligned right, then another column, aligned left, placed immediately below your left-hand empty column.
The spider will read the empty column, then the first text it sees after your heading will be your content. Having reached the bottom right of your content column, it will look for the next column on the left, which is your navigation. That is how the pages on this website have been designed.
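The effect described above comes from the order of the text in the HTML source, which a spider reads from top to bottom regardless of where each table cell is displayed on screen. A minimal sketch, using Python's standard html.parser, that lists a page's text in source order for two layouts. The HTML is an assumed way of building them (rowspan is one common approach to the empty-cell arrangement); reading_order and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class TextInSourceOrder(HTMLParser):
    """Collects non-empty text chunks in the order they appear in the HTML."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def reading_order(html):
    parser = TextInSourceOrder()
    parser.feed(html)
    return parser.chunks


# Layout 1: navigation cell first, content cell second.
nav_first = """
<table>
  <tr><td colspan="2"><h1>Cause of Measles</h1></td></tr>
  <tr><td>Home | Mumps | Chickenpox</td><td>Measles is caused by a virus.</td></tr>
</table>
"""

# Layout 2: empty cell, then content spanning two rows, then navigation
# in the second row directly beneath the empty cell.
content_first = """
<table>
  <tr><td colspan="2"><h1>Cause of Measles</h1></td></tr>
  <tr><td></td><td rowspan="2">Measles is caused by a virus.</td></tr>
  <tr><td>Home | Mumps | Chickenpox</td></tr>
</table>
"""

print(reading_order(nav_first))
print(reading_order(content_first))
```

Both tables display the navigation on the left, but only the second puts the content text ahead of it in the source.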
Search engines that rely on crawlers have cottoned on to the way many webmasters alter their content to keep the spiders happy: the content itself does not change, just the order of the words and the synonyms used. Many engines, Google in particular (and they are the largest, so we must create our websites with Google in mind), now use link analysis as a significant factor in listing placement. This covers not only links between pages of the same website, but also links between different websites. The more back-links you have from other sites with content relevant to yours, the more important your site will be considered.
The provision of back-links, however, is now big business, and many sites link to and from other sites that are not relevant to them, negotiating links purely for placement in the listings rather than for the benefit of the visitor. Many of these links sit on single ‘links’ pages listing hundreds of websites, without any search facility to help visitors find the sites relevant to their needs.
Google is aware of this, and links to and from web pages that have little relevance to each other are no longer rewarded, and might in fact be penalised. Google calculates not only the relevance of links but also their authority. A link from a page that itself has many incoming links from authority sites will be regarded as worth more than one from a site with few incoming links.
How search engines treat links can be worked out with a bit of reasoning: the more authority the page linking to you has, the more valuable the link is to your PageRank. The better the on-page SEO of that page, or the more relevant it is to your topic, the more the link from it is worth to you; and the more incoming links it has from authority websites on your topic, the more valuable the link.
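PageRank, the link-analysis measure Google's founders originally published, captures the first part of this reasoning: each page shares its score among the pages it links to, so a link from a page that itself receives many links passes on more. A minimal sketch of the normalized iterative calculation in Python, assuming the commonly cited damping factor of 0.85 and an invented link graph. It ignores topic relevance entirely, which the paragraph above says also matters, and Google's current ranking uses far more than this.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iterative PageRank over a dict mapping each page to the pages it links to."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Share this page's rank equally among the pages it links to
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Invented graph: 'authority' receives three links and links to 'you';
# 'small' receives none and links to 'other'.
links = {
    "authority": ["you"],
    "fan1": ["authority"],
    "fan2": ["authority"],
    "fan3": ["authority"],
    "small": ["other"],
}
ranks = pagerank(links)
print(round(ranks["you"], 3), round(ranks["other"], 3))
```

Each of 'you' and 'other' receives exactly one link, but the link from the well-linked page is worth more.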
In fact, links have become so important to Google that they outweigh the need for good content, and many sites with no useful, relevant content whatsoever sit at Number 1 in the listings for specific keywords. This is extremely annoying, since links should never be ranked above content, but that is Google's current position. Even so, the advantage of good relevant links is diluted if there is little content on the site.
Nevertheless, focussing on both will give the best results, so it is worth reading up on Latent Semantic Indexing, which Google and other search engines may be using as part of the basis of their algorithms for determining page relevance.
This is only a brief outline of how search engines work, but if you apply the principles suggested, you should see a significant improvement in your search engine listing position.
If I were to suggest just one change and one strategy to give the greatest improvement, the change would be to add a good Title Tag to the Head section of the HTML of each page on your website, and the strategy would be to build good links that persuade Google your website is highly regarded online.
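A page's title is the text inside the <title> element in the <head> section of its HTML. A minimal sketch, using Python's standard html.parser, that pulls out the title and the first H1 heading of a page and checks whether each contains your main keyword. The name check_page and the sample page are hypothetical, and passing this check says nothing about how a search engine will actually rank the page.

```python
from html.parser import HTMLParser


class TitleAndHeading(HTMLParser):
    """Records the text of the <title> element and of the first <h1>."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.h1 = ""
        self._current = None   # tag whose text is being captured
        self._seen_h1 = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._current = "title"
        elif tag == "h1" and not self._seen_h1:
            self._current = "h1"

    def handle_endtag(self, tag):
        if tag == self._current:
            if tag == "h1":
                self._seen_h1 = True
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current == "h1":
            self.h1 += data


def check_page(html, keyword):
    parser = TitleAndHeading()
    parser.feed(html)
    kw = keyword.lower()
    return {
        "title": parser.title.strip(),
        "keyword_in_title": kw in parser.title.lower(),
        "keyword_in_h1": kw in parser.h1.lower(),
    }


page = """<html><head><title>Cause of Measles | Childhood Diseases</title></head>
<body><h1>The Cause of Measles</h1><p>Measles is caused by a virus.</p></body></html>"""
print(check_page(page, "cause of measles"))
```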
It's not just knowing how search engines work that is important, but putting that knowledge to use.