Latent Semantic Indexing
Latent semantic indexing, commonly referred to as LSI, is claimed to be used by Google as a means of determining the meaning of the content of web pages. A lot has been written about LSI and what it means, although only Google* knows what Google is referring to when it uses the term, and statisticians probably have a different version.
Latent Semantic Analysis is a mathematical/statistical means of assessing the use and meaning of language. Words and word strings can be used to determine the context in which they are used. LSI can be regarded as a misnomer, in that the term Latent Semantic 'Indexing' is basically meaningless. Indexing can be carried out through the use of Latent Semantic 'Analysis' as a tool to enable a web page to be indexed for specific keywords, but nothing can be indexed 'latent semantically'.
Being Pedantic is Not Always Bad
However, as usual, I am likely being pedantic, so I will use the common term LSI rather than the correct one, LSA. Pedantry in language and the way it is used, however, is probably the only way that language will continue to be used unambiguously rather than left to interpretation, and that is precisely the reason for Google adopting this algorithm to improve its service to its users.
Don't forget that a search engine's customers are not you and me, who try to make money from it, but those who use it to find information. The more accurately that information meets the needs of the user, the more likely that user is to use that search engine next time.
Latent semantic indexing does not use any human dictionary; its input is character strings that it resolves into words, sentences and paragraphs. This is a very simplistic description of a complex statistical and mathematical computation, but the upshot is that it cannot be fooled by the excessive use of keywords as the sole content of a web page.
Latent Semantic Indexing and Meaning
To put it even more simply, LSI allows searchers to find information that contains no words used in their queries. Hence, if you are looking for information on calligraphy, it will be provided if you use words such as writing, penmanship and script in your search term. Before the adoption of LSI, such searches would provide pages with information on “calligraphy” only if the word was also contained in the search term or ‘keyword’. Now, meaning is more important than keywords.
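To make the calligraphy example concrete, here is a minimal Python sketch. It is not the real LSA computation, which works from a singular value decomposition of a large term-document matrix, and it is certainly not Google's code. It uses a simpler stand-in for the same idea: each word is described by the other words it shares a sentence with, and two words count as related when those descriptions overlap. The four-sentence corpus, the stopword list and the function names are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical mini-corpus: invented sentences, not real search data.
CORPUS = [
    "calligraphy is decorative writing done with a pen",
    "good penmanship makes writing with a pen easy to read",
    "a script is a style of writing",
    "a spider spins a web to catch a fly",
]

# Assumed stopword list, just enough for the sentences above.
STOPWORDS = {"a", "is", "of", "to", "with", "done", "makes"}


def tokens(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def cooccurrence_vectors(corpus):
    """Describe each word by how often every other word shares a sentence with it."""
    vectors = defaultdict(Counter)
    for sentence in corpus:
        words = set(tokens(sentence))
        for word in words:
            for other in words:
                if other != word:
                    vectors[word][other] += 1
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def relatedness(query, page, vectors):
    """Average similarity between every query word and every page word."""
    pairs = [(q, p) for q in tokens(query) for p in tokens(page)
             if q in vectors and p in vectors]
    if not pairs:
        return 0.0
    return sum(cosine(vectors[q], vectors[p]) for q, p in pairs) / len(pairs)


vectors = cooccurrence_vectors(CORPUS)
print(relatedness("penmanship script", "calligraphy", vectors))
print(relatedness("penmanship script", "spider web", vectors))
```

The query 'penmanship script' shares no word with the page 'calligraphy', yet it scores well above zero because all three words keep company with 'writing' and 'pen'. The same query scores zero against 'spider web'.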
LSI has been adopted by search engines, specifically Google, in order to dilute the requirement for the presence of specific search terms within a web page or article in favor of content that the search term used implies would meet the needs of the person carrying out the search. In other words, LSI is intended to provide a better service to Google’s customers. It renders irrelevant any content which contains little but specific keywords, and punishes word repetition to the extent that even 2% keyword density could, in some cases, be regarded as keyword ‘stuffing’.
Keyword Density has Less Importance
For this reason, the use of ‘keywords’ as such is now of less significance than it was prior to the adoption of the LSI concept by search engines. It is no longer necessary for content to contain between 1% and 3% ‘keyword density’ as was, and still is, recommended. In fact, no keywords are necessary at all, though it is still useful to set the theme of your page with a ‘keyword’ in the title, introduce one at the beginning of the text and finish with one near the end. The remainder of the text should be rich in content which is relevant to the theme of the page, or what was once called the ‘keyword’, just as ‘writing’, ‘penmanship’ and ‘script’ are relevant to the topic of ‘calligraphy’.
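If you want to see where one of your own pages stands, keyword density is easy to measure. The Python sketch below assumes the usual definition (occurrences of the keyword divided by the total word count, as a percentage) and handles single-word keywords only. The function name and the sample text are mine, purely for illustration.

```python
import re


def keyword_density(text, keyword):
    """Return the percentage of words in `text` that exactly match `keyword`.

    Assumes a single-word keyword and a case-insensitive, whole-word match.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return 100.0 * words.count(keyword.lower()) / len(words)


# Made-up sample page: 20 words, one of which is the keyword.
page = ("Calligraphy is the art of fine writing. Good penmanship and a "
        "steady script matter more than repeating one single word.")
print(f"{keyword_density(page, 'calligraphy'):.1f}%")
```

On a real page you would run this over the visible body text only, not the HTML markup around it.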
If the search engines take this to a logical conclusion, and there is no reason why they should not, many businesses that rely on keyword research and suchlike will have to adapt to survive. Wordtracker will lose popularity and the old thesaurus will once more become king.
Software which uses synonym replacement will have to become more sophisticated, and ensure not only that synonyms are not repeated but also that they are true synonyms with grammatical and semantic relevance to the context in which they are used. This has been sadly lacking in all such software that I have purchased for my review sites.
Let's have a look at two possible practical examples of LSI at work, and these will likely make the whole concept a good deal clearer to you. Take the partial phrase "When a spider crawls the web, it is looking for. . . "
What does this mean to you? Does it mean that an arachnid is seeking flies? Or does it mean that a search engine is crawling the World Wide Web looking for pages that meet certain criteria? You don't know the answer to that until you read the rest of the text. That is LSI at work in your brain. Now, a computer might be able to compute, but as yet it has no brain and can base its results only on predefined rules.
These rules make up the algorithm programmed to the rules of LSA: once it sees the word 'fly', it associates the spider with being an arachnid. If it sees the word 'page' or 'site', it will associate it with a website. The rest of the text on the page will enable the algorithm to correctly index that page, and somebody seeking information on how spiders detect flies on their webs won't end up with page after page about internet marketing!
Another example: A web page is titled "A History of Locks". What does that mean? It likely doesn't mean the history of locks of hair since that is senseless, but how does a piece of software know that? It could mean that to an algorithm that doesn't think like humans. In fact a search using the term offers one reference to the history of 'dreadlocks'.
A page referring to locks could equally be a history of canal locks or security locks, each of these being highly plausible. So how does the spider, or algorithm, determine the subject so that the user of the search engine is not given useless information?
LSA! It is programmed to know that the character strings 'barge', 'longboat', 'canal' and so on will relate to canal locks, and that 'keys', 'keyhole', 'security', etc. will relate to security locks. You have no need to use the word 'lock' over and over again as a keyword: the semantics of the rest of the vocabulary will provide the algorithm with the answer. The word 'semantics' refers to the meaning of words, in this case the meaning of the word 'lock'.
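The 'locks' example can be sketched in a few lines of Python. Treat it as a toy: real LSA derives its word associations statistically from a large body of text, whereas here the cue words are simply typed in by hand, using the ones listed above. The names CUES and guess_sense are invented for this illustration.

```python
import re
from collections import Counter

# Hand-written cue words taken from the examples in the text.
# Real LSA would learn associations like these from a corpus.
CUES = {
    "canal locks": {"barge", "longboat", "canal"},
    "security locks": {"keys", "keyhole", "security"},
}


def guess_sense(text, cues=CUES):
    """Return the sense whose cue words appear most often, or None if none appear.

    On a tie, the sense listed first in `cues` wins.
    """
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {sense: sum(counts[word] for word in vocab)
              for sense, vocab in cues.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(guess_sense("A History of Locks: how the barge and longboat climbed the canal"))
print(guess_sense("A History of Locks: from the first keyhole to modern security"))
print(guess_sense("A History of Locks"))
```

Notice that the title alone gives the function nothing to go on. It is the surrounding vocabulary, not the repetition of 'locks', that settles the question.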
Adsense and the LSI Concept
In fact, it is the concept of latent semantic indexing that Google uses in its Adsense program to determine what type of Adwords adverts should be placed on the web pages of members of the program.
I can't use Adsense because my account has been disabled for inadvertently using cheap PPC ads to send visitors to my well-written Adsense pages! Very unfair, because I was young and raw then - now I am old and decrepit, and have learned how not to do such things, so can now offer my clients advice on Adsense - Established Customers only though! As they say, burglars are the best guys to give you advice on home security! Back to business:
Google and other major search engines are using the LSI concept to determine what website content is really about: what it is really saying. They are catching out pages written specifically to get listed for individual keywords, but that have little useful content other than meaningless repetitions of the keyword.
Google 'Slap' Relates to Poor Web Pages - Not Domains or Websites
Many web pages which, until recently, have been highly listed by Google and other search engines, have disappeared overnight after being subject to scrutiny such as they have never had before and have been found wanting.
If the content of your web pages is relevant to the topic or theme of the website, and if you can honestly say that you would find them interesting were you searching for the information they claim to provide, then your pages should be safe. Bear in mind that the search engines treat every single page separately and that entire websites are not delisted, only individual pages.
The recent Panda and Penguin algorithms have added a new dimension to web content and the way it is advertised or promoted. These two Google updates must now be considered in addition to LSI in regaining any lost ranking. Check my web pages on both Google Panda and Penguin for more details. These links open in new pages to enable you to stay here after reading them.
Also, bear in mind that if you are linked to any webpage that is considered substandard by Google, your own page might suffer. If that webpage is then dropped from the listings and subsequently deleted by the webmaster, your link will become a broken link, which search engines detest. Therefore, I advise you to make a regular check of your links to make sure they are all live. There is software available to help you do this if you have too many to check manually.
I use the totally FREE Xenu Link Sleuth: try it on each of your domains, because broken links can have a significantly negative impact on your ranking.
The link opens in a new window that is easy to leave because I don't want to lose you just yet! There is more good stuff to come if this has interested you so far.
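If you would rather script the link check yourself, here is a rough Python sketch using only the standard library. It pulls the absolute links out of a page's HTML and asks each one whether it is still alive. The function names are my own, and the approach is an assumption, not how Xenu or any search engine does it. Some servers refuse HEAD requests, so a 'dead' result is worth confirming by hand.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect absolute http(s) links from <a href="..."> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.startswith(("http://", "https://")):
                self.links.append(value)


def extract_links(html):
    """Return the absolute links found in an HTML string, in page order."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


def is_live(url, timeout=10):
    """Return True if the URL answers a HEAD request with a status below 400."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # Covers HTTP errors, unreachable hosts, timeouts and malformed URLs.
        return False


sample = '<p><a href="https://example.com/a">A</a> <a href="/local">B</a></p>'
print(extract_links(sample))
```

To check a page, pass its HTML to extract_links and call is_live on each result. Relative links such as '/local' are skipped here and would need joining to your domain first.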
LSI is a Concept - Not a Method or Technique
It is important to understand that LSI is not a technique, as such. It is a concept born of complex statistical analysis, and the idea that latent semantic indexing can be used to improve a webpage is blatant nonsense. LSI cannot be used as such, and SEO sites that claim to be able to write LSI-friendly websites are doing this through ignorance.
However, if you write with the concept in mind by restricting your use of the keyword to a maximum of 0.8-1.5%, and use good solid synonyms or phrases to emphasize the search terms for which you want to be listed, the LSI algorithm will serve you better than endless repetition of keywords. That is yesterday's tactic, and it is now punished severely!
Will LSI Make Keywords History?
The term ‘keyword’ or ‘key phrase’ might become history while search engines use a 'semantic' concept for assessing webpage content. What will remain true is that Google will continue to work to satisfy Google's customers and ensure as far as it can that websites generated purely for profit, and not to provide information, will not see the light of day on its result pages, and that the current form of content analysis will be refined to that end.
The day will come when you will type 'locks' into your search engine and the result will provide exactly the type of 'locks' that you are thinking of. Or perhaps not: in the future it may likely only be required that you think of it!
Until then, latent semantic indexing, or LSI, will likely continue to be a concept that is used to judge the worth of our writing, and the listing position of our web pages and articles.
* In the USA a business name is used in the singular, such as 'Google is', while in the UK and Europe it is used in the plural, such as 'Google are'. Because over 75% of my visitors are American I bow to them and use their grammar and spelling - that doesn't mean it is correct of course! Just that there are more Americans, as General Cornwallis discovered at Yorktown in 1781!
P.S.: I am not English - I am Scots
P.P.S.: If you want to learn more about the best way to write articles and how to use them to promote your business and make more money, check out my eBook Article Marketing