I have a long history of working on search engine projects, and nothing reveals the tricks like building a search engine to see how it works. One project was the Initiative to Evaluate XML Retrieval, on which I worked largely with universities on advanced ontological and syntactic mechanisms; another was the GoXML Contextual XML Search Engine, launched in 1998 just days after the XML Recommendation was finalized. GoXML is still featured on PatentStorm.
SEO has little to do with indexing the content of a page nowadays. All that does is give you a starting reference point. The indexbot parses your page, notes some particulars, and assigns weights to various terms.
The key metrics are:
1. Domain name matches the search string (note: since hyphens and periods are removed during the webbot's normalization process, www.ford.com is treated as equal to www.f-or.d.com). Not many people know this, since they do not write code to parse domain names. The hyphens are removed because few people search on hyphens, and the search engine index needs to be as efficient and lean as possible.
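A minimal sketch of the normalization step described above, assuming the engine simply strips hyphens and periods before comparing (the function name and exact rules here are illustrative, not the actual webbot code):

```python
# Hypothetical sketch: lowercase the domain and strip hyphens and periods,
# so visually different domains collapse to the same index key.
def normalize_domain(domain: str) -> str:
    return domain.lower().replace("-", "").replace(".", "")

# Both example domains from the text collapse to the same key:
print(normalize_domain("www.ford.com"))    # wwwfordcom
print(normalize_domain("www.f-or.d.com"))  # wwwfordcom
```

Under this normalization the two domains are indistinguishable to the index, which is exactly why the hyphenated variant ranks as if it matched the plain one.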
2. How many relevant sites point at your site is very important. I showed some Adobe colleagues how to use this to our advantage to beat out Microsoft and Sun for the term “Enterprise Developer Resources”. All I did was ask that everyone add a signature to their email that read “Adobe Enterprise Developer resources – http://www.adobe.com/devnet/lifecycle” and then go about our normal business of posting to public threads. The index assumed we must be relevant, given that the other top sites appeared to have links pointing at Adobe's site. In reality, these were only archived email threads, with the signature being treated as a link; all the search engine saw was that signature text and its URL, repeated across many pages.
3. How people click on the top ten search results in Google. Google uses an adaptive algorithm, a variation of the GoXML algorithm which I co-wrote; we had 51 unique patent points on this in 1998. When you click on one of the top ten results, Google simply tracks the result via a pass-through. You can see this in action by doing any search on Google, then right-clicking the link and copying it. Where you see www.adobe.com/devnet/livecycle/ (and get that URL if you copy/cut the visible text), right-clicking and copying the link actually gives you http://www.google.com/url?sa=t&ct=res&cd=1&url=http%3A%2F%2Fwww.adobe.com%2Fdevnet%2Flivecycle%2F&ei=JzZkRvzMEZqUgwPkmNmKBw&usg=AFQjCNGx7iKEUn38Kcfk8woBnWtcNueL9g&sig2=ope-x2wZBZhBXtNlk_fj0w
A case study is “Architectural patterns metamodel”. Matt Mackenzie and I wrote a variation of the Gang of Four's template for architectural patterns using UML 2.0 and linking known uses. It is now ranked #1, since it is the template most used by software architects. See http://www.google.com/search?hl=en&q=architectural+Patterns+meta+model&btnG=Search
Note that this is referenced by the unique IP address bound from the incoming HTTPRequest header, so you cannot spoof it without additional tricks. Since you first have to receive the callback code from Google to build the new outgoing request in order for the click to register, there is almost no way to spoof it ;-)
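To make the pass-through mechanics concrete, here is a small sketch of how the real destination can be pulled out of a redirect link like the one above. This is just standard URL parsing, not Google's internal code; the `url` query parameter carries the percent-encoded target, and the rest is tracking state:

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical sketch: extract the destination from a Google pass-through
# link. parse_qs decodes the percent-encoded `url` parameter for us.
def extract_target(redirect_url: str) -> str:
    params = parse_qs(urlparse(redirect_url).query)
    return params["url"][0]

link = ("http://www.google.com/url?sa=t&ct=res&cd=1"
        "&url=http%3A%2F%2Fwww.adobe.com%2Fdevnet%2Flivecycle%2F"
        "&ei=JzZkRvzMEZqUgwPkmNmKBw")
print(extract_target(link))  # http://www.adobe.com/devnet/livecycle/
```

The browser only ever shows you the final destination, which is why the tracking hop is easy to miss unless you copy the link itself.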
4. The meta tags are indexed and useful, but only up to a point. Many people who have no clue how the code works try in vain things like META Content=”mountain, bike, mountain bike, mountain bike clothing”, etc. The truth is that the meta keywords are parsed and normalized, stripping out both the commas and the spaces except for a single delimiter to separate the array. All the indexbot sees from the example above is “mountain:bike:mountain:bike:mountain:bike:clothing”. Any repeated word is generally disallowed completely and interpreted as an attempt to spamdex the bot.
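The normalization described above can be sketched roughly like this, assuming the bot splits on commas and whitespace and flags any repeated token (the delimiter and spam rule are my illustration of the point, not the actual indexbot source):

```python
import re

# Hypothetical sketch of meta-keyword normalization: commas and runs of
# whitespace collapse to a single delimiter, and repeated tokens are
# treated as keyword stuffing (spamdexing).
def normalize_keywords(meta_content: str):
    tokens = [t for t in re.split(r"[,\s]+", meta_content.lower()) if t]
    spam = len(tokens) != len(set(tokens))  # any duplicate word flags the page
    return ":".join(tokens), spam

normalized, spam = normalize_keywords(
    "mountain, bike, mountain bike, mountain bike clothing")
print(normalized)  # mountain:bike:mountain:bike:mountain:bike:clothing
print(spam)        # True
```

Note how the carefully crafted phrases collapse into a flat, repetitive token list, which is why stuffing variants of the same phrase backfires.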
5. Any keyword that does not appear in the body in plain text at least once is heavily discounted, unless the core content of the page has no visible words; in that case the indexbot defaults to whatever it has to work with to establish the baseline weights.
6. Any keyword that makes up more than approx. 7% of the total word count of the body is discounted as spam. (Note: this cannot be verified lately, but it used to be true in the early part of the decade.)
7. Google overlays the search matrix with an ontology classified by first-order logic that separates all results into a modal array. The ontological nodes are also ranked at the meta level based on the preceding metrics and mixed into the pages dynamically, but within the constraints defined by their librarians. That is why a term like “washington” will have results for the president, the state, the university, the actor, etc., all in the top ten. One way to trick this is to find the least common context and build a site to reach #1 for it. Once you have done that, replace the words with the context of your choice and you will usually stay in the top ten, since the visibility draws clicks that sustain the ranking.
I have other tricks, but I have never failed to get a site to 3rd or better for terms including “mountain bike”, aromatherapy, whistler rentals, enterprise developer resources, and many others...
Oh yeah - these are the tricks I am willing to share; I am still keeping some others as closely guarded secrets. It's not hard to figure out, since it is all based on simple logic. Enjoy and good optimizing. Post your success back here if you find this helped.