Get Inside the PageRank Algorithm
Delve into the inner workings of the Google PageRank algorithm and how it affects results.
PageRank is the algorithm used by the Google search engine, originally
formulated by Sergey Brin and Larry Page in their paper "The Anatomy of
a Large-Scale Hypertextual Web Search Engine."
It is based on
the premise, prevalent in the world of academia, that the importance of
a research paper can be judged by the number of citations the paper has
from other research papers. Brin and Page have simply transferred this
premise to its Web equivalent: the importance of a web page can be
judged by the number of hyperlinks pointing to it from other web pages.
8.8.1. So What Is the Algorithm?
It may look daunting to non-mathematicians, but the PageRank algorithm
is in fact elegantly simple and is calculated as follows:
\[
PR(A) = (1 - d) + d \left( \frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)} \right)
\]
PR(A) is the PageRank of a page A.
PR(T1) is the PageRank of a page T1 that links to A; T1 through Tn are all of the pages linking to A.
C(T1) is the number of outgoing links from the page T1.
d is a damping factor in the range 0 < d < 1, usually set to 0.85.
The PageRank of a web page is therefore calculated by taking each page
that links to it (its incoming links), dividing that page's PageRank by
the number of links on that page (its outgoing links), and summing the
results. The sum is then scaled by the damping factor d and added to
(1 - d).
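The formula can be applied repeatedly to every page in a small site until the values settle down. What follows is only a minimal sketch of that idea, not Google's implementation: the function name pagerank, the three-page site, the starting value of 1 per page, and the fixed count of 50 iterations are all assumptions made for illustration.

```python
def pagerank(links, d=0.85, iterations=50):
    """Apply PR(A) = (1 - d) + d * sum(PR(T) / C(T)) repeatedly.

    links maps each page name to the list of pages it links to.
    """
    pr = {page: 1.0 for page in links}  # assumed starting PageRank of 1 per page
    for _ in range(iterations):
        new_pr = {}
        for page in links:
            # Sum PR(T) / C(T) over every page T that links to this page.
            incoming = sum(
                pr[t] / len(links[t]) for t in links if page in links[t]
            )
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# Hypothetical site: A links to B and C, B links to A and C, C links to A.
links = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A"]}
ranks = pagerank(links)
for page, value in ranks.items():
    print(page, round(value, 2))
```

For this made-up site the values settle at about 1.30 for A, 0.70 for B, and 1.00 for C. Page A comes out on top because it is the only page that receives a link from both of the others.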
8.8.2. And What Does This Mean?
From a search engine marketer's point of view, this means there are two
ways in which PageRank can affect the position of your page on Google:
The number of incoming links. Obviously, the more of these, the better.
But there is another thing the algorithm tells us: no incoming link can
have a negative effect on the PageRank of the page it points at. At
worst, it can simply have no effect at all.

The number of outgoing links on the page that points to your page. The
fewer of these, the better. This is interesting: it means that, given
two pages of equal PageRank linking to you, one with 5 outgoing links
and the other with 10, you will get twice the increase in PageRank from
the page with only 5 outgoing links.
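The arithmetic behind the 5-versus-10 comparison is easy to check. In this small sketch, the helper name link_contribution and the linking page's PageRank of 4 are hypothetical values chosen only for illustration.

```python
def link_contribution(pr_of_linking_page, outgoing_links, d=0.85):
    """PageRank a single incoming link adds: d * PR(T) / C(T)."""
    return d * pr_of_linking_page / outgoing_links


pr_of_linking_page = 4.0  # hypothetical PageRank shared by both linking pages

from_5_links = link_contribution(pr_of_linking_page, 5)
from_10_links = link_contribution(pr_of_linking_page, 10)

print(from_5_links, from_10_links, from_5_links / from_10_links)
```

Whatever PageRank the two linking pages share, the ratio between the two contributions is always 2, because only the divisor C(T) differs.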
At this point, we take a step
back and ask ourselves just how important PageRank is to the position
of your page in the Google search results.
The next thing that
we can observe about the PageRank algorithm is that it has nothing
whatsoever to do with relevance to the search terms queried. It is
simply a single (admittedly important) part of the entire Google
relevance ranking algorithm.
Perhaps a good way to look at
PageRank is as a multiplying factor applied to the Google search
results after all other computations have been completed. The Google
algorithm first calculates the relevance of pages in its index to the
search terms, and then multiplies this relevance by the PageRank to
produce a final list. The higher your PageRank, therefore, the higher
up the results you will be, but there are still many other factors
related to the positioning of words on the page that must be considered
first.
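The multiplying-factor view can be sketched in a few lines. This is only a toy model of the simplification described above, not Google's actual ranking code: the page names, relevance scores, and PageRank values are all invented for illustration.

```python
# Hypothetical relevance scores (for one query) and PageRank values.
pages = {
    "page-a": {"relevance": 0.9, "pagerank": 1.0},
    "page-b": {"relevance": 0.6, "pagerank": 2.0},
    "page-c": {"relevance": 0.2, "pagerank": 3.0},
}


def final_score(info):
    """Relevance is computed first; PageRank then multiplies it."""
    return info["relevance"] * info["pagerank"]


ranking = sorted(pages, key=lambda name: final_score(pages[name]), reverse=True)
print(ranking)
```

In this made-up data, page-b overtakes the more relevant page-a thanks to its higher PageRank, while page-c, with the highest PageRank of all, still finishes last because its relevance is so low.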
8.8.3. So What's the Use of the PageRank Calculator?
If no incoming link has a negative effect, surely I should just get as
many as possible, regardless of the number of outgoing links on the
linking page?
Well, not entirely. The PageRank algorithm is cleverly balanced. Just
as energy is conserved in every physical reaction, PageRank is
conserved with every calculation. For instance, if a page with a
starting PageRank of 4 has two outgoing links on it, the PageRank it
passes on is divided equally between those links. Setting the damping
factor aside, 4 / 2 = 2 units of PageRank are passed on to each of 2
separate pages, and 2 + 2 = 4, so the total PageRank is preserved!
There are scenarios in which you may find that total PageRank is not
conserved after a calculation, for example when a page has no outgoing
links and so has nowhere to pass its PageRank. PageRank itself is
supposed to represent a probability distribution, with the individual
PageRank of a page representing the likelihood of a random surfer
chancing upon it.
On a much larger scale, supposing Google's index contains a billion
pages, each with a PageRank of 1, the total PageRank across all pages
is equal to a billion. Moreover, each time we recalculate PageRank, no
matter what changes in PageRank may occur between individual pages, the
total PageRank across all one billion pages will still add up to a
billion.
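Conservation can be checked on a small scale by recalculating every page once and adding up the results. The sketch below uses two hypothetical three-page sites: one in which every page has at least one outgoing link, and one containing a page with no outgoing links, which is assumed here as one scenario where the total is not conserved. The function name step is likewise just an illustrative choice.

```python
def step(links, pr, d=0.85):
    """One recalculation of every page's PageRank from the current values."""
    return {
        page: (1 - d)
        + d * sum(pr[t] / len(links[t]) for t in links if page in links[t])
        for page in links
    }


# Hypothetical sites. In "leaky", page C has no outgoing links.
closed = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
leaky = {"A": ["B", "C"], "B": ["C"], "C": []}

start = {page: 1.0 for page in closed}  # total PageRank of 3 before the step

total_closed = sum(step(closed, start).values())
total_leaky = sum(step(leaky, start).values())
print(total_closed, total_leaky)
```

The closed site still adds up to 3 after the recalculation, while the leaky site drops to 2.15, because the share that page C would have passed on goes nowhere.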
First, this means that, although we may not be able
to change the total PageRank across all pages, by strategic linking of
pages within our site, we can affect the distribution of PageRank
between pages. For instance, we may want most of our visitors to come
into the site through our home page. We would therefore want our home
page to have a higher PageRank relative to other pages within the site.
We should also recall that the PageRank a page passes on is divided
equally between the outgoing links on that page. We would therefore
want to keep as much combined PageRank as possible within our own site,
rather than passing it on to external sites and losing its benefit.
This means we would want any page with lots of external links (i.e.,
links to other people's web sites) to have a lower PageRank relative to
other pages within the site, to minimize the amount of PageRank that is
leaked to external sites. Also, bear in mind our earlier statement that
PageRank is simply a multiplying factor applied once Google's other
relevance calculations have been completed. We would therefore want our
more keyword-rich pages to also have a higher relative PageRank.
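To see how interlinking alone shifts the distribution, we can compare two hypothetical four-page sites with the same pages but different link structures. This is a sketch under stated assumptions: the page names, the two layouts ("hub" and "mesh"), and the iteration count are invented for illustration.

```python
def pagerank(links, d=0.85, iterations=200):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) from a start of 1 per page."""
    pr = {page: 1.0 for page in links}
    for _ in range(iterations):
        pr = {
            page: (1 - d)
            + d * sum(pr[t] / len(links[t]) for t in links if page in links[t])
            for page in links
        }
    return pr


subpages = ["About", "Products", "Contact"]

# Hub layout: Home links to every subpage; each subpage links only to Home.
hub = {"Home": list(subpages)}
hub.update({name: ["Home"] for name in subpages})

# Mesh layout: every page links to every other page.
everyone = ["Home"] + subpages
mesh = {name: [other for other in everyone if other != name] for name in everyone}

hub_ranks = pagerank(hub)
mesh_ranks = pagerank(mesh)
print(round(hub_ranks["Home"], 2), round(mesh_ranks["Home"], 2))
```

Both sites hold a combined PageRank of 4, but the hub layout concentrates about 1.92 of it on the home page, whereas the mesh layout leaves every page, the home page included, at 1.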
Second, if we assume that every new page in
Google's index begins its life with a PageRank of 1, there is a way we
can increase the combined PageRank of pages within our site—by
increasing the number of pages! A site with 10 pages will start life
with a combined PageRank of 10, which is then redistributed through its
hyperlinks. A site with 12 pages will therefore start with a combined
PageRank of 12. We can thus improve the PageRank of our site as a whole
by creating new content (i.e., more pages), and then control the
distribution of that combined PageRank through strategic interlinking
between the pages.
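The effect of adding pages can be modeled the same way. The sketch below assumes, as above, that every page starts with a PageRank of 1, and it uses a hypothetical layout in which each subpage links only back to the home page. The helper names hub_site and pagerank are illustrative.

```python
def pagerank(links, d=0.85, iterations=200):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) from a start of 1 per page."""
    pr = {page: 1.0 for page in links}
    for _ in range(iterations):
        pr = {
            page: (1 - d)
            + d * sum(pr[t] / len(links[t]) for t in links if page in links[t])
            for page in links
        }
    return pr


def hub_site(n):
    """Hypothetical n-page site: Home plus n - 1 subpages linking back to Home."""
    subpages = [f"Page {i}" for i in range(1, n)]
    links = {"Home": subpages}
    links.update({name: ["Home"] for name in subpages})
    return links


ranks_10 = pagerank(hub_site(10))
ranks_12 = pagerank(hub_site(12))
print(round(sum(ranks_10.values()), 2), round(sum(ranks_12.values()), 2))
print(round(ranks_10["Home"], 2), round(ranks_12["Home"], 2))
```

In this model, the combined PageRank grows from 10 to 12 with the two extra pages, and because every subpage points at the home page, the home page's own share grows as well.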
And this is the purpose of the PageRank
Calculator—to create a model of the site on a small scale including the
links between pages, and see what effect the model has on the
distribution of PageRank.
8.8.4. How Does the PageRank Calculator Work?
To get a better idea of the realities of PageRank, visit the PageRank Calculator (http://www.markhorrell.com/seo/pagerank.asp).
It's simple, really. Start by typing in the number of interlinking
pages that you wish to analyze and hit Submit. I have limited this
number to 20 pages to conserve server resources. Even so, this should
give a reasonable indication of how strategic linking can affect the
PageRank distribution.
Next, for ease of reference once the
calculation has been performed, provide a label for each page (e.g.,
Home Page, Links Page, Contact Us Page, etc.), and again hit Submit.
Finally, use the list boxes to select which pages each page links to.
You can use Ctrl and Shift to highlight multiple selections.
You can also use this screen to change the initial PageRanks of each
page. For instance, if one of your pages is supposed to represent
Yahoo!, you may wish to raise its initial PageRank to, say, 3. However,
in actuality, starting PageRank is irrelevant to its final computed
value. In other words, even if one page were to start with a PageRank
of 100, after many iterations of the equation, the final computed
PageRank will converge to the same value as it would had it started
with a PageRank of only 1!
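This convergence is easy to check outside the calculator. The sketch below is not the calculator's own code; the three-page site, the page names, and the count of 200 iterations are assumptions made for illustration.

```python
def pagerank(links, initial, d=0.85, iterations=200):
    """Iterate the PageRank formula, starting from the given initial values."""
    pr = dict(initial)
    for _ in range(iterations):
        pr = {
            page: (1 - d)
            + d * sum(pr[t] / len(links[t]) for t in links if page in links[t])
            for page in links
        }
    return pr


# Hypothetical three-page model in which one page stands in for Yahoo!.
links = {"Yahoo": ["Home", "Links"], "Home": ["Links", "Yahoo"], "Links": ["Home"]}

from_one = pagerank(links, {"Yahoo": 1.0, "Home": 1.0, "Links": 1.0})
from_high = pagerank(links, {"Yahoo": 100.0, "Home": 1.0, "Links": 1.0})

for page in links:
    print(page, round(from_one[page], 4), round(from_high[page], 4))
```

After enough iterations the two runs agree to many decimal places, even though one of them started the Yahoo! page at 100.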
You can play around with the
damping factor d, which defaults to 0.85, as this is the value quoted
in Brin and Page's research paper.
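To get a feel for what the damping factor does, the same small model can be run with different values of d. The site layout, the page names, and the two values compared here (0.5 and 0.85) are assumptions chosen only for illustration.

```python
def pagerank(links, d, iterations=200):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) from a start of 1 per page."""
    pr = {page: 1.0 for page in links}
    for _ in range(iterations):
        pr = {
            page: (1 - d)
            + d * sum(pr[t] / len(links[t]) for t in links if page in links[t])
            for page in links
        }
    return pr


# Hypothetical site: Home links to three subpages, each linking back to Home.
links = {
    "Home": ["About", "Products", "Contact"],
    "About": ["Home"],
    "Products": ["Home"],
    "Contact": ["Home"],
}

home_by_d = {d: pagerank(links, d)["Home"] for d in (0.5, 0.85)}
print({d: round(value, 2) for d, value in home_by_d.items()})
```

In this layout, a lower damping factor pulls every page closer to the baseline of 1, so the home page's advantage shrinks from about 1.92 at d = 0.85 to about 1.67 at d = 0.5.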
Mark Horrell