Automatic pagerank checking script

Checking the pagerank of a website is easy. You just search Google for the "google pagerank checker" keyword and you instantly get more than 3 million results. Each of these websites lets you manually enter a website URL and returns its rank. That's great, but what if you need to know the pagerank of more than 200 websites? And what if you need to know other rankings as well?

I'm preparing an article with a top 100 of software testing blogs. I will not write down my personal choice. Instead, I will make a huge list of blogs and rank them according to the article on how to make a top blog list on Noop.nl. Several rankings are taken into account, such as the Google pagerank (PR), Alexa reach, Alexa popularity, Technorati authority and the number of referring links according to Google.

Gathering a list of 300+ blogs is a good start, but retrieving the rankings of each blog by hand is something I don't intend to do. Therefore I created a script that checks the rankings for a list of sites automatically. For each site in the input file, the script fetches the rank from each of the selected ranking services. Each site and its ranks are written to a file that can be opened with MS Excel afterwards.

To run the script you need AutoIt. You can download the latest version from http://www.autoitscript.com/. Pick "AutoIt Full Installation", which includes SciTE, an excellent AutoIt editor providing syntax highlighting, autocompletion, and more.

The script is completely free to use, though a comment in the comment section of this post would be appreciated :-)

Preparation of the pagerank script

The rank check script consists of four main files:
A configuration file (.ini)
An input file containing all sites to get the ranking for (.txt)
The script file (.au3)
An output file containing all sites with their rankings (.csv)
Three out of four files need to be created manually by copying and pasting the code below. The fourth file is generated automatically by the script.

The configuration file

Create a file "checkRanking.ini" and add the following code to it:

[launch]
selectedRankings=googlePR,googleLink,alexaPopularity,technorati

[common]
#The name of the ranking input file
rankingInputFileName=rankingInput.txt
#The name of the output ranking file
rankingOutputFileName=rankingOutput.csv
#The separator of the output ranking file
rankingOutputSeparator=,
#The separator of the ranking service details
rankingServiceSeparator=§
#The tag in the ranking service url to replace by the site to rank
rankingServiceUrlTag=[url]
#The ranking value for sites not having a rank
rankingValueNoRank=N/A
#The ranking value for sites in the excluded domains
rankingValueExcluded=Excluded

[ranking-service]
#The ranking services:
#parameter 1 = the url of the service to request
#parameter 2 = the regular expression to fetch the ranking
googlePR=http://www.google.com/search?client=navclient-auto&ch=6-1484155081&features=Rank&q=info:[url]§.*:\d:(.*)
alexaPopularity=http://data.alexa.com/data?cli=10&dat=snbamz&url=[url]§POPULARITY\s\S+\sTEXT=\"(.\d+)\"
alexaReach=http://data.alexa.com/data?cli=10&dat=snbamz&url=[url]§REACH\sRANK=\"(\d+)\"
technorati=http://www.technorati.com/blogs/[url]§>Authority:\s([\d,]+)
googleLink=http://www.google.be/search?hl=nl&q=link%3A[url]§([\d\.]+)\smet\slinks\stot

#The domain names which shouldn't get a ranking at a certain ranking-service
[domains-to-exclude]
alexaPopularity=.thoughtworks.com§.msdn.com
alexaReach=.thoughtworks.com§.msdn.com
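
You can further tweak the configuration and add more ranking services to work with. Each ranking-service entry consists of the service URL and a regular expression with one capture group for the rank, separated by the rankingServiceSeparator character.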
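
Before adding a new ranking-service line to the configuration file, it can help to check it in isolation. The following is only a minimal sketch: the service URL (ranking.example.com), the sample response and the regular expression are made up for illustration and do not refer to a real service. It splits the line the same way the script does, fills in the [url] tag and runs the regular expression against the sample response.

```autoit
; Sketch: check a candidate [ranking-service] line before adding it to checkRanking.ini.
; The service URL, the regular expression and the sample response are hypothetical.
Local $sServiceLine = 'http://ranking.example.com/lookup?site=[url]§rank=\"(\d+)\"'
Local $sSampleResponse = '<result site="www.noop.nl" rank="42"/>'

; Split the line into its two parameters, using the same separator as the configuration file
Local $aParts = StringSplit($sServiceLine, "§")
If $aParts[0] <> 2 Then
    ConsoleWrite("Expected exactly 2 parameters, found " & $aParts[0] & @CRLF)
    Exit
EndIf

; Parameter 1: replace the [url] tag with the site to rank
Local $sUrl = StringReplace($aParts[1], "[url]", "http://www.noop.nl")
ConsoleWrite("Request URL: " & $sUrl & @CRLF)

; Parameter 2: the regular expression, whose first capture group must hold the rank
Local $aMatch = StringRegExp($sSampleResponse, $aParts[2], 1)
If @error Then
    ConsoleWrite("No match: the script would write N/A" & @CRLF)
Else
    ConsoleWrite("Captured rank: " & $aMatch[0] & @CRLF)
EndIf
```

If the line passes this check, add it to the [ranking-service] section under a name of your choice and append that name to selectedRankings in the [launch] section.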

The site input file

Create a file "rankingInput.txt" and add the sites of your choice to it, for instance:

http://www.testingminded.com
http://www.testsquad.org
http://www.problogdesign.com
http://www.problogger.net
http://www.noop.nl
http://blogs.msdn.com/joshpoley/
http://blogs.thoughtworks.com/testblog
http://notexistingsite.com

The script file

Create a file "checkRanking.au3" and add the following code to it:

;----------------------- TESTINGMINDED -----------------------
;--- pagerank checking script from www.testingminded.com -----
;-------------------- DECLARATION SECTION --------------------

#include <Array.au3>
#include <File.au3>
#include <Inet.au3>

Opt("TrayAutoPause", 0)

Global $sitesToRank

_inform("Initializing...")

;the configuration file has the same name as the script, with the .ini extension
$scriptName = StringLeft(@ScriptName, StringLen(@ScriptName) - 4)
$iniLocation = @ScriptDir & '\' & $scriptName & '.ini'
If Not (FileExists($iniLocation)) Then
    MsgBox(0, "Exception", "The configuration file could not be found: " & $iniLocation)
    Exit
EndIf

$rankingInputFileName = IniRead($iniLocation, "common", "rankingInputFileName", "NotFound")
$rankingOutputFileName = IniRead($iniLocation, "common", "rankingOutputFileName", "NotFound")
$rankingOutputSeparator = IniRead($iniLocation, "common", "rankingOutputSeparator", "NotFound")
$rankingServiceSeparator = IniRead($iniLocation, "common", "rankingServiceSeparator", "NotFound")
$rankingServiceUrlTag = IniRead($iniLocation, "common", "rankingServiceUrlTag", "NotFound")
$rankingValueNoRank = IniRead($iniLocation, "common", "rankingValueNoRank", "NotFound")
$rankingValueExcluded = IniRead($iniLocation, "common", "rankingValueExcluded", "NotFound")
$selectedRankings = IniRead($iniLocation, "launch", "selectedRankings", "NotFound")
;placeholder array, resized in _getSelectedRankingSettings
;(a plain declaration is used because _ArrayCreate is not available in recent AutoIt releases)
Global $selectedRankingSettings[1]

; ------------------------------------------------------------------------------
; MAIN
; ------------------------------------------------------------------------------

$rankingInputFile = @ScriptDir & "\" & $rankingInputFileName

If Not (FileExists($rankingInputFile)) Then
    MsgBox(0, "Exception", "The ranking input file could not be found: " & $rankingInputFile)
    Exit
EndIf

$timeStamp = @YEAR & '-' & @MON & '-' & @MDAY & '_' & @HOUR & '-' & @MIN & '-' & @SEC
$RankingOutputFile = @ScriptDir & "\" & $rankingOutputFileName
_FileReadToArray($rankingInputFile, $sitesToRank)

;retrieve the ranking service details, but only for the selected ranking services
$rankingSettings = _getSelectedRankingSettings($selectedRankings)

;create the ranking output file
_FileCreate($RankingOutputFile)

;create the header of the output file
$rankHeader = ""
_addItemToList($rankHeader, "site")
For $i = 1 To UBound($rankingSettings) - 1
    $rankingSetting = $rankingSettings[$i]
    $rankingName = $rankingSetting[1]
    _addItemToList($rankHeader, $rankingName)
Next

FileWrite($RankingOutputFile, $rankHeader & @CRLF)

;loop through the sites to rank
For $i = 1 To UBound($sitesToRank) - 1
    $rankedSite = $sitesToRank[$i]

    ;append the rank of each selected ranking service to the line of this site
    For $j = 1 To UBound($rankingSettings) - 1
        $rank = _getRank($sitesToRank[$i], $rankingSettings[$j])
        _addItemToList($rankedSite, $rank)
    Next

    FileWrite($RankingOutputFile, $rankedSite & @CRLF)
Next

MsgBox(0, "Information", "The ranking of " & UBound($sitesToRank) - 1 & " sites at " & UBound($rankingSettings) - 1 & " ranking services has completed." & @CRLF & "Please check the ranking output file for the results: " & $RankingOutputFile)

; ------------------------------------------------------------------------------
; FUNCTIONS
; ------------------------------------------------------------------------------

;append an item to a separator-delimited list
Func _addItemToList(ByRef $list, $item)
    If $list == "" Then
        $list = $item
    Else
        $list = $list & $rankingOutputSeparator & $item
    EndIf
EndFunc   ;==>_addItemToList

Func _getSelectedRankingSettings($selectedRankings)

    ;the selectedRankings value in the configuration file is a comma-separated list,
    ;independent of the separator chosen for the output file
    $selectedRankings = StringSplit($selectedRankings, ",")
    ReDim $selectedRankingSettings[UBound($selectedRankings)]
    ;element 0 holds the number of selected ranking services
    $selectedRankingSettings[0] = $selectedRankings[0]

    For $i = 1 To UBound($selectedRankings) - 1
        $line = IniRead($iniLocation, "ranking-service", $selectedRankings[$i], "NotFound")

        If $line == "NotFound" Then
            MsgBox(0, "Exception", "Could not find ranking-service parameter " & $selectedRankings[$i] & " in the configuration file")
            Exit
        Else
            ;split the service details into the url and the regular expression
            $lineParameters = StringSplit($line, $rankingServiceSeparator)

            If ($lineParameters[0] == 2) Then
                $domainLine = IniRead($iniLocation, "domains-to-exclude", $selectedRankings[$i], "NotFound")
                Dim $domains[1]

                If Not ($domainLine == "NotFound") Then
                    $domains = StringSplit($domainLine, $rankingServiceSeparator)
                EndIf

                ;setting layout: item count, service name, service url, regular expression, excluded domains
                Dim $rankingSetting[5] = [4, $selectedRankings[$i], $lineParameters[1], $lineParameters[2], $domains]

                $selectedRankingSettings[$i] = $rankingSetting
            Else
                MsgBox(0, "Exception", "Ranking-service parameter " & $line & " does not contain the correct number of subitems: ranking-service url, regular expression")
                Exit
            EndIf
        EndIf
    Next

    Return $selectedRankingSettings

EndFunc   ;==>_getSelectedRankingSettings

Func _getRank($site, $rankingSettings)

    $rankingName = $rankingSettings[1]
    $rankingUrl = StringReplace($rankingSettings[2], $rankingServiceUrlTag, $site)
    $rankingRegex = $rankingSettings[3]
    $domainsToExclude = $rankingSettings[4]
    $toExclude = False

    ;check whether the site belongs to a domain excluded for this ranking service
    For $i = 1 To UBound($domainsToExclude) - 1
        If StringInStr($rankingUrl, $domainsToExclude[$i]) Then
            $toExclude = True
            ExitLoop
        EndIf
    Next

    If $toExclude = False Then

        ;the site is passed as a query value to the googleLink service, so it needs url encoding
        If $rankingName = "googleLink" Then
            $rankingUrl = StringReplace($rankingSettings[2], $rankingServiceUrlTag, _urlEncode($site))
        Else
            $rankingUrl = StringReplace($rankingSettings[2], $rankingServiceUrlTag, $site)
        EndIf

        _inform("Getting " & $rankingName & " rank for " & $site)

        ;launch the ranking service request and get the source code
        $source = _INetGetSource($rankingUrl)
        ;retrieve the ranking value
        $rankingRegResult = StringRegExp($source, $rankingRegex, 1)

        If @error Or ($rankingRegResult[0] == "") Then
            Return $rankingValueNoRank
        Else
            ;strip thousands separators from the rank
            Return StringReplace(StringReplace($rankingRegResult[0], ",", ""), ".", "")
        EndIf
    Else
        Return $rankingValueExcluded
    EndIf

EndFunc   ;==>_getRank

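;encode the characters of a site url that matter when it is passed as a query value
Func _urlEncode($url)
    $url = StringReplace($url, "/", "%2F")
    $url = StringReplace($url, ":", "%3A")
    Return $url
EndFunc   ;==>_urlEncode

;show a progress message in the system tray
Func _inform($message, $timeout = 3)
    TrayTip("checkRanking progress...", $message, $timeout)
EndFunc   ;==>_inform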

Test running the script

At this stage, you should be able to run your script and get results. Save your script, ranking input file and configuration file in the same directory, and run the script by pressing F5 in your SciTE editor. If you followed all steps correctly, a tray tip appears to indicate that the ranking has started. If not, please read the error information carefully.

The ranking output file

After running the ranking script you should get a ranking output file named "rankingOutput.csv" in the same directory as the script. The content should look like this:

site,googlePR,googleLink,alexaPopularity,technorati
http://www.testingminded.com,1,2,2020990,N/A
http://www.testsquad.org,3,13,1595485,5
http://www.problogdesign.com,5,392,44114,304
http://www.problogger.net,6,11000,4202,4130
http://www.noop.nl,4,420,166693,174
http://notexistingsite.com,N/A,N/A,N/A,N/A
http://blogs.msdn.com/joshpoley/,4,108,Excluded,N/A
http://blogs.thoughtworks.com/testblog,N/A,N/A,Excluded,N/A

If you encounter difficulties with these steps and you don't succeed in fixing the problem yourself, leave a message in the comment section. Don't forget to include the error message shown in the SciTE output window. I will answer your question as soon as possible.
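
When a ranking service returns N/A for every site, a useful first step is to look at what the service actually sends back. The snippet below is a rough sketch of such a check, not part of the ranking script itself. It reuses the alexaReach URL and regular expression from the configuration file above, with http://www.noop.nl filled in for the [url] tag. Whether the service still answers in that format is an assumption, since ranking services change or disappear over time.

```autoit
#include <Inet.au3>

; Sketch: inspect the raw response of one ranking service for one site.
; URL and regular expression are the alexaReach values from checkRanking.ini.
Local $sUrl = "http://data.alexa.com/data?cli=10&dat=snbamz&url=http://www.noop.nl"
Local $sRegex = 'REACH\sRANK=\"(\d+)\"'

Local $sSource = _INetGetSource($sUrl)
If @error Then
    ; possible causes include no connection, a proxy, or an error returned by the service
    ConsoleWrite("Request failed" & @CRLF)
    Exit
EndIf

; Show the start of the raw response so an error page or a changed layout is easy to spot
ConsoleWrite(StringLeft($sSource, 500) & @CRLF)

Local $aMatch = StringRegExp($sSource, $sRegex, 1)
If @error Then
    ConsoleWrite("Regular expression did not match: the script would write N/A" & @CRLF)
Else
    ConsoleWrite("Captured rank: " & $aMatch[0] & @CRLF)
EndIf
```

Run it from SciTE with F5 and read the output pane. If the request fails or the response no longer contains the expected text, the URL or the regular expression in the configuration file needs to be updated for that service.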

Comments

11 Responses to "Automatic pagerank checking script"

Chang said... September 9, 2009 at 8:41 PM

Hi,
Thanks for this extremely useful resource. But I have a problem. Pageranks are not being calculated in my case. It shows N/A only. Where I went wrong? Please help me.

Steven Machtelinckx said... September 9, 2009 at 9:42 PM

I can't think of a reason right away. Could you please paste here the content of your configuration and site input file? I will have a look then.

axel Analyse pub Marketing said... October 20, 2010 at 10:23 PM

Nice script dude but the result is N/A for every sites, even the ones with known PR.

Steven Machtelinckx said... October 21, 2010 at 7:46 AM

-Are you running this script from behind a proxy?

-Can you upload a zip with your script and configuration files and send me the link?

Anonymous said... March 29, 2011 at 1:38 PM

hey in my scenario,i have to capture the result also and have to save in excel or text file.for ex through putty it will login in to server,df-k command is given,then want to save that results and through autoit want to save the results in excel file.is it possible.

Steven Machtelinckx said... May 24, 2011 at 11:38 AM

Putty supports outputting to a text file and Autoit has user defined functions supporting excel automation. So yes, this should be possible.

Anonymous said... September 20, 2011 at 2:56 PM

i think the reason for that would be the same as using bulk pagerank checkers repeatedly that the site errors out and gives N/A results instead

Bespoke Software said... January 4, 2012 at 4:46 PM

This is a really nice little script but I'm getting a N/A for every site, even the ones that I know have PR.

Any Suggestions?

Steven Machtelinckx said... January 5, 2012 at 12:42 PM

Hello,

I copied the URL into the browser and got a 403 error. It appears that Google has changed its hashing algorithm. The googlePR URL I provided in this article is therefore no longer working.

One option is to switch to Ruby and use the following script: https://github.com/pstadler/pagerank-service
Porting that Ruby script to AutoIt is also an option :-)

sagar sharma said... May 17, 2012 at 1:35 PM

nice script
