Breaking down Spider searches

I'm a new user of WebLog Expert, and so far I'm loving all the information it gives. Great product.

A couple of quick questions, though. I'm basically analyzing a group of logs and have a filter that's set up to grab only the Googlebot spider.

1. In the results I'm seeing entries like http://www.mydomain.com/http://www.mydomain.com/pictures/that-swimsu...

Is it supposed to list the domain in that format? It almost looks like it's adding information: instead of just showing http://www.mydomain.com/pictures//tha..., it adds http://www.mydomain.com to the start of it.

2. Is it possible to create a report that takes all the Googlebot information, keeps only the URLs I want, and displays them?

Example:

Let's say I have 20 URLs that all look something like this:
www.mydomain.com/pictures/a
www.mydomain.com/pictures/b
www.mydomain.com/pictures/c
www.mydomain.com/pictures/d
and so on.

Instead of adding them up manually, can I tell WebLog Expert to run a report that says Googlebot hit the www.mydomain.com/pictures/ directory, adding up all the hits instead of showing me every individual URL?

Hope that makes sense.
Joshua
I should add a little more information regarding question 2.

Ideally, I would put in 10 filters, all saying something like:

mydomain.com/pictures/*
mydomain.com/videos/*
mydomain.com/ads/*

In the end, I would like the report to say that Googlebot hit pictures 800 times, videos 388 times, and ads 9 times.
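A minimal Python sketch of the counting Joshua describes: total Googlebot requests per wildcard mask such as `mydomain.com/pictures/*`. It assumes each log line has already been parsed into a (URL, user agent) pair, and it uses Python's `fnmatch` wildcards, which may not behave exactly like WebLog Expert's own masks. The function `count_by_mask` and the sample entries are hypothetical.

```python
from collections import Counter
from fnmatch import fnmatchcase


def count_by_mask(entries, masks, agent_token="Googlebot"):
    """Count requests whose user agent contains agent_token, per URL mask.

    entries: iterable of (url, user_agent) pairs (hypothetical pre-parsed log lines).
    masks: wildcard patterns such as "mydomain.com/pictures/*".
    Matching is case-insensitive; each request is counted under the first mask it matches.
    """
    counts = Counter({mask: 0 for mask in masks})
    for url, agent in entries:
        if agent_token.lower() not in agent.lower():
            continue  # not a Googlebot request
        for mask in masks:
            if fnmatchcase(url.lower(), mask.lower()):
                counts[mask] += 1
                break  # count each request under one directory only
    return counts


if __name__ == "__main__":
    masks = ["mydomain.com/pictures/*", "mydomain.com/videos/*", "mydomain.com/ads/*"]
    entries = [  # hypothetical sample entries
        ("mydomain.com/pictures/a", "Mozilla/5.0 (compatible; Googlebot/2.1)"),
        ("mydomain.com/pictures/b", "Mozilla/5.0 (compatible; Googlebot/2.1)"),
        ("mydomain.com/videos/x", "Mozilla/5.0 (compatible; Googlebot/2.1)"),
        ("mydomain.com/pictures/c", "Mozilla/5.0 (Windows NT 10.0)"),
    ]
    for mask, hits in count_by_mask(entries, masks).items():
        print(f"{mask}: {hits}")
```

With these sample entries the sketch prints 2 for pictures, 1 for videos and 0 for ads; the non-Googlebot request is skipped.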
Michael
1. It seems that your log files include the domain name along with the file name, instead of just the file name as in usual logs. You can find more information on how to set the correct log format here.

2. If you use the Professional or Enterprise edition, you can do it by creating a custom table in Options > Report > Tables. When you create the table, choose "Directory" as the main data and create an include "Spider" filter with value * in the table properties. This way, the table will include statistics on all spider requests for files inside the reported directories.
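Regarding point 1, a minimal Python sketch of why the doubled http://www.mydomain.com/http://... entries can appear, assuming the request field in the log holds an absolute URL rather than a path: a report that prefixes the domain to that field ends up with the domain twice. The helper `normalize_request_path` is hypothetical and only illustrates the idea; the fix suggested above is to set the correct log format in WebLog Expert.

```python
from urllib.parse import urlsplit


def normalize_request_path(request_target: str) -> str:
    """Return just the path (plus query) of a request target.

    Log lines normally hold a path such as "/pictures/a". Some log sources
    record the absolute URL instead; in that case the scheme and host are
    dropped so that a report prefixing the domain does not show it twice.
    """
    parts = urlsplit(request_target)
    if parts.scheme and parts.netloc:
        path = parts.path or "/"
        return f"{path}?{parts.query}" if parts.query else path
    return request_target  # already a plain path


if __name__ == "__main__":
    print(normalize_request_path("http://www.mydomain.com/pictures/a"))  # /pictures/a
    print(normalize_request_path("/pictures/a"))  # /pictures/a
```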
Joshua
Thanks for the reply; it helped a bunch.

Next question: the report is still showing the complete URL. Is there any way of getting rid of that? The attached image shows a better example.

Basically, I've got a list of about 25 URLs, and the SEO team wants to see how many times Googlebot or any other spider hits them. They don't want to see every URL, though; they just want a total of how many times the root directory of each URL was called.
Joshua
Here is the profile I have so far:

[Profile]
Name=Break New
[General]
IndexFile=index.htm
Domain=www.break.com
DNSLookup=0
bRetrievePageTitles=0
bUseAnalysisCache=1
PaidSearchAndGoals=0
CustomAnalysisSettings=0
AnalysisSettings-iShowFileQueryParameters=0
AnalysisSettings-stFileQueryParameters=
AnalysisSettings-iFileNamesCase=0
AnalysisSettings-bConvertFileQueriesToLowerCase=0
AnalysisSettings-fTimeOffset=0.000
AnalysisSettings-iTimeZone=0
AnalysisSettings-iDateFormat=0
AnalysisSettings-iFirstDayOfWeek=0
AnalysisSettings-stDateFormat=mm/dd/yyyy
AnalysisSettings-iHostsReport=7945656
AnalysisSettings-iCountSpidersAsVisitors=0
[Logs]
Source=0
FilePath=H:\varnish\var*.log
UseTrackingCode=0
[LandingPages]
Count=0
[ConversionGoals]
Count=0
[TimeRange]
Type=7
PeriodCount=2
StartTime=800316728
EndTime=557748896
bMultipleTimeRangeHTMLReports=0
[Tracking]
Count=0
[Filters]
Count=12
FilterType0=0
FilterCriteria0=1
CriteriaCount0=2
Criteria0-FilterType0=0
Criteria0-FilterCriteria0=0
Criteria0-Filter0=1
Criteria0-Mask0=*/break.com/*
Criteria0-CaseSensitive0=0
Criteria0-IncludeQuery0=0
Criteria0-IncludeVersion0=0
Criteria0-SearchResultType0=0
Criteria0-Enabled0=1
Criteria1-FilterType0=0
Criteria1-FilterCriteria0=0
Criteria1-Filter0=15
Criteria1-Mask0=Googlebot
Criteria1-CaseSensitive0=0
Criteria1-IncludeQuery0=0
Criteria1-IncludeVersion0=0
Criteria1-SearchResultType0=0
Criteria1-Enabled0=1
FilterName0=
Enabled0=1
FilterType1=0
FilterCriteria1=1
CriteriaCount1=2
Criteria0-FilterType1=0
Criteria0-FilterCriteria1=0
Criteria0-Filter1=1
Criteria0-Mask1=*/content/find/*
Criteria0-CaseSensitive1=0
Criteria0-IncludeQuery1=0
Criteria0-IncludeVersion1=0
Criteria0-SearchResultType1=0
Criteria0-Enabled1=1
Criteria1-FilterType1=0
Criteria1-FilterCriteria1=0
Criteria1-Filter1=15
Criteria1-Mask1=Googlebot
Criteria1-CaseSensitive1=0
Criteria1-IncludeQuery1=0
Criteria1-IncludeVersion1=0
Criteria1-SearchResultType1=0
Criteria1-Enabled1=1
FilterName1=
Enabled1=1
FilterType2=0
FilterCriteria2=1
CriteriaCount2=2
Criteria0-FilterType2=0
Criteria0-FilterCriteria2=0
Criteria0-Filter2=1
Criteria0-Mask2=*/findpix/*
Criteria0-CaseSensitive2=0
Criteria0-IncludeQuery2=0
Criteria0-IncludeVersion2=0
Criteria0-SearchResultType2=0
Criteria0-Enabled2=1
Criteria1-FilterType2=0
Criteria1-FilterCriteria2=0
Criteria1-Filter2=15
Criteria1-Mask2=Googlebot
Criteria1-CaseSensitive2=0
Criteria1-IncludeQuery2=0
Criteria1-IncludeVersion2=0
Criteria1-SearchResultType2=0
Criteria1-Enabled2=1
FilterName2=
Enabled2=1
FilterType3=0
FilterCriteria3=1
CriteriaCount3=2
Criteria0-FilterType3=0
Criteria0-FilterCriteria3=0
Criteria0-Filter3=1
Criteria0-Mask3=*/games/*
Criteria0-CaseSensitive3=0
Criteria0-IncludeQuery3=0
Criteria0-IncludeVersion3=0
Criteria0-SearchResultType3=0
Criteria0-Enabled3=1
Criteria1-FilterType3=0
Criteria1-FilterCriteria3=0
Criteria1-Filter3=15
Criteria1-Mask3=Googlebot
Criteria1-CaseSensitive3=0
Criteria1-IncludeQuery3=0
Criteria1-IncludeVersion3=0
Criteria1-SearchResultType3=0
Criteria1-Enabled3=1
FilterName3=
Enabled3=1
FilterType4=0
FilterCriteria4=1
CriteriaCount4=2
Criteria0-FilterType4=0
Criteria0-FilterCriteria4=0
Criteria0-Filter4=1
Criteria0-Mask4=*/game-trailers/*
Criteria0-CaseSensitive4=0
Criteria0-IncludeQuery4=0
Criteria0-IncludeVersion4=0
Criteria0-SearchResultType4=0
Criteria0-Enabled4=1
Criteria1-FilterType4=0
Criteria1-FilterCriteria4=0
Criteria1-Filter4=15
Criteria1-Mask4=Googlebot
Criteria1-CaseSensitive4=0
Criteria1-IncludeQuery4=0
Criteria1-IncludeVersion4=0
Criteria1-SearchResultType4=0
Criteria1-Enabled4=1
FilterName4=
Enabled4=1
FilterType5=0
FilterCriteria5=1
CriteriaCount5=2
Criteria0-FilterType5=0
Criteria0-FilterCriteria5=0
Criteria0-Filter5=1
Criteria0-Mask5=*/horror.break.com/*
Criteria0-CaseSensitive5=0
Criteria0-IncludeQuery5=0
Criteria0-IncludeVersion5=0
Criteria0-SearchResultType5=0
Criteria0-Enabled5=1
Criteria1-FilterType5=0
Criteria1-FilterCriteria5=0
Criteria1-Filter5=15
Criteria1-Mask5=Googlebot
Criteria1-CaseSensitive5=0
Criteria1-IncludeQuery5=0
Criteria1-IncludeVersion5=0
Criteria1-SearchResultType5=0
Criteria1-Enabled5=1
FilterName5=
Enabled5=1
FilterType6=0
FilterCriteria6=1
CriteriaCount6=2
Criteria0-FilterType6=0
Criteria0-FilterCriteria6=0
Criteria0-Filter6=1
Criteria0-Mask6=*/movie-trailers/*
Criteria0-CaseSensitive6=0
Criteria0-IncludeQuery6=0
Criteria0-IncludeVersion6=0
Criteria0-SearchResultType6=0
Criteria0-Enabled6=1
Criteria1-FilterType6=0
Criteria1-FilterCriteria6=0
Criteria1-Filter6=15
Criteria1-Mask6=Googlebot
Criteria1-CaseSensitive6=0
Criteria1-IncludeQuery6=0
Criteria1-IncludeVersion6=0
Criteria1-SearchResultType6=0
Criteria1-Enabled6=1
FilterName6=
Enabled6=1
FilterType7=0
FilterCriteria7=1
CriteriaCount7=2
Criteria0-FilterType7=0
Criteria0-FilterCriteria7=0
Criteria0-Filter7=1
Criteria0-Mask7=*/pictures/*
Criteria0-CaseSensitive7=0
Criteria0-IncludeQuery7=0
Criteria0-IncludeVersion7=0
Criteria0-SearchResultType7=0
Criteria0-Enabled7=1
Criteria1-FilterType7=0
Criteria1-FilterCriteria7=0
Criteria1-Filter7=15
Criteria1-Mask7=Googlebot
Criteria1-CaseSensitive7=0
Criteria1-IncludeQuery7=0
Criteria1-IncludeVersion7=0
Criteria1-SearchResultType7=0
Criteria1-Enabled7=1
FilterName7=
Enabled7=1
FilterType8=0
FilterCriteria8=1
CriteriaCount8=2
Criteria0-FilterType8=0
Criteria0-FilterCriteria8=0
Criteria0-Filter8=1
Criteria0-Mask8=*/pranks/*
Criteria0-CaseSensitive8=0
Criteria0-IncludeQuery8=0
Criteria0-IncludeVersion8=0
Criteria0-SearchResultType8=0
Criteria0-Enabled8=1
Criteria1-FilterType8=0
Criteria1-FilterCriteria8=0
Criteria1-Filter8=15
Criteria1-Mask8=Googlebot
Criteria1-CaseSensitive8=0
Criteria1-IncludeQuery8=0
Criteria1-IncludeVersion8=0
Criteria1-SearchResultType8=0
Criteria1-Enabled8=1
FilterName8=
Enabled8=1
FilterType9=0
FilterCriteria9=1
CriteriaCount9=2
Criteria0-FilterType9=0
Criteria0-FilterCriteria9=0
Criteria0-Filter9=1
Criteria0-Mask9=*/sports.break.com/*
Criteria0-CaseSensitive9=0
Criteria0-IncludeQuery9=0
Criteria0-IncludeVersion9=0
Criteria0-SearchResultType9=0
Criteria0-Enabled9=1
Criteria1-FilterType9=0
Criteria1-FilterCriteria9=0
Criteria1-Filter9=15
Criteria1-Mask9=Googlebot
Criteria1-CaseSensitive9=0
Criteria1-IncludeQuery9=0
Criteria1-IncludeVersion9=0
Criteria1-SearchResultType9=0
Criteria1-Enabled9=1
FilterName9=
Enabled9=1
FilterType10=0
FilterCriteria10=1
CriteriaCount10=2
Criteria0-FilterType10=0
Criteria0-FilterCriteria10=0
Criteria0-Filter10=1
Criteria0-Mask10=*/user/*
Criteria0-CaseSensitive10=0
Criteria0-IncludeQuery10=0
Criteria0-IncludeVersion10=0
Criteria0-SearchResultType10=0
Criteria0-Enabled10=1
Criteria1-FilterType10=0
Criteria1-FilterCriteria10=0
Criteria1-Filter10=15
Criteria1-Mask10=Googlebot
Criteria1-CaseSensitive10=0
Criteria1-IncludeQuery10=0
Criteria1-IncludeVersion10=0
Criteria1-SearchResultType10=0
Criteria1-Enabled10=1
FilterName10=
Enabled10=1
FilterType11=0
FilterCriteria11=1
CriteriaCount11=3
Criteria0-FilterType11=0
Criteria0-FilterCriteria11=0
Criteria0-Filter11=1
Criteria0-Mask11=*/pictures/*
Criteria0-CaseSensitive11=0
Criteria0-IncludeQuery11=0
Criteria0-IncludeVersion11=0
Criteria0-SearchResultType11=0
Criteria0-Enabled11=0
Criteria1-FilterType11=0
Criteria1-FilterCriteria11=0
Criteria1-Filter11=1
Criteria1-Mask11=*/videos/newest/just-submitted/*
Criteria1-CaseSensitive11=0
Criteria1-IncludeQuery11=0
Criteria1-IncludeVersion11=0
Criteria1-SearchResultType11=0
Criteria1-Enabled11=1
Criteria2-FilterType11=0
Criteria2-FilterCriteria11=0
Criteria2-Filter11=15
Criteria2-Mask11=Googlebot
Criteria2-CaseSensitive11=0
Criteria2-IncludeQuery11=0
Criteria2-IncludeVersion11=0
Criteria2-SearchResultType11=0
Criteria2-Enabled11=1
FilterName11=
Enabled11=1
[Report]
Dest=0
DirPath=C:\ProgramData\WebLog Expert\Report\Report.csv
DeleteOldReport=1
CustomCommonReportFormat=0
ReportFormat=2
CustomReportContents=0
ReplaceDailyChart=0
ShowReport=1

What I'm trying to do with this profile is take all Googlebot data from a set of logs and, from that data, show only the root directories of the calls. I created the custom table, but it still shows URL paths, and I need to get rid of those. I need it to count not only the calls made to the directory itself but also all the calls to everything within that directory. For example,

let's say that www.break.com/pictures/123 gets called 800 times, www.break.com/pictures/ gets called 700 times, and www.break.com/pictures/zyx gets called another 800 times. I need the report to show that www.break.com/pictures/ was called 2300 times (800 + 700 + 800), since Googlebot did in fact hit everything in that directory. Is this possible?

Also, as I said, I have about 20 directories that I would like to put into one report.



This reply was created from a merged topic originally titled "Strip out results."
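The [Filters] section of the profile above repeats one pattern per filter N: `CriteriaCountN` criteria, each with `Criteria{i}-Filter{N}`, `Criteria{i}-Mask{N}` and `Criteria{i}-Enabled{N}`. A minimal Python sketch that lists the enabled criteria per filter can make a long profile easier to review. It assumes the profile is plain INI that `configparser` can read; reading `Filter=1` as a URL criterion and `Filter=15` as a user-agent criterion is only inferred from the masks, not documented here. The function `enabled_criteria` is hypothetical.

```python
import configparser


def enabled_criteria(profile_text: str):
    """Yield (filter_index, [(criterion_type, mask), ...]) for each enabled filter.

    Only criteria whose Criteria{i}-Enabled{N} flag is 1 are listed.
    The meaning of criterion_type (1, 15, ...) is an assumption, not documented.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case such as "Criteria0-Mask0"
    parser.read_string(profile_text)
    filters = parser["Filters"]
    for n in range(filters.getint("Count", 0)):
        if filters.getint(f"Enabled{n}", 0) != 1:
            continue  # whole filter switched off
        criteria = []
        for i in range(filters.getint(f"CriteriaCount{n}", 0)):
            if filters.getint(f"Criteria{i}-Enabled{n}", 0) == 1:
                criteria.append((filters.getint(f"Criteria{i}-Filter{n}"),
                                 filters.get(f"Criteria{i}-Mask{n}", "")))
        yield n, criteria
```

Run against the profile above, filter 11 would list only the `*/videos/newest/just-submitted/*` and Googlebot criteria, because its `*/pictures/*` criterion has `Criteria0-Enabled11=0`.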
Michael
You can show statistics on spider requests for top-level directories (with statistics on subdirectories merged) by creating a custom table in Options > Report > Tables (similar to the custom table I described before).

When you create the table, choose "Directory" as the main data. Click the "Custom..." button to the right of it and enter the following value:

http://www.break.com/(.*?)/ ~= \1/

This value should remove the extra "http://www.break.com/" part in this report and will also report aggregated statistics on top-level directories only.

You also need to create an include "Spider" filter with value * in the table properties, so that the table includes statistics on all spider requests for files inside the reported directories.

If the rule doesn't work, could you send a sample log file from your site to us at support@weblogexpert.com? You can also find information on how to create the rules at http://www.weblogexpert.com/help/wlex...
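A minimal Python sketch of the effect the "Custom..." rule above is meant to have, assuming it replaces the whole value with the first path segment after http://www.break.com/ (the `~=` syntax itself belongs to WebLog Expert and is not reproduced here). In Python the dots are escaped so they match literally. The functions `top_level_directory` and `aggregate` are hypothetical; the usage rolls up Joshua's example counts.

```python
import re
from collections import Counter

# Same idea as the rule  http://www.break.com/(.*?)/ ~= \1/ : capture the
# first path segment after the host, lazily, up to the next "/".
TOP_LEVEL = re.compile(r"http://www\.break\.com/(.*?)/")


def top_level_directory(url: str) -> str:
    """Map a URL to its top-level directory, e.g. "pictures/".

    URLs without a directory after the host (such as a root file) are returned unchanged.
    """
    match = TOP_LEVEL.match(url)
    return f"{match.group(1)}/" if match else url


def aggregate(hits_by_url: dict) -> Counter:
    """Sum hit counts after collapsing every URL to its top-level directory."""
    totals = Counter()
    for url, hits in hits_by_url.items():
        totals[top_level_directory(url)] += hits
    return totals


if __name__ == "__main__":
    # Joshua's example: /pictures/123, /pictures/ and /pictures/zyx.
    example = {
        "http://www.break.com/pictures/123": 800,
        "http://www.break.com/pictures/": 700,
        "http://www.break.com/pictures/zyx": 800,
    }
    print(aggregate(example))  # Counter({'pictures/': 2300})
```

The lazy `(.*?)` stops at the first "/" after the host, which is why deeper paths such as pictures/a/b still collapse into pictures/.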