Filtering problems, 2 different visitor values

I have an annoying problem with one report. It gives me two different Visitor values with almost the same filters. The log files are IIS / SharePoint 2007 logs.

Filter 1:
Include - Requested file - url/to/my/page.aspx
Gives me 1372 hits and 786 visitors

Filter 2:
Include - Requested file - url/to/*
Then I look at the top results and see url/to/my/page.aspx with 1372 hits and only 501 visitors.

The first filter is very specific and finds only one page. The second filter finds all subpages but gives a very different visitor value for the same page.aspx. Why? I cannot see any logic in this behavior.

My final goal is to get Visitor values for some specific pages, but I also need to get, e.g., the top 100 subpages (by visitors) from a site collection.

PS. I also tried Filter 3: Include - Visitors accessed the specified file (url/to/*), but it didn't give me any results from that path. There were only url/a, url/b, etc.
Michael
The issue probably appears because of the way visitors are determined. The program determines the number of visitors by IP address. If a request from an IP address arrives more than a certain time (the timeout) after the last request from that IP, it is considered to belong to a different visitor. The timeout is set to 30 minutes by default, but you can change this value in Options > Analysis > General.

It seems that there were visitors who requested url/to/my/page.aspx, then requested some other files that match url/to/*, and then requested url/to/my/page.aspx again, for example:

url/to/my/page.aspx is requested
after 20 minutes url/to/anotherpage.aspx is requested
after 20 minutes url/to/my/page.aspx is requested

In such a case, if you use the url/to/* filter, all these requests are analyzed and the program determines that there was one visitor who requested the page url/to/my/page.aspx (with two hits).

However, if you use the url/to/my/page.aspx filter, the program will only see that there was one request for page url/to/my/page.aspx, and after 40 minutes another request for the same page from the same IP. As the timeout is set to 30 minutes, the program will report two visitors (and two hits) in this case.

So if you use the url/to/my/page.aspx filter instead of the url/to/* filter, the program may report more visitors, which is what happens in your case.
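
To make the effect concrete, here is a minimal Python sketch of the timeout logic described above. It is not the program's actual code: the `count_visitors` function, the sample IP address, and the minute offsets are made up for illustration, and it assumes that a gap strictly greater than the timeout starts a new visitor.

```python
from fnmatch import fnmatchcase

TIMEOUT_MIN = 30  # default timeout mentioned above

# Hypothetical log entries: (ip, minute offset, requested file)
hits = [
    ("10.0.0.1", 0, "url/to/my/page.aspx"),
    ("10.0.0.1", 20, "url/to/anotherpage.aspx"),
    ("10.0.0.1", 40, "url/to/my/page.aspx"),
]

def count_visitors(hits, pattern, timeout=TIMEOUT_MIN):
    """Return (hits, visitors) for the requests that match the filter pattern."""
    last_seen = {}  # ip -> minute of the previous matching request
    matched = 0
    visitors = 0
    for ip, minute, path in sorted(hits, key=lambda h: h[1]):
        if not fnmatchcase(path, pattern):
            continue  # filtered-out requests are invisible to the visitor logic
        matched += 1
        prev = last_seen.get(ip)
        # Assumption: a gap longer than the timeout means a new visitor.
        if prev is None or minute - prev > timeout:
            visitors += 1
        last_seen[ip] = minute
    return matched, visitors

print(count_visitors(hits, "url/to/my/page.aspx"))  # narrow filter
print(count_visitors(hits, "url/to/*"))             # broad filter
```

With the narrow filter the middle request is dropped, so the two remaining requests are 40 minutes apart and are counted as two visitors. With the broad filter all three requests are kept, no gap exceeds 30 minutes, and only one visitor is counted.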
tsughan
Thanks for the explanation, now it really makes sense :)

What are your recommendations for calculating Visitors? Should I use the exact link, which leads to a lot of work when I need to report on several subpages, or should I just take a common page report, which shows fewer visitors?

Can I make several report files (one per filter) in one Analyze round? It takes some time to analyze over 30 GB of log files per round :)
Michael
I recommend creating a common report for all your pages. In this case it seems to show more accurate information on the visitor count.

It's not possible to create multiple reports in one Analyze round, but you don't need to create separate reports for each file; one common report should be enough. If you wish to get detailed information on each page, you can also add the pages to the tracked file list in Profile Properties > Tracking.
tsughan
Ok thank you for your help!

I am just wondering about this 30-minute rule. I do understand how it works in the Filter 1 example, but why does it not count 2 visitors in this Filter 2 example:

Filter 2: url/to/*
url/to/my/page.aspx is requested
after 20 minutes url/to/anotherpage.aspx is requested
after 20 minutes url/to/my/page.aspx is requested

The 30-minute timeout has passed if we look only at page.aspx, so shouldn't it be 2 visitors, like with the first filter? There were 40 minutes between the requests. So my common sense says that the exact Filter 1 (url/to/my/page.aspx) should be more accurate...

And to be sure about what you recommend, is it like this:

Take a report with the filter url/to/* and then look at the Visitor numbers of the most popular pages?
Michael
The timeout is applied to all requests of a visitor, not just to requests for the same file. As the time between the first and second requests, and between the second and third requests, was 20 minutes, the program concludes that all three requests belong to the same visitor (the time between any two consecutive requests did not exceed the timeout), so there was only one visitor who requested the file.

It seems more accurate to use the url/to/* filter and check the most popular pages report in this case. Although the time between requests for the same file from the same IP may exceed the timeout value, if there were requests for other files from the same IP in between, as in the sample above, all the hits probably belong to the same visitor.
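
As a rough illustration of how per-page visitor numbers could come out of the broad filter, here is a small Python sketch. It is only a model of the behavior described in this thread, not the program's real implementation: `page_visitors`, the sample data, and the "gap strictly greater than the timeout" rule are all assumptions.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

# Hypothetical log entries: (ip, minute offset, requested file)
sample = [
    ("10.0.0.1", 0, "url/to/my/page.aspx"),
    ("10.0.0.1", 20, "url/to/anotherpage.aspx"),
    ("10.0.0.1", 40, "url/to/my/page.aspx"),
]

def page_visitors(hits, pattern, timeout=30):
    """Per-page (hits, visitors); visits are split over ALL matching requests."""
    last_seen = {}                  # ip -> minute of previous matching request
    visit_no = {}                   # ip -> number of the current visit
    page_hits = defaultdict(int)    # page -> hit count
    page_visits = defaultdict(set)  # page -> {(ip, visit number)}
    for ip, minute, path in sorted(hits, key=lambda h: h[1]):
        if not fnmatchcase(path, pattern):
            continue
        prev = last_seen.get(ip)
        # Assumption: a gap longer than the timeout starts a new visit.
        if prev is None or minute - prev > timeout:
            visit_no[ip] = visit_no.get(ip, 0) + 1
        last_seen[ip] = minute
        page_hits[path] += 1
        page_visits[path].add((ip, visit_no[ip]))
    return {p: (page_hits[p], len(page_visits[p])) for p in page_hits}

report = page_visitors(sample, "url/to/*")
# Pages ordered by visitor count, highest first (slice to [:100] for a top 100)
top_pages = sorted(report.items(), key=lambda kv: kv[1][1], reverse=True)
print(top_pages)
```

For the sample above, page.aspx ends up with two hits and one visitor, because the request for anotherpage.aspx keeps the visit alive between the two page.aspx requests.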