Spider blocking in .htaccess

Hello

If I block spiders in the .htaccess file like this:

SetEnvIfNoCase User-Agent Googlebot bot
Deny from env=bot


WebLog Expert still counts them, right?
Michael
The .htaccess file tells robots whether they are allowed to access the site's pages. If a robot processes the file correctly and doesn't access the pages, it won't be counted by WebLog Expert. If a robot accesses pages even though the .htaccess file forbids it, it will be counted, because the program counts all robot requests regardless of the contents of the .htaccess file.
Ruslan Makarov
I think that's not correct.
For the forbidden spiders I have this in the summary:

Total Hits 18,531
Visitor Hits 14,565
Spider Hits 3,966

Total Bandwidth 232.10 MB
Visitor Bandwidth 229.34 MB
Spider Bandwidth 2.76 MB

The bandwidth statistics show that spider hits are counted, even though the spiders do not actually get the pages.
If I allow spiders, their traffic matches that of visitors.
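A quick way to check this reading of the summary is to compare the average response size per hit for the two groups. The sketch below uses only the figures quoted above. The helper name `avg_bytes_per_hit` is made up for illustration, and it assumes the report's "MB" means 2**20 bytes.

```python
# Figures copied from the summary above.
MB = 1024 * 1024  # assumption: the report's "MB" means 2**20 bytes


def avg_bytes_per_hit(bandwidth_mb: float, hits: int) -> float:
    """Average response size in bytes for one group of hits."""
    return bandwidth_mb * MB / hits


visitor_avg = avg_bytes_per_hit(229.34, 14_565)
spider_avg = avg_bytes_per_hit(2.76, 3_966)

print(f"visitors: {visitor_avg:,.0f} bytes per hit")
print(f"spiders:  {spider_avg:,.0f} bytes per hit")
```

Spider hits come out well under 1 KB each, against roughly 16 KB for visitor hits. That would be consistent with the server answering blocked spiders with a short error page (a `Deny` rule normally produces a 403) rather than the full page. The status column in the log would confirm it.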
Michael
The .htaccess file doesn't affect the statistics. What affects them is the option to count spiders as visitors. If the option is enabled, spider requests are treated as ordinary visitor requests and appear in the Pages statistics. Otherwise they appear only in the spider reports (e.g. Browsers > Spidered Pages).
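The behaviour described here can be pictured with a small sketch. It is not WebLog Expert's actual code. The marker list, the function names and the sample hits are all invented for illustration, and the flag `count_spiders_as_visitors` only stands in for the program option mentioned above.

```python
from collections import Counter

# Hypothetical marker list; a real analyzer ships a much larger spider database.
SPIDER_MARKERS = ("googlebot", "bingbot", "yandex")


def is_spider(user_agent: str) -> bool:
    """Treat a request as a spider hit if its User-Agent contains a known marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in SPIDER_MARKERS)


def tally(hits, count_spiders_as_visitors=False):
    """hits: iterable of (user_agent, status) pairs taken from the log.

    The status is deliberately ignored: a 403 is a logged hit like any other.
    """
    counts = Counter()
    for user_agent, _status in hits:
        if is_spider(user_agent) and not count_spiders_as_visitors:
            counts["spider"] += 1
        else:
            counts["visitor"] += 1
    return counts


# Made-up sample: a blocked Googlebot request is still a log line (status 403).
sample = [
    ("Mozilla/5.0 (compatible; Googlebot/2.1)", 403),
    ("Mozilla/5.0 (Windows NT 10.0) Firefox/57.0", 200),
]
print(tally(sample))
print(tally(sample, count_spiders_as_visitors=True))
```

In this sketch the blocked request is counted either way. The flag only decides which bucket it lands in.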
Ruslan Makarov
Also, I've noticed that if I have a redirect in .htaccess from one of my virtual domains to another:

RewriteRule ^index.php/(.*)$ http://v-gornom.ru/$1 [R=301,L]

then those hits are not counted by WLE.
For example, I have a lot of lines like these in my log file, but no "/kanaly/" pages in the report at all. Why is that?

185.26.122.23 - - [22/Jan/2018:02:10:05 +0300] "GET /index.php/kanaly/15-kholodnyj-belok?format=feed&type=rss&rss_fulltext=1 HTTP/1.0" 301 296 "-" "-" www.altai-info.ga
185.26.122.23 - - [22/Jan/2018:02:10:05 +0300] "GET /kanaly/15-kholodnyj-belok?format=feed&type=rss&rss_fulltext=1 HTTP/1.0" 200 119767 "-" "-" v-gornom.ru
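To see what an analyzer has to work with, the two lines above can be split into fields. This is a minimal sketch, assuming the log uses the combined format with the virtual host appended as the last field. The regular expression and the `parse` helper are illustrative and not taken from WebLog Expert.

```python
import re

# Assumption: combined log format with the virtual host appended as the last field.
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)" (?P<vhost>\S+)'
)

# The two log lines quoted above, unchanged.
LINES = [
    '185.26.122.23 - - [22/Jan/2018:02:10:05 +0300] "GET /index.php/kanaly/15-kholodnyj-belok?format=feed&type=rss&rss_fulltext=1 HTTP/1.0" 301 296 "-" "-" www.altai-info.ga',
    '185.26.122.23 - - [22/Jan/2018:02:10:05 +0300] "GET /kanaly/15-kholodnyj-belok?format=feed&type=rss&rss_fulltext=1 HTTP/1.0" 200 119767 "-" "-" v-gornom.ru',
]


def parse(line):
    """Return the named fields of one log line, plus the path without its query string."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    fields = m.groupdict()
    # Split the query string off so the page path can be compared on its own.
    fields["path"] = fields["url"].split("?", 1)[0]
    return fields


for line in LINES:
    f = parse(line)
    print(f["vhost"], f["status"], f["size"], f["path"])
```

The redirect and the page it leads to are two separate log entries: a 301 of 296 bytes on www.altai-info.ga for `/index.php/kanaly/15-kholodnyj-belok`, and a 200 of 119767 bytes on v-gornom.ru for `/kanaly/15-kholodnyj-belok`.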
Ruslan Makarov
Sorry, it does count index.php. It just doesn't show the part of the URL after it ("/kanaly/...").
Actually, that's not correct, because the URL (page) is the whole string: /index.php/kanaly/15-kholodnyj-belok
Including "index.php" in the URL is just a Joomla-specific convention.
Michael
It's strange. I've just tried analyzing the two log lines you provided and got the following pages in the Pages report:

http://www.domain.com/kanaly/15-kholodnyj-belok/
http://www.domain.com/index.php/kanaly/15-kholodnyj-belok/

Maybe these two pages aren't among the top 50 pages shown by default. You can change this number (up to 10000) in the program options ("Report | Contents" category) or in the profile properties. Just select the appropriate item and click the "Properties" button.