Duplicate entries

If we read in the web server log files every 15 minutes via the scheduler, then when it next runs and the log file hasn't rolled over, will it just process the additions to the log file, i.e. ignore entries that it has already seen? Or will it reprocess the entire log file, causing duplicates to appear?
Michael
The program will ignore already processed entries. However, if log files are processed every 15 minutes, the number of visitors shown may become larger than the number you would get by reprocessing the whole logs in a single run. The reason is that if some hits of a visitor are processed in one run and the remaining hits are processed the next time the program runs, the program counts those remaining hits as a new visitor.
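
To make the effect concrete, here is a minimal sketch of incremental processing in Python. It is only an illustration of the behaviour described above, not the program's actual code; the byte-offset state file and the assumption that visitors are identified by client IP are hypothetical.

# Minimal sketch: remember how far the log was read last time and process
# only the bytes appended since then. The state file name and the IP-based
# visitor definition are assumptions made for illustration.
STATE_FILE = "offset.txt"
LOG_FILE = "access.log"

def load_offset():
    try:
        with open(STATE_FILE) as f:
            return int(f.read())
    except (FileNotFoundError, ValueError):
        return 0  # first run, or state file missing/corrupt

def process_new_entries():
    offset = load_offset()
    visitors = set()
    with open(LOG_FILE, "rb") as log:
        log.seek(offset)                 # skip entries already processed
        for line in log:
            ip = line.split(b" ", 1)[0]  # first field of a combined-format line
            visitors.add(ip)             # every IP seen this run looks like a visitor
        new_offset = log.tell()
    with open(STATE_FILE, "w") as f:
        f.write(str(new_offset))         # a real tool would also detect log rotation
    return visitors

# If a visitor's hits straddle two runs (some before the saved offset, some
# after), both runs count the same IP, so the visitor total ends up higher
# than a single pass over the whole file would report.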

So there are three possible solutions for this issue:

1. If the number of visitors and visitor-related metrics like entry pages, paths, etc. aren't important to you, you can process the logs every 15 minutes.

2. You can process the logs once per day; this is the recommended way to get accurate statistics in most cases.

3. If you need to process the logs every 15 minutes and need accurate visitor-related statistics, you can disable the "Cache analysis results" option in the profile properties. In this case the program will reprocess all log files every 15 minutes and won't use cached data from the previous analysis. This may be acceptable if you need reports for the current day only, or if your logs are small enough that reanalyzing them takes only a few minutes (see the sketch after this list).
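
For comparison, here is the same counting done the way option 3 implies: with no cached offset, every run rereads the whole file, so a visitor whose hits were split across two runs is still counted once. Again a hypothetical sketch rather than the program's code, and the cost is a full pass over the logs on every run.

# With "Cache analysis results" disabled, no offset is kept: the whole file
# is reread each run, so split sessions collapse into a single visitor.
LOG_FILE = "access.log"

def process_full_file():
    visitors = set()
    with open(LOG_FILE, "rb") as log:
        for line in log:
            visitors.add(line.split(b" ", 1)[0])  # IP field, as above
    return visitors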