|
Recently I wondered what the visitors to my website are actually doing. What I wanted was some analysis of their behaviour as they browse through the pages. I looked at a couple of log file analysers and found that they either didn't provide what I needed or were pretty steeply priced. So I pulled Perl out of the toolbox and wrote AC.log, which can be downloaded here for free - "free" because I want to give back to the online community.
A line from a log file looks like this:
From this information it's fairly easy to derive the standard reports, such as the most downloaded files or the distribution of downloads over the day. More interesting, though, is the analysis of browsing patterns. For this we have to look at sessions. On which page did a user session begin? Which path did the user follow through the web site? How long did s/he look at a particular page? And on which page did the session end?
Problem: Since HTTP is a protocol that establishes (and terminates) a connection for each retrieved item, be it an HTML page, a GIF icon, or a JPEG image, there is no such thing as a session - at least not in HTTP. This means there is no direct way to tell which pages were downloaded in one session.
Solution: AC.log assumes that if no more than a certain time has passed between two accesses from the same host, these accesses are related to each other and were made by one person in one session. |
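To make the idea concrete, here is a minimal sketch of that timeout rule in Python. It is not AC.log's actual Perl code, just an illustration under a few assumptions: the log has already been parsed into (host, time in seconds, path) tuples, the names `sessionize` and `summarize` are made up for this example, and the 30-minute timeout is an arbitrary choice rather than AC.log's setting.

```python
from collections import defaultdict


def sessionize(hits, timeout=1800):
    """Group (host, time, path) hits into sessions, one list per session.

    Two consecutive hits from the same host belong to the same session
    if no more than `timeout` seconds lie between them (assumed value).
    """
    by_host = defaultdict(list)
    # Sort by time so each host's hits are in chronological order.
    for host, ts, path in sorted(hits, key=lambda h: h[1]):
        by_host[host].append((ts, path))

    sessions = []
    for host, visits in by_host.items():
        current = [visits[0]]
        for prev, cur in zip(visits, visits[1:]):
            if cur[0] - prev[0] > timeout:
                # Gap too long: close the running session, start a new one.
                sessions.append((host, current))
                current = []
            current.append(cur)
        sessions.append((host, current))
    return sessions


def summarize(session):
    """Answer the session questions: entry page, path, time per page, exit page."""
    host, visits = session
    paths = [path for _, path in visits]
    # Time spent on a page is the gap until the next hit in the session;
    # it cannot be known for the last page, so that page is left out.
    dwell = [(path, nxt_ts - ts)
             for (ts, path), (nxt_ts, _) in zip(visits, visits[1:])]
    return {"host": host, "entry": paths[0], "exit": paths[-1],
            "path": paths, "dwell": dwell}


if __name__ == "__main__":
    # Made-up sample hits, purely for illustration.
    sample = [
        ("10.0.0.1", 0, "/index.html"),
        ("10.0.0.2", 30, "/index.html"),
        ("10.0.0.1", 60, "/about.html"),
        ("10.0.0.1", 5000, "/index.html"),
    ]
    for s in sessionize(sample):
        print(summarize(s))
```

Note that a host is only a stand-in for a person: several people behind one proxy will be merged into a single session, and the time spent on the last page of a session can never be measured this way.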