After the Storm - Internet Technologies

How to Recognize a Human Being

Part of the series "The Zen of Serving Web Pages"

By Christian Treber, Internet Applications Specialist

How can you tell man and machine apart when looking at web logs? All these hits - was it somebody browsing your website, or was it a crawler collecting information for a search engine? Or an automated tool scavenging email addresses for the next spam attack?

I used to think of a web log as a record of what people had downloaded from my server, and how often. It is not quite that simple!

First off, not every request is a download. Maybe only the headers were requested (caching proxies do that to check whether something has changed), or form data was posted, or someone used the web server as a proxy. Which of these happened depends on the operation (the HTTP method, such as GET, HEAD, or POST) and, in the case of proxy traffic, on the URL: if it starts with "http://", it is a proxy request.
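To make the distinction concrete, here is a minimal Python sketch that classifies the request line of a log entry (the quoted part, such as "GET /index.html HTTP/1.0"). The function names and labels are hypothetical choices of mine, and the proxy test simply applies the "starts with http://" rule described above.

```python
# Hypothetical labels for the operations discussed above; methods not
# listed here (PUT, OPTIONS, ...) fall through to "other".
METHOD_LABELS = {
    "GET": "download",
    "HEAD": "header check",
    "POST": "form submission",
}


def parse_request_line(request: str):
    """Split a request line such as 'GET /index.html HTTP/1.0' into
    (method, url, protocol). Old HTTP/0.9-style lines lack the protocol."""
    parts = request.split()
    if len(parts) not in (2, 3):
        raise ValueError(f"malformed request line: {request!r}")
    method, url = parts[0], parts[1]
    protocol = parts[2] if len(parts) == 3 else ""
    return method, url, protocol


def is_proxy_request(url: str) -> bool:
    """An absolute http:// URL means the client asked the server to fetch
    the page from elsewhere on its behalf, i.e. to act as a proxy."""
    return url.lower().startswith("http://")


def classify(request: str) -> str:
    """Return a rough label for what kind of request a log entry records."""
    method, url, _protocol = parse_request_line(request)
    if is_proxy_request(url):
        return "proxy"
    return METHOD_LABELS.get(method, "other")


print(classify("GET /index.html HTTP/1.0"))              # download
print(classify("HEAD /index.html HTTP/1.0"))             # header check
print(classify("GET http://www.example.com/ HTTP/1.0"))  # proxy
```

The proxy check runs first on purpose: a GET through the server as a proxy is a download, but not a download of one of our own pages.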

Even if the request was a download of a URL (a GET operation), it might not have been successful. Maybe the URL did not exist, the user wasn't properly authorized, or the server had a bad day. The result code (the HTTP status code, such as 200 for success or 404 for "not found") tells us how things went.
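Status codes are grouped into classes by their first digit. The following small sketch uses Python's standard `http.HTTPStatus` enumeration to turn a code from the log into something readable; the helper names `status_class` and `describe` are hypothetical, chosen for illustration.

```python
from http import HTTPStatus


def status_class(code: int) -> str:
    """Map a status code to its class, based on the first digit."""
    if 100 <= code <= 199:
        return "informational"
    if 200 <= code <= 299:
        return "success"
    if 300 <= code <= 399:
        return "redirection"
    if 400 <= code <= 499:
        return "client error"
    if 500 <= code <= 599:
        return "server error"
    return "unknown"


def describe(code: int) -> str:
    """Return e.g. '404 Not Found (client error)' for a logged status code."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:  # a code the standard library does not know
        phrase = "unrecognized status"
    return f"{code} {phrase} ({status_class(code)})"


print(describe(200))  # 200 OK (success)
print(describe(401))  # 401 Unauthorized (client error)
print(describe(404))  # 404 Not Found (client error)
print(describe(500))  # 500 Internal Server Error (server error)
```

Note that 304 (Not Modified), which a server returns when a cache's conditional request finds the page unchanged, lands in the redirection class. Whether such a hit should count as someone looking at the page is a judgment call that this sketch leaves open.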

And even after all that, the URL might not have been requested by a person with a browser, but by a machine. Search services use crawlers to download whole websites automatically and index them. Link checkers might verify that links to your site from other web pages still work. Spammers might try to extract email addresses from your pages.
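A common first step in recognizing machines is the User-Agent string, which the Combined Log Format records with each hit, together with the observation that well-behaved crawlers ask for /robots.txt. The following Python sketch is a rough heuristic, not a reliable detector: the substring list is an illustrative assumption, and email harvesters in particular often pretend to be ordinary browsers, so only honest robots are caught this way. The function names are hypothetical.

```python
import re

# Illustrative, deliberately short list of substrings that appear in the
# User-Agent strings of many crawlers and command-line download tools.
ROBOT_HINTS = re.compile(r"bot|crawl|spider|slurp|wget|curl|libwww", re.IGNORECASE)


def looks_like_robot(user_agent: str) -> bool:
    """Guess from the User-Agent header alone whether a client is a machine."""
    if user_agent in ("", "-"):
        # Browsers normally identify themselves; "-" in the log means
        # the request carried no User-Agent header at all.
        return True
    return ROBOT_HINTS.search(user_agent) is not None


def machine_hosts(entries):
    """Given (host, url, user_agent) tuples, return the hosts judged to be
    machines. A host that fetched /robots.txt is treated as a robot for all
    of its requests, since that file exists for crawlers, not people."""
    machines = set()
    for host, url, agent in entries:
        if url == "/robots.txt" or looks_like_robot(agent):
            machines.add(host)
    return machines


entries = [
    ("a.example", "/robots.txt", "Mozilla/5.0"),
    ("a.example", "/index.html", "Mozilla/5.0"),
    ("b.example", "/index.html", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"),
    ("c.example", "/", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"),
]
print(sorted(machine_hosts(entries)))  # ['a.example', 'c.example']
```

Flagging a whole host because it fetched /robots.txt is a trade-off: several people behind one shared proxy show up as a single host, and they would all be counted as a machine.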

If we want to answer the question "what have people been looking at?", we need to filter for requests that were GET operations on a local URL, were successful, and were submitted by a browser.
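Putting the four conditions together, a "user filtered" pass over a log might look like the following Python sketch. It assumes the NCSA Combined Log Format (the Common Log Format plus referrer and user agent); lines without a user agent do not match the pattern and are dropped. The function names and the tiny robot pattern are hypothetical stand-ins, not a complete robot detector.

```python
import re

# Assumption: NCSA Combined Log Format, e.g.
# host ident user [time] "METHOD URL PROTOCOL" status size "referrer" "user agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical, minimal stand-in for a real robot detector.
ROBOT_HINTS = re.compile(r"bot|crawl|spider", re.IGNORECASE)


def is_user_view(line: str) -> bool:
    """True if the entry is a successful GET of a local URL made by a browser."""
    m = COMBINED.match(line)
    if m is None:
        return False  # unparseable lines are left to other reports
    is_get = m["method"] == "GET"
    is_local = not m["url"].lower().startswith("http://")  # not a proxy request
    is_success = 200 <= int(m["status"]) <= 299
    agent = m["agent"]
    is_browser = agent not in ("", "-") and ROBOT_HINTS.search(agent) is None
    return is_get and is_local and is_success and is_browser


def user_filtered(lines):
    """Keep only the log lines that pass all four tests."""
    return [line for line in lines if is_user_view(line)]


line = ('127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" '
        '200 2326 "-" "Mozilla/4.08 [en] (Win98; I ;Nav)"')
print(is_user_view(line))  # True
```

Each condition is computed separately so that a report about the rejected requests (proxy traffic, failures, robots) can reuse the same tests.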

This is what "user filtered" reports are about. We are of course interested in requests that used other operations, employed the web server as a proxy, failed, or were initiated by a machine. But they are the subject of other, surely interesting reports!


© 1998-2005 Christian Treber, ct@ctreber.com. All rights reserved. The author takes no responsibility for linked external pages, the content of which by no means reflects his own opinion, convictions etc.