I can't explain what sound a web hit makes, but I do have definite information on what gets written into the log file of a web server. The web server keeps a log of the operations it has performed. This log contains valuable information about how the web site has been used.

What exactly gets logged? After the web server has received a request and sent the response, it writes a log entry into the log file. A log entry is simply a line of text which typically looks like this (though less colorful):

195.145.250.9 - - [01/Sep/2001:12:20:14 +0200] "GET / HTTP/1.0" 200 1827 "http://www.cnet.com/webLogAnalysers/" "Mozilla/4.0 (Windows 98)"
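As a rough sketch, an entry in the format discussed on this page could be split into its fields with a regular expression. The sample line is assembled from the field values quoted in this article. The group names (host, user, login and so on) and the function name parse_entry are my own choices for illustration, and the pattern assumes exactly this field order. Other servers may log differently.

```python
import re

# Sample entry assembled from the field values quoted in this article.
SAMPLE = ('195.145.250.9 - - [01/Sep/2001:12:20:14 +0200] "GET / HTTP/1.0" '
          '200 1827 "http://www.cnet.com/webLogAnalysers/" '
          '"Mozilla/4.0 (Windows 98)"')

# One named group per field, in the order the fields appear in the line.
# The group names are illustrative, not part of any standard.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<user>\S+) (?P<login>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<command>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_entry(line):
    """Return a dict of field name -> text, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


entry = parse_entry(SAMPLE)
print(entry["host"], entry["status"], entry["bytes"])
```

All values come back as text. Converting the result code or the byte count to numbers is left to the caller, since a server may write "-" instead of a byte count.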
Let's take a look at the different parts of the log entry (marked in color).

Host. Here: 195.145.250.9 (numeric form). This is the IP address of the requester. Or it might be the IP address of a proxy that the request has been routed through.

But what is a proxy? A proxy is a server that acts as an intermediary between a requester and a web server. This makes particular sense when the proxy "caches" requests. If the requested URL is already in the cache, it gets served right away. If the URL is not in the cache (or the cached copy is outdated), the proxy fetches a copy, caches it, and forwards it to the requester. Proxies are used to save on web traffic and reduce the load on web servers. At the same time, they obscure the address of the original requester: the web server only sees the last address in the chain.

User. Here: - (not defined). This is the identity of the requester according to the Identification Protocol. You won't ever see this "in the wild". I know of no web server which logs this information. Corrections and updates welcome!

Login. Here: - (not defined). This field will be empty unless the requested URL has protected access. In that case, the field will contain the identity used in the authorization (the "user name", so to speak).

Date and time. Here: [01/Sep/2001:12:20:14 +0200]. This is when the URL was requested. The field is subdivided into the date (day/month/year), the time (hour:minute:second), and the time zone as an offset from GMT (here +0200, two hours ahead).
Be aware that the time stamp format can vary wildly between servers.
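For the time stamp format shown in this example, a minimal sketch of turning the field into a date object might look as follows. The format string and the function name parse_stamp are my assumptions and only fit this one layout. A server that writes its time stamps differently needs a different format string. Note also that the month abbreviation ("Sep") is matched according to the current locale, so this assumes English month names.

```python
from datetime import datetime, timezone

# Matches time stamps such as 01/Sep/2001:12:20:14 +0200.
# This format string is an assumption that fits the example on this page only.
STAMP_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_stamp(field):
    """Parse a time stamp field, with or without the surrounding brackets."""
    return datetime.strptime(field.strip("[]"), STAMP_FORMAT)


stamp = parse_stamp("[01/Sep/2001:12:20:14 +0200]")
print(stamp.isoformat())                           # 2001-09-01T12:20:14+02:00
print(stamp.astimezone(timezone.utc).isoformat())  # 2001-09-01T10:20:14+00:00
```

Converting to UTC, as in the last line, is handy when log files from servers in different time zones are to be compared.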
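Command. Here: "GET / HTTP/1.0". This is the command sent to the web server. The field is subdivided into the method (here GET), the requested URL (here /, the root page), and the protocol version (here HTTP/1.0).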
What a browser sends to the web server is pretty much always a "GET" command. Form data might get sent back with "POST". The HTTP protocol also allows uploading pages with "PUT" (and some other things).

Result code. Here: 200 (means "OK"). This is the code for the outcome of the operation. "404 - page not found" is a very popular result code you might know offhand.

Bytes transferred. Here: 1827. This is the number of bytes sent between the requester and the web server. The direction is determined by the operation.

Referrer. Here: "http://www.cnet.com/webLogAnalysers/". This is the URL of the referring page, that is, the page containing the link the user clicked on. Very interesting information indeed!

Agent. Here: "Mozilla/4.0 (Windows 98)" (refers to Netscape 4.0 with Windows 98 as the operating system). This is the name (and possibly version) of the agent, that is, the program that made the request. This could be a browser, a crawler, or a download tool. This field often contains all kinds of other information, such as the operating system the agent runs on.

© 2003 Christian Treber, www.ctreber.com