Analyzing IIS Logfiles for an MCMS site
Once a website has been successfully set up, almost immediately someone will ask, “How many people have visited our site?” Usually, you will feed the data recorded in the IIS log files to one of the many web log analyzers on the market. Popular web analysis tools (such as WebTrends) automate the task of calculating the number of visitors, page views, and other site statistics commonly requested by site owners.
The trouble with MCMS is that the URL recorded for the same posting is not consistent, especially if you have chosen to use the Hierarchical URL Format. A request for a posting could be captured in its friendly, hierarchical form (such as /MyChannel/MyPage.htm), or in its raw, GUID-based form under the /NR/ folder.
Although both URLs point to the same posting, web reporting tools will consider them to be requests for two separate pages. As a result, some figures may be under-reported, especially if you have applied a filter that excludes the /NR/ folder, or if you are trying to find out how many times a particular posting has been viewed. Reports such as the Top 50 requested pages of the site will be littered with ugly URLs, and the person reading them may not be able to make any sense of them.
The solution is to process the log files before analyzing them, converting all ugly URLs to friendly hierarchical URLs. There is currently a sample available online: CmsLogFileReporting, included as part of the MSIB+ Pack. It is an excellent tool that comes with source code, so you can tweak it any way you like. Judging from the code comments, it is a newer version of the original CmsConvLog application uploaded to GotDotNet earlier. It also comes with a ready-to-use form interface, but since I didn't have an MSIB+ license, I couldn't run that.
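To give a feel for what the conversion amounts to, here is a minimal sketch in Python (for brevity; the tool itself is a .NET application). It assumes the ugly URLs follow the common /NR/exeres/{GUID} shape and that you already have a GUID-to-friendly-path lookup; the real tool resolves paths through MCMS itself, and the exact raw URL format on your site may differ:

```python
import re

# Hypothetical lookup from posting GUID to friendly path. The real tool
# builds this via the MCMS API; it is hard-coded here for illustration.
GUID_TO_PATH = {
    "12ab34cd-5678-90ef-1234-567890abcdef": "/MyChannel/MyPage.htm",
}

# MCMS "ugly" URLs live under /NR/ and embed the posting GUID.
NR_URL = re.compile(r"/NR/exeres/([0-9a-fA-F-]{36})[^\s,]*")

def rewrite_line(line: str) -> str:
    """Replace any /NR/ GUID-based URL in a log line with its friendly path."""
    def repl(m: re.Match) -> str:
        # Leave the URL untouched if the GUID is not in the lookup.
        return GUID_TO_PATH.get(m.group(1).lower(), m.group(0))
    return NR_URL.sub(repl, line)

print(rewrite_line("GET /NR/exeres/12AB34CD-5678-90EF-1234-567890ABCDEF.htm 200"))
# -> GET /MyChannel/MyPage.htm 200
```

Lines that contain no /NR/ URL, or a GUID the lookup does not know, pass through unchanged, so the converted file stays a valid log.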
Nevertheless, it does provide source code. Based on my experience automating report generation for numerous MCMS websites, here are a few suggested tweaks to get the tool working beautifully for your site:
1. If you are not planning to process the log files on a daily basis, you will need to get the tool to process more than one log file at a time, probably all the log files in a directory within a certain date range. To do so, you could change the input filename parameter to accept a pattern instead of a single name, and write a simple loop that calls the conversion method for each matching file.
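That loop could look roughly like the following Python sketch (for illustration only; the tool is not written in Python). It assumes the default IIS daily naming scheme exYYMMDD.log, and convert_log is a stand-in for the tool's existing per-file conversion method:

```python
import glob
from datetime import date, datetime

def convert_log(path):
    """Placeholder for the tool's existing single-file conversion routine."""
    print("converting", path)

def log_date(path):
    """Parse the date out of an IIS log file name (exYYMMDD.log).

    Assumes the default W3C daily naming scheme; returns None when the
    name does not follow it.
    """
    name = path.replace("\\", "/").rsplit("/", 1)[-1]
    if not (name.startswith("ex") and name.endswith(".log")):
        return None
    try:
        return datetime.strptime(name[2:8], "%y%m%d").date()
    except ValueError:
        return None

def convert_logs(pattern, start, end):
    """Convert every log file matching the pattern whose date is in range."""
    converted = []
    for path in sorted(glob.glob(pattern)):
        day = log_date(path)
        if day is not None and start <= day <= end:
            convert_log(path)
            converted.append(path)
    return converted
```

Parsing the date from the file name itself avoids relying on file timestamps, which can change when logs are copied between servers.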
2. More importantly, if you have host header mapping turned on, this tweak is a must. The tool converts the ugly URL to a path, so the converted URL becomes /sitename.com/MyChannel/MyPage instead of /MyChannel/MyPage.htm. While this may be fine for sites that do not implement host header mapping, it becomes a problem for sites that do. If you are using WebTrends to analyze the log file, you will find that, after it appends the site name to the page, the URL becomes:
http://sitename.com/sitename.com/MyChannel/MyPage (the converted URL)
which is not quite the same as the URL of the posting:
http://sitename.com/MyChannel/MyPage.htm (the nice URL; note that sitename.com appears only once)
You will still have the problem of the web reporting tool failing to recognize the two URLs as belonging to the same posting.
To get around this, simply trim the leading sitename.com from the returned path.
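The trim itself is trivial. A sketch in Python, assuming (as described above) that the converter prepends the host name as the first path segment:

```python
def trim_host_prefix(converted_path: str, host: str) -> str:
    """Strip a leading /<host> segment from a converted MCMS path.

    Only strips when the first segment matches the host exactly, so
    channels that merely resemble the host name are left alone.
    """
    prefix = "/" + host
    if converted_path.startswith(prefix + "/"):
        return converted_path[len(prefix):]
    return converted_path

print(trim_host_prefix("/sitename.com/MyChannel/MyPage", "sitename.com"))
# -> /MyChannel/MyPage
```

With the prefix removed, the reporting tool's own site-name prepending produces http://sitename.com/MyChannel/MyPage, matching the friendly URL of the posting.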
3. The tool also converts the URLs of resources to paths. You probably won't need to change this, but bear it in mind when configuring filters based on resource file names in the web reporting tool, especially when a file name contains spaces.
Once the log files have been converted, they can be processed with any web reporting tool, just as you would the logs of a traditional website.
Here's to accurate reports!