Thursday, January 06, 2005

Analyzing IIS Logfiles for an MCMS site

Once a website has been successfully set up, almost immediately someone will ask, “How many people have visited our site?” Usually, you will pass the data recorded in the IIS log files to one of the many web log analyzers on the market. Popular web analysis tools (such as WebTrends) automate the task of calculating the number of visitors, page views and other statistics commonly requested by site owners.

The trouble with MCMS is that the URL recorded for a given posting is not always the same, especially if you have chosen to use the Hierarchical URL Format. A request for a posting could be captured as:

/MyChannel/MyPage.htm

Or, in its raw form:

/NR/exeres/2E6CBEC3-65A3-4FEF-B2DB-96643334BDE1,frameless.htm

Although both URLs point to the same posting, web reporting tools will treat them as requests for two separate pages. As a result, some figures may be under-reported, especially if you have applied a filter that excludes the /NR/ folder or if you want to find out how many times a particular posting has been viewed. Reports such as the Top 50 requested pages of the site will be littered with ugly URLs, and the person reading them may not be able to make any sense of them.

The solution is to process the log file before analyzing it, converting all ugly URLs to friendly hierarchical URLs. There's currently a sample available online, CmsLogFileReporting, included as part of the MSIB+ Pack. This is an excellent tool that comes with source code, so you can tweak it any way you like. Judging by the code comments, it is a newer version of the original CmsConvLog application uploaded to GotDotNet earlier. It also comes with a ready-to-use form interface, but I didn't have an MSIB+ license, so I couldn't run that.
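To make the idea concrete, here is a minimal Python sketch of the pre-processing step. It is not the CmsLogFileReporting code. It assumes W3C extended-format logs (with a #Fields: directive naming the cs-uri-stem column) and a hypothetical guid_to_path table mapping posting GUIDs to their friendly URLs, which you would have to export from MCMS yourself. Only the /NR/exeres/{GUID} form is rewritten; every other line passes through unchanged.

```python
import re
from collections import Counter

# Raw MCMS posting URL, e.g.
# /NR/exeres/2E6CBEC3-65A3-4FEF-B2DB-96643334BDE1,frameless.htm
# Assumption: only this /NR/exeres/{GUID}[,options] form needs rewriting.
RAW_POSTING = re.compile(
    r"^/NR/exeres/(?P<guid>[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12})"
    r"(?:,[^/]*)?$"
)


def to_friendly(uri_stem, guid_to_path):
    """Map a raw posting URL to its hierarchical URL; leave anything else alone.

    guid_to_path is a hypothetical table (upper-case GUID -> friendly URL),
    e.g. exported from MCMS before the logs are processed.
    """
    m = RAW_POSTING.match(uri_stem)
    if not m:
        return uri_stem
    return guid_to_path.get(m.group("guid").upper(), uri_stem)


def convert_log(lines, guid_to_path):
    """Yield W3C extended log lines with the cs-uri-stem field rewritten."""
    stem_index = None
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("#Fields:"):
            # The #Fields: directive tells us which column holds cs-uri-stem.
            names = line.split()[1:]
            stem_index = names.index("cs-uri-stem") if "cs-uri-stem" in names else None
            yield line
        elif line.startswith("#") or stem_index is None:
            yield line  # other directives, or no known column layout yet
        else:
            fields = line.split(" ")
            fields[stem_index] = to_friendly(fields[stem_index], guid_to_path)
            yield " ".join(fields)


if __name__ == "__main__":
    guid_to_path = {"2E6CBEC3-65A3-4FEF-B2DB-96643334BDE1": "/MyChannel/MyPage.htm"}
    log = [
        "#Fields: date time cs-method cs-uri-stem sc-status",
        "2005-01-06 09:00:00 GET /MyChannel/MyPage.htm 200",
        "2005-01-06 09:05:00 GET /NR/exeres/2E6CBEC3-65A3-4FEF-B2DB-96643334BDE1,frameless.htm 200",
    ]
    # Both requests now count towards the same page.
    hits = Counter(l.split(" ")[3] for l in convert_log(log, guid_to_path) if not l.startswith("#"))
    print(hits)  # Counter({'/MyChannel/MyPage.htm': 2})
```

Reading the column position from #Fields: rather than hard-coding it means the sketch keeps working if the logging fields selected in IIS change.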

Nevertheless, it does provide source code. Based on our experience automating report generation for numerous MCMS websites, here are a few suggested tweaks that will get the tool working beautifully for your site:

1. If you are not planning to process the log files daily, you will need the tool to handle more than one log file at a time: for example, all the log files in a directory that fall within a certain date range. To do so, you could change the input filename to accept a pattern instead of a single name, and write a simple loop that calls the conversion method for each matching file.

2. More importantly, if you have host header mapping turned on, this tweak is a must. The tool converts the ugly URL to the posting's Path, so the converted URL becomes /sitename.com/MyChannel/MyPage instead of /MyChannel/MyPage.htm. While this may be all right for sites that do not use host header mapping, it becomes a problem for sites that do. If you are using WebTrends to analyze the log file, you will find that, once the site name is added in front of the page, the URL becomes:

http://sitename.com/sitename.com/MyChannel/MyPage (the converted URL)

which is not quite the same as the posting's URL:

http://sitename.com/MyChannel/MyPage.htm (the friendly URL; note that sitename.com appears only once)

The web reporting tool will still fail to recognize the two URLs as belonging to the same posting.

To get around this, simply trim away sitename.com from the returned Path.

3. The tool also converts resource URLs to paths. While you probably won't need to change this, bear it in mind when configuring filters based on resource file names in the web reporting tool, especially when the names contain spaces.
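The first two tweaks might look something like the Python sketch below. Again, this is not the package's code. It assumes IIS daily log files named exYYMMDD.log (e.g. ex050106.log), and convert_file is a hypothetical stand-in for the tool's own per-file conversion method. strip_host_header only removes a leading /sitename.com segment from the returned Path.

```python
import os
import re
from datetime import date, datetime

# Assumption: IIS daily W3C logs named exYYMMDD.log, e.g. ex050106.log.
LOG_NAME = re.compile(r"^ex(\d{6})\.log$", re.IGNORECASE)


def logs_in_range(log_dir, start, end):
    """Return paths of daily log files dated within [start, end], oldest first."""
    found = []
    for name in os.listdir(log_dir):
        m = LOG_NAME.match(name)
        if not m:
            continue  # skip files that don't follow the naming pattern
        day = datetime.strptime(m.group(1), "%y%m%d").date()
        if start <= day <= end:
            found.append((day, os.path.join(log_dir, name)))
    return [path for _, path in sorted(found)]


def process_range(log_dir, start, end, convert_file):
    """Call convert_file(path) for every log file in the date range.

    convert_file is hypothetical: plug in whatever method does the
    per-file conversion in your copy of the tool.
    """
    paths = logs_in_range(log_dir, start, end)
    for path in paths:
        convert_file(path)
    return paths


def strip_host_header(path, host):
    """Turn '/sitename.com/MyChannel/MyPage' into '/MyChannel/MyPage'.

    Paths that don't start with the host name are returned unchanged.
    """
    prefix = "/" + host.lower()
    if path.lower() == prefix:
        return "/"
    if path.lower().startswith(prefix + "/"):
        return path[len(prefix):]
    return path
```

For example, process_range(log_dir, date(2005, 1, 1), date(2005, 1, 31), convert_file) would convert a month's worth of logs in one run, and strip_host_header would be applied to each Path before it is written back to the log.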

Once the log files have been converted, they can be processed with any web reporting tool, just as you would process the logs of a traditional website.

Here's to accurate reports!

5 Comments:

At 9:11 PM, Blogger Chester said...

Nice tool and useful tweaks. But what happens if there is more than one posting with the same name in the same channel? I think it is better to use the GUID rather than the name of the posting.

At 9:50 AM, Blogger Mei Ying said...

Good point. The analyzer will probably treat them as one and the same. And you will get a combined report for the postings that share the same name.

So yes, the alternative would be to convert all friendly URLs to GUID-type URLs.

The only trouble is that it's not easy to identify postings by their GUIDs, so either their titles must be fetched or some other mapping done before the report is readable.

At 4:15 PM, Blogger Chester said...

I agree. But before identifying postings, we must group them correctly! Once grouped, they can be identified to whatever level of detail is available. That's how I'm doing it in MCMS Manager, which also provides more information related to postings.

At 10:38 AM, Blogger Mei Ying said...

Well, I suppose if the postings aren't grouped by channel, you will have to group them logically some other way before the report is meaningful.

Btw, according to this article,
http://support.microsoft.com/default.aspx?scid=kb;en-us;815460,
having more than one posting with the same name in the same channel slows down MCMS' ability to resolve URLs. So it's probably better to design the website so that postings have unique names in the first place; it saves a few headaches :-)

At 12:16 PM, Blogger Chester said...

Yes, as you stated, postings must be grouped by channel, state, author or whatever else the purpose requires.

That's true. But we can't be sure that every posting has a unique name in every channel on every CMS site :)
