Hooking into page parsing with PageParserFilter

I recently looked into HTML minification in ASP.NET. My first thought was to use an HttpModule to strip whitespace from the response. That is not ideal, since it would execute at run time on every request and could hurt performance. Then I turned to Google for answers and found out about a really neat feature made public in ASP.NET 2.0 that makes this kind of thing incredibly easy and seamless. It is one of those .NET gems you have likely never heard of: the PageParserFilter class.

As it turns out, PageParserFilter lets you hook into page parsing at compile time. The class exposes the control tree of your .aspx page (both server-side and client-side markup) and lets you alter it through an instance of ControlBuilder. I was sold immediately. This is the ideal place to do the HTML minification I had in mind. In fact, it’s already been done here by Omari Omarov and works beautifully (download his sample application to see how it is used).

I spent some time analyzing Omari’s code and had another useful idea, which I decided to turn into a simple proof of concept for this post: using the same method to set cache breakers on my JavaScript and stylesheet includes. If you’re not familiar with cache breakers, all I’m talking about is the version number that you append to the end of your css/js includes so that browsers fetch a fresh copy instead of a stale cached one (e.g. master.js?v=12345). So I mocked up a quick prototype based on Omari’s code just to show how easily this can be done.

using System;  
using System.Collections.Generic;  
using System.Linq;  
using System.Web;  
using System.Web.UI;  
using System.Collections;  
using System.Reflection;  
using System.Web.UI.HtmlControls;

namespace Page.Parsing.Voodoo  
{
    public class CacheBreakerPageParserFilter : PageParserFilter
    {
        const BindingFlags InstPubNonpub = BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance;

        public override bool AllowCode
        {
            get { return true; }
        }

        public override int NumberOfControlsAllowed
        {
            get { return -1; }
        }

        public override int NumberOfDirectDependenciesAllowed
        {
            get { return -1; }
        }

        public override int TotalNumberOfDependenciesAllowed
        {
            get { return -1; }
        }

        public override bool AllowBaseType(Type baseType)
        {
            return true;
        }

        public override bool AllowControl(Type controlType, ControlBuilder builder)
        {
            return true;
        }

        public override bool AllowServerSideInclude(string includeVirtualPath)
        {
            return true;
        }

        public override bool AllowVirtualReference(string referenceVirtualPath, VirtualReferenceType referenceType)
        {
            return true;
        }

        public override CompilationMode GetCompilationMode(CompilationMode current)
        {
            return base.GetCompilationMode(current);
        }

        public override Type GetNoCompileUserControlType()
        {
            return base.GetNoCompileUserControlType();
        }

        public override bool ProcessCodeConstruct(CodeConstructType codeType, string code)
        {
            return base.ProcessCodeConstruct(codeType, code);
        }

        public override bool ProcessDataBindingAttribute(string controlId, string name, string value)
        {
            return base.ProcessDataBindingAttribute(controlId, name, value);
        }

        public override bool ProcessEventHookup(string controlId, string eventName, string handlerName)
        {
            return base.ProcessEventHookup(controlId, eventName, handlerName);
        }

        protected override void Initialize()
        {
            base.Initialize();
        }

        public override void ParseComplete(ControlBuilder rootBuilder)
        {
            SetCacheBreakerOnNestedBuilder(rootBuilder);

            base.ParseComplete(rootBuilder);
        }

        private static void SetCacheBreakerOnNestedBuilder(ControlBuilder controlBuilder)
        {
            ArrayList nestedBuilders = GetSubBuilders(controlBuilder);

            for (int i = 0; i < nestedBuilders.Count; i++)
            {
                string literal = nestedBuilders[i] as string;

                if (string.IsNullOrEmpty(literal)) continue;

                nestedBuilders[i] = AppendCacheBreaker(literal);
            }

            if (controlBuilder.ControlType == typeof(HtmlLink))
            {
                foreach (SimplePropertyEntry entry in GetSimplePropertyEntries(controlBuilder))
                {
                    entry.Value = AppendCacheBreaker(entry.PersistedValue);
                }
            }
            else
            {
                foreach (object nestedBuilder in nestedBuilders)
                {
                    if (nestedBuilder is ControlBuilder)
                    {
                        SetCacheBreakerOnNestedBuilder((ControlBuilder)nestedBuilder);
                    }
                }

                foreach (TemplatePropertyEntry entry in GetTemplatePropertyEntries(controlBuilder))
                {
                    SetCacheBreakerOnNestedBuilder(entry.Builder);
                }

                foreach (ComplexPropertyEntry entry in GetComplexPropertyEntries(controlBuilder))
                {
                    SetCacheBreakerOnNestedBuilder(entry.Builder);
                }
            }

            ControlBuilder defaultPropertyBuilder = GetDefaultPropertyBuilder(controlBuilder);

            if (defaultPropertyBuilder != null)
            {
                SetCacheBreakerOnNestedBuilder(defaultPropertyBuilder);
            }
        }

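        private static string AppendCacheBreaker(string literal)
        {
            if (string.IsNullOrEmpty(literal)) return literal;

            // Match ".css" or ".js" only where a file name ends (before a quote,
            // whitespace, '>' or the end of the string), so that ".json" or an
            // already versioned "master.js?v=..." is left untouched.
            return System.Text.RegularExpressions.Regex.Replace(
                literal,
                @"\.(css|js)(?=[""'\s>]|$)",
                ".$1?v=1234567890");
        }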

        private static ArrayList GetSubBuilders(ControlBuilder controlBuilder)
        {
            if (controlBuilder == null)
                throw new ArgumentNullException("controlBuilder");

            return (ArrayList)
                   controlBuilder
                   .GetType()
                   .GetProperty("SubBuilders", InstPubNonpub)
                   .GetValue(controlBuilder, null);
        }

        private static ControlBuilder GetDefaultPropertyBuilder(ControlBuilder controlBuilder)
        {
            if (controlBuilder == null)
                throw new ArgumentNullException("controlBuilder");

            PropertyInfo pi = null;
            Type type = controlBuilder.GetType();

            while (type != null && null == (pi = type.GetProperty("DefaultPropertyBuilder", InstPubNonpub)))
            {
                type = type.BaseType;
            }

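            // No such property anywhere in the type hierarchy: nothing to recurse into.
            if (pi == null) return null;

            return (ControlBuilder)pi.GetValue(controlBuilder, null);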
        }

        private static ArrayList GetTemplatePropertyEntries(ControlBuilder controlBuilder)
        {
            if (controlBuilder == null)
                throw new ArgumentNullException("controlBuilder");

            ICollection tpes = (ICollection)
                               controlBuilder
                               .GetType()
                               .GetProperty("TemplatePropertyEntries", InstPubNonpub)
                               .GetValue(controlBuilder, null);

            if (tpes == null || tpes.Count == 0)
            {
                return new ArrayList(0);
            }
            else
            {
                return (ArrayList)tpes;
            }
        }

        private static ArrayList GetComplexPropertyEntries(ControlBuilder controlBuilder)
        {
            if (controlBuilder == null)
                throw new ArgumentNullException("controlBuilder");

            ICollection cpes = (ICollection)
                               controlBuilder
                               .GetType()
                               .GetProperty("ComplexPropertyEntries", InstPubNonpub)
                               .GetValue(controlBuilder, null);

            if (cpes == null || cpes.Count == 0)
            {
                return new ArrayList(0);
            }
            else
            {
                return (ArrayList)cpes;
            }
        }

        private static ArrayList GetSimplePropertyEntries(ControlBuilder controlBuilder)
        {
            if (controlBuilder == null)
                throw new ArgumentNullException("controlBuilder");

            ICollection cpes = (ICollection)
                               controlBuilder
                               .GetType()
                               .GetProperty("SimplePropertyEntries", InstPubNonpub)
                               .GetValue(controlBuilder, null);

            if (cpes == null || cpes.Count == 0)
            {
                return new ArrayList(0);
            }
            else
            {
                return (ArrayList)cpes;
            }
        }
    }
}

To use this parser filter, reference its type in the pages element of your web.config.

<pages pageParserFilterType="Namespace.MyPageParserFilter, AssemblyName" />
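
For context, here is a minimal sketch of where that element sits, using the class from this post. The assembly name MyWebApp is a placeholder I made up; substitute the name of the assembly that contains your filter.

```xml
<configuration>
  <system.web>
    <!-- "MyWebApp" is a hypothetical assembly name -->
    <pages pageParserFilterType="Page.Parsing.Voodoo.CacheBreakerPageParserFilter, MyWebApp" />
  </system.web>
</configuration>
```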

First of all, a lot of the code you see here was extracted from Omari’s framework. This is just to show the entire parser filter without any dependencies on his code, so much of the voodoo handled in his framework is thrown into this single class. I would actually suggest using his framework for any work with parser filters, for the two reasons outlined below.

  1. The ControlBuilder class hides a few properties that are of interest to us, so we have to resort to reflection. Omari’s parser filter framework provides a useful set of extension methods that help with that.

  2. By design, you can only register a single PageParserFilter. Once again, Omari gives us a configuration section to register and execute more than one if necessary.

I’ll skip the overridden properties, which you can read about on MSDN, and get straight to the point. I do all of the work in the ParseComplete override. This method is called when the page parser has finished parsing all of the client-side and server-side markup. At this point, the entire page hierarchy of server-side controls and HTML elements is encapsulated in the ControlBuilder instance passed to this method. This object is nested: it contains ControlBuilder instances for the child elements within the page hierarchy. (Note that some elements, such as DOCTYPE, do not require a ControlBuilder and are represented as simple strings. In fact, it appears that all non-nested elements are parsed as strings.)

We need to walk this object recursively to step through the entire page hierarchy looking for external file references. I do this in the SetCacheBreakerOnNestedBuilder method. If I find an external JavaScript or stylesheet reference, I tack the hard-coded ?v=1234567890 onto the end.

As this is only a proof of concept, there is a lot of room for improvement. The version number could be retrieved from the executing assembly or the configuration file. Also, some script references may not need to be versioned (e.g. references to Google’s CDN), so we could add an attribute to certain script elements to exclude them from versioning and have our parser filter read and then remove that attribute.
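
Here is a rough sketch of how those two improvements might look for the literal (string) chunks of the page. It is not part of the filter above and not taken from Omari’s framework. The CacheBreaker class, its method names, and the data-no-version marker attribute are all names I made up for illustration, and the sketch assumes the marker is written as a bare attribute with no value.

```csharp
using System;
using System.Reflection;
using System.Text.RegularExpressions;

public static class CacheBreaker
{
    // Hypothetical opt-out marker; the name is an assumption, not an ASP.NET feature.
    const string OptOutAttribute = "data-no-version";

    // Matches a whole <script ...> or <link ...> opening tag.
    static readonly Regex Tag =
        new Regex(@"<(script|link)\b[^>]*>", RegexOptions.IgnoreCase);

    // Matches ".js" or ".css" at the end of a quoted attribute value.
    static readonly Regex Extension =
        new Regex(@"\.(js|css)(?=[""'])", RegexOptions.IgnoreCase);

    // Matches the bare opt-out attribute together with the whitespace before it.
    static readonly Regex OptOut =
        new Regex(@"\s+" + OptOutAttribute + @"\b", RegexOptions.IgnoreCase);

    // Uses the version of the given assembly (e.g. the web application's) as the token.
    public static string VersionOf(Assembly assembly)
    {
        return assembly.GetName().Version.ToString();
    }

    public static string Apply(string literal, string version)
    {
        return Tag.Replace(literal, match =>
        {
            string tag = match.Value;

            if (OptOut.IsMatch(tag))
            {
                // Opted out: drop the marker and leave the URL untouched.
                return OptOut.Replace(tag, string.Empty);
            }

            return Extension.Replace(tag, m => m.Value + "?v=" + version);
        });
    }
}
```

With a version of "1.2.3.4", a chunk such as <script src="master.js"></script> becomes <script src="master.js?v=1.2.3.4"></script>, while <script src="//cdn.example.com/lib.js" data-no-version></script> only loses its marker. Attribute values on server controls such as HtmlLink arrive without the surrounding tag, so they would still need the simpler handling shown in the filter.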

There are probably many more useful things you could do with this class. Phil Haack has also written about a way to throw a compile-time exception if you want to keep certain elements (e.g. server script blocks) out of your MVC views.