Validating web content in CI

5/18/2010 code tools

If I had another 257 hours in the day, I’d love to build the ultimate web content validator into the continuous integration process I now have. After a successful build, I’d start by kicking off a WebDev.WebServer instance of the site, then fire up the SEO Toolkit by wrapping it in a .NET library. Then I’d extend it with custom tasks run on each page download: validating the HTML and CSS via W3C, validating the JavaScript via JSLint, and, for HTML content, regexing out the script and style tag content, padding the top of a temp file with whitespace to keep the line numbers right, then validating those as JavaScript and CSS respectively. (I’d rather find an offline way to do HTML validation that doesn’t involve Cygwin, as I have enough emulation going on here.) Perhaps I’d wrap it all in an NAnt task, or just an NUnit test suite that either reflected through the solution for web.configs or took in a list of projects via TestCase or the project’s AppSettings.

(I’d like to be able to authenticate certain requests too, so I can validate the user profile content. WebCrawler.Settings exposes a Credentials property, though I’ve had more success setting an Authorization header than using HttpWebRequest.Credentials. Neither gets me through forms authentication, though, since it relies on a cookie and UrlDownloader.WebRequestCreate() has no settings.Cookies. I’d love a per-URL dictionary of “use credentials or don’t”, though I realize that’s total overkill for the stock use of the SEO Toolkit, and more than likely I can just rescan with a StartUrl inside the profile and an ExternalLinkCriteria of SameFolderAndDeeper. If all else fails, I’d Reflector out the WebCrawler, or inject an override into UrlDownloader.OnGetContent() in the same DLL.)

More than likely, the report from each of the validators is too big for the build log, so I’d save off each report, named by download URL and module, and build an index page dynamically to navigate through them all.
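
Here’s a minimal sketch of the line-number padding trick mentioned earlier, in Python rather than .NET just to keep it short. The names (BLOCK_RE, extract_blocks) are made up for illustration, and a regex is only a rough stand-in for a real HTML parser: it will trip over things like a literal </script> inside a JavaScript string.

```python
import re

# Matches the inline body of each <script> or <style> element.
# Hypothetical helper names; a regex is a rough approximation of HTML parsing.
BLOCK_RE = re.compile(
    r"<(script|style)\b[^>]*>(.*?)</\1\s*>",
    re.IGNORECASE | re.DOTALL,
)


def extract_blocks(html):
    """Yield (tag, padded_source) for each non-empty inline script/style block.

    padded_source is prefixed with one newline per line that precedes the
    block in the page, plus spaces up to the block's starting column, so a
    validator's line and column numbers match the original HTML.
    """
    for match in BLOCK_RE.finditer(html):
        body = match.group(2)
        if not body.strip():
            continue  # e.g. <script src="..."></script> has nothing inline
        start = match.start(2)
        lines_before = html.count("\n", 0, start)
        # rfind returns -1 when the block is on the first line, giving column == start
        column = start - (html.rfind("\n", 0, start) + 1)
        yield match.group(1).lower(), "\n" * lines_before + " " * column + body
```

Each padded block can then be written to a temp file and handed to JSLint or a CSS validator, and any line number in the resulting report should point straight back at the original page.
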
After the reports were run, I’d RoboCopy the report tree to a deployment URL or find a way to include it in CC.NET’s document list. Wrap all that up in a bow, and we’ve got the uber-web validator CI engine. Now where’d I put that other 257 hours in the day? And will condensing a few weeks’ worth of random thoughts and links this tightly get me into the Google Dungeon?