Slide: 1
I'm going to talk today about a bunch of issues.

But, the heart of the talk concerns testing the content of views generated by a Rails app. I found I had quite a lot to say; and that I'm not enough of a speaker to improvise without missing key points, so I've decided to work from a script. So, you can expect me to rattle through plausible sounding material whilst not looking you in the eye, and occasionally losing my place and looking around vacantly as I try to remember what it is I'm actually talking about.

Slide: 2
In particular I'm going to speak up for a testing pattern that is often assumed unworkable, and try to get it rehabilitated.

Before I get to the concrete details, in order to help my case I'm going to give you some context, and various disclaimers, about where I'm coming from. Also I'll briefly go all abstract, and make some woolly philosophical assertions about the nature of testing. So, in an attempt to keep you interested I'll give you the controversial, over-simplified straw man version of the pattern right now, so that you can get appropriately riled up and indignant about how this luddite approach "isn't suitable for real world applications and clearly isn't going to scale." [TM]

Basically, instead of doing this...

I'm suggesting doing this...

Slide: 3
So, onto setting the scene.

I'm going to use old world testing technology; the narrow focus I take on testing view content in Rails will be very narrow; my wide focus for where you can apply the underlying pattern is very wide; including venturing outside of Ruby and Rails.

Slide: 4
Firstly: I'll be talking about tests rather than examples and specifications. This is basically because I don't want to cut out those people who aren't experts in behavior driven development and its terminology - and not least amongst those people is myself. I don't think this will hurt. The approach I'll be looking at doesn't conflict with BDD in any way; it has orthogonal concerns and should survive translation across to that well-behaved, exemplary world.

Slide: 5
I'm going to concentrate on testing view content. I'll assume as a starting point that we can already test things that aren't fundamentally view content in fairly pleasant ways. I'll take for granted...
...that the business logic behind what we're viewing is in unit tested models,
...that fancy logic of how things get presented is in unit tested helpers,
...that the control logic of what you can do by interacting with the view is in functionally tested controllers,
...and that the emergent story of the whole application is covered by integration tests.

I do want the coverage of my content tests to descend as far as is useful into the DOM - to pin down the tags, and to mark out the mark-up.

(At the end I'll explore whether we can squeeze more value out of the pattern by spilling back into some of those other testing areas.)

Slide: 6
Further, I believe some of the ideas explored extend usefully outside of web page content, and into other domains where we want to build a large object, with a complex structure, whose details we care about a varying amount. One such domain that I've worked in was a medical visualization application, where in different contexts we had to build a bunch of different screens, each of which contained a set of images, and each of those was a bit like a Photoshop session, accepting input from a variety of direct manipulation controls and mouse-driven selectable tools. (Sorry I couldn't find clearer pictures.)

A high-level text-based yaml-like description of the screen content (again, I've lost my examples) contains about the same amount of pertinent detail as the DOM of a view in a typical web app. We used these descriptions in our testing.

Slide: 7
Along another conceptual axis... I also believe that like any decent pattern, this one works fine in frameworks other than Rails, and in languages other than Ruby. The medical visualization thing was a Windows desktop app written in C++; and I've tried out aspects of contentful view testing in a Java Spring MVC application.

However, I've explored the most, and stretched the ideas the furthest, when writing spare time apps in the framework that this conference is all about. And, I've made the pattern concrete (and, if the gods are with me, demonstrable) in the wonderful sandbox for experimenting with different development approaches that is the Rails plug-in.

Slide: 8
A little more personal context and history... the plug-in in question grew directly out of need. I had put together a little CRUD+search+CMS app for that most demanding of all enterprise clients - my mom - in the mission critical area of finding out which of your family ancestors were corkcutters.

Slide: 9
Why do we test? For the purpose of this discussion, I identify three overlapping but distinguishable uses of tests in software development.

Firstly, tests can be used to maintain known good behavior, preventing accidental bad changes (also known as "regressions".) The tests provide a safety net during refactorings, and enforce a Hippocratic "first, do no harm" oath during other work.

The test stays green whilst all is well. A red bar tells us that we've messed up and broken something.

Secondly, tests can be used to drive new behaviors - sometimes replacing existing behavior, more often adding in new behaviors. The tests provide context to a deliberate, good change. The archetypal example is classic TDD, creating tests first in order to define the problem, and to guide the design and implementation of a solution.

New tests flash red and are put back to green by introducing production code.

A third and less often discussed use of testing is to explore and validate emergent behavior. In a complex system it can be hard to prejudge or predict every knock-on effect of a deliberate change to one part of that system. Tests can provide a context for understanding and controlling these kinds of changes and consequences.

In this way of working, changes cause flashes of red that are observed and then often approved, and we gradually update expectations to recolor things back to green again.

Slide: 10
So, let's look at how people tend to test view content in Rails. The conventional approach is to generate the view in a functional test, and to use assert_select to check the DOM details of interest.

Aside... any demo I do is going to involve the usual conflict between wanting an example that isn't too trivial, and not wanting you to be distracted by its particular details. I've compromised on the depot example from the skateboard book. This is classic assertion-based testing. Assisted by the Rails controller testing framework, we create the response, and check whichever properties we care sufficiently about.

Slide: 11
There are refinements to this that decouple the testing of the view from that of the controller. ZenTest and RSpec view examples basically let you define the assigns a template renders from more directly. This allows you to write a unit test rather than a controller-based functional test. If that kind of decoupling test practice floats your boat, I understand why - it's how I was brought up, too. I find that in this particular case I value the integration/system test benefits more than the test independence, but I'll leave an examination of that for later in the talk.

For now, my main point here is that however you set things up, once you've got your content, you apply the same kind of selective asserts about particular parts of it. Now, assertion-based tests are prevalent in contemporary developer testing, with good reason. Given effective investment - we need to express the properties that we care about succinctly, and at an appropriate level of abstraction within the domain - assertion tests tend to work well. And assert_select is a pretty good example - a best of breed solution in the domain of assertion testing view content.

Slide: 12
But, assertion tests have some disadvantages that happen to be particularly acute in the domain of testing the generation of large complex structures like views.

Assertion tests are poor for my third type of test use, that of exploring change. The developer has to run around rebalancing the books, fixing up many individual test expectations to agree with the emergent consequences of their change.

On the other hand, assertion tests are extremely good for test driving. But, test driving is less useful than usual when you're building views. I find that view content tends to come into existence in a concrete form, designed or composed in toto rather than being built up one abstract essence at a time. I'll usually go to some trouble to write things test-first, but for view content I'm happy enough to test after the fact, once I've checked how the view looks in the browser.

With respect to the remaining of my three testing usecases: assertion tests are also usually well-suited to catching regressions. You've said what you care about, so you get told when what you care about breaks. However they are not so good a fit as usual in the domain of view content, basically because there's so much to care about, and changes are too frequent.

You have to decide in advance what it is you care about - coverage using assertion tests is necessarily selective. We can only capture the behaviors that we think to test for - but, the importance of many properties of view content is not obvious until we've accidentally broken them: Aesthetics can be damaged by small inconsistencies. Any attribute or value typo could stop your javascript DOM processing from working. A new child element with slightly the wrong style might break a CSS layout. Even extra whitespace between tags might introduce an extra text element that changes the appearance of the view in the browser.

The other problem is that you have to say what it is you care about. It takes too much code to express every desirable property explicitly; translation, even into a well-designed and succinct language, still takes time. And every assertion that you do add is one more thing that can break due to a deliberate change and need individual attention to be set right again. 

Slide: 13
These kinds of issues tend to discourage thorough view testing.

Here's Jay Fields of ThoughtWorks expressing a common opinion about the pain of view content asserts: "I find that the View tests that RSpec and Zentest provide are fragile and provide dubious value. Rarely do they find bugs and often they break for unimportant reasons. I just don't find them to have a good return on investment."

When I Google around for opinions on testing view content on Rails, the "do as little of it as you can get away with" sentiment comes up over and over. (There's a deliberately user-generated err.the_blog article that drew a good sample of responses.)

So the common Rails response to this problem of assertion testing view content turns out to be: don't do it. Or, at least, assert *very* selectively.

Slide: 14
Here's a canonical Jamis Buck blog on the subject. To summarize: you can't make your view tests tight without turning them into a maintenance nightmare and a disincentive to change anything. Therefore, you should pick out what you really truly care about very carefully, and express it at a high level that elides the structural details.

But we don't have to run away from the details. My position in this talk is that if we take a different tack then we can push down enough on the cost of covering the details that we can make a profit on the extra value we get from paying attention to them. Having noticed that the cost to express and update assertions about every aspect of content is high, let's instead take advantage of the fact that the most concise - and the most easy to produce - summary of all the properties of some piece of content is the content itself. This prompts me to take a step backwards, into the land of classical regression testing.

Slide: 15
So let's go all the way back to my straw man, the classical regression test. 

When we're happy with the current output, we capture a copy of it, and we use that as the expected output to check against in future tests. Usually the output is large and textish, and so it gets stored in a file.

What are we going to gain? Well, it is clearly pretty cheap to express in code. And, it's cheap to create and update the expectations for any one testcase: you only need a copy and paste of the current output, you don't have to do any translation into the domain-abstracted language of assertions.
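
To make the straw man concrete, here is a minimal sketch in plain Ruby of the capture-then-compare idea. The helper names capture_expected and matches_expected? are hypothetical, invented for illustration; they are not part of Rails or of any plug-in mentioned in this talk.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helpers for illustration only.

# Store the output we are currently happy with as the expectation.
def capture_expected(path, output)
  FileUtils.mkdir_p(File.dirname(path))
  File.write(path, output)
end

# Naive check: exact text equality against the stored copy.
def matches_expected?(path, output)
  File.read(path) == output
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "store_index.html")
  capture_expected(path, "<h3>Depot</h3>\n")

  puts matches_expected?(path, "<h3>Depot</h3>\n")   # unchanged output
  puts matches_expected?(path, "<h3>Depots</h3>\n")  # changed output
end
```

Note how little there is to write: no translation of the output into individual assertions, just a copy and an equality check.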

Those are the properties of regression testing that mean it could be good for testing view content. But without further work, this regression is literally a step backwards relative to assertion testing your view content when it comes to ease of maintenance.

Upkeep of regression testing is very expensive if we behave naively. The tests are really brittle and fragile: comparing against a complete text copy means that we'll see every change, even ones that are "just syntax" that leave the DOM the same. Every diff will need explicit attention.

And, the diffs themselves will be very noisy. Because there is no abstraction, one little code change can break many tests. Similar differences will need attention in multiple places.

These workflow costs multiply rapidly as the system under regression test grows. And this is why the conventional wisdom is that regression testing is archaic, dumb, and the poorer cousin of "proper" domain-abstracted assertion-based testing.

So, my straw man is on fire at this point - even in the domain of view content, regression tests will prove too costly if we do not do some further work to make them less so.

Slide: 16
The good news here is that it's significantly easier to take steps to reduce the pain for regression tests than it is to fix up the corresponding problems with assertion tests. So let's do that work, and turn regression testing into what I term contentful testing.

We'll use two approaches. Firstly, we'll try to lessen the noise.

Information can be defined as any difference that makes a difference. The diffs that don't are the noise in regression tests that we want to reduce.

So contentful testing should ignore insignificant changes, like the diffs in HTML mark-up syntax that won't affect how the content is displayed. We can ignore them by allowing a little abstraction back in: we can still use exact equality checks and standard diffing operations on text content, but we'll be comparing unique normalized text representations of DOMs.
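
As a rough illustration of comparing normalized text rather than raw text, here is a deliberately naive, regex-based sketch. The names normalize_tag and normalize_markup are my own inventions, and a real implementation would parse the HTML properly; this version will be confused by a ">" or an "=" inside an attribute value. It only touches what sits inside the tags, and leaves text content alone.

```ruby
# Naive sketch: normalize spacing, quoting and tag-name case inside one tag.
def normalize_tag(tag)
  tag
    .gsub(/\s+/, " ")                      # collapse runs of whitespace
    .gsub(/\s*=\s*/, "=")                  # no spaces around "="
    .gsub(/='([^'"]*)'/, '="\1"')          # single quotes become double quotes
    .sub(/\A<(\/?)\s*([A-Za-z][A-Za-z0-9]*)/) { "<#{$1}#{$2.downcase}" } # lowercase tag name
    .sub(/\s+(\/?)>\z/, '\1>')             # no space before the closing ">"
end

# Apply the tag normalization to every tag; text between tags is untouched.
def normalize_markup(html)
  html.gsub(/<[^>]*>/) { |tag| normalize_tag(tag) }
end

sloppy = "<DIV  class = 'main' >Hi there</Div>"
tidy   = '<div class="main">Hi there</div>'

puts normalize_markup(sloppy) == normalize_markup(tidy)
```

Whitespace between tags is left as-is on purpose, since (as noted earlier) an extra text element can change how the view appears.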

And, contentful testing should let you DRY up which parts of content get tested. Rather than see the same diff to the same repeated parts of content over and over, we need to be able to focus our tests on those parts of content that vary across our testcases, and test only once the stuff that many testcases have in common.

Secondly, in good automation fashion, we will make the common tasks in the workflow as cheap and as easy as we can. It has to be simple to create a new regression test, to inspect diffs from the expected content when changes occur, and to accept the changes once they have been checked over and seen to be OK.

And finally, this needs to work well in batch. We should be able to generate test suites, and detect, review and accept changes for many testcases at a time, when need be.

Slide: 17
So, here's my Rails plug-in attempt at this pattern of contentful view testing.

[Next 13 slides are a sequence of screengrabs from a demo]

Slide: 18
The core of Contentful is the assert_contentful method mixed into Test::Unit::TestCase. When called without arguments from one of your functional or integration tests, this assertion checks the rendered HTML page content (in @response.body) against expected content stored as an expected.html file on disk.

Slide: 19
When we run the assertion for the first time, it passes - and captures the expected content from the current content, as a side effect.  It makes a little bit of noise in the test output so you know it happened. In accord with Rails' culture of convention over configuration, we locate the expected content in a standard place, derived from the name of the test case and method.

When expected content exists, then the assertion compares it with the current content using HTML::Nodes equality, which normalizes such things as case, spacing and quoting within the mark-up. It also does a little Rails-specific scrubbing of asset id timestamps and the like to ensure that different working copies play nice together.
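
To give a flavour of the scrubbing step, here is a small sketch. It assumes that asset timestamps appear as a numeric query string on asset URLs (for example "/stylesheets/depot.css?1195923580"); both that exact format and the name scrub_asset_timestamps are assumptions made for illustration, not a description of the plug-in's real code.

```ruby
# Assumed format: an asset URL followed by "?<digits>" just before the closing quote.
ASSET_TIMESTAMP = /(\.(?:css|js|png|gif|jpe?g))\?\d+(?=["'])/

# Drop the timestamp so that two working copies produce identical content.
def scrub_asset_timestamps(html)
  html.gsub(ASSET_TIMESTAMP, '\1')
end

mine  = '<link href="/stylesheets/depot.css?1195923580" rel="stylesheet" />'
yours = '<link href="/stylesheets/depot.css?1201122334" rel="stylesheet" />'

puts scrub_asset_timestamps(mine) == scrub_asset_timestamps(yours)
```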

So the basic idea is that you do your regular functional tests, covering whatever you usually would with explicit assertions, and get a bonus complete content check by additionally slipping in this assert_contentful. 

Slide: 20
Here's what happens when we cause an accidental regression.

(I change a closing tag from an h3 to an h4, and re-run the test.)

You'll see that as a side benefit, since we have to parse the mark-up in the content in order to compare it, we get some bonus validation.

Slide: 21
Let's fix the DOM but keep a different accidental diff in the content.

This time the mark-up validates, but the assertion fails because of our change.

When the assertion fails, then the changed content is written out as a temporary changed.html file. 

Slide: 22
We also write out temporary expected.to_diff and changed.to_diff normalized versions of the DOM content with some added line-breaks to make them more suitable for comparison with a line-based diff program.

If there was a decent HTML-aware diff around, I'd use that to compare the html files directly, but I can neither find one nor motivate myself to write it. So instead I use these normalized forms as the glue between very easy-to-diff plain text files, and the DOM.
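
Here is a sketch of what such a line-friendly form could look like. The to_diffable helper is hypothetical, and its one-token-per-line layout is my assumption rather than the actual to_diff format: it puts each tag and each run of text on its own line so that a line-based diff points at exactly the piece that changed.

```ruby
# Split mark-up into tags and runs of text, one per line.
# Surrounding whitespace is stripped, which is fine for reviewing diffs
# but means this form is not a faithful copy of the content.
def to_diffable(markup)
  markup.scan(/<[^>]*>|[^<]+/)
        .map(&:strip)
        .reject(&:empty?)
        .join("\n") + "\n"
end

expected = to_diffable('<div id="main"><h3>Depot</h3><p>Books</p></div>')
changed  = to_diffable('<div id="main"><h4>Depot</h4><p>Books</p></div>')

# Lines that appear only in the changed content.
puts changed.lines - expected.lines
```

The array subtraction at the end is only a stand-in for a real diff tool; the point is that the two files now differ on a couple of short lines rather than on one enormous one.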

Contentful has suggested a diff command-line to inspect the change, based on my local preferences for DIFF.

Slide: 23
We revert the "accidental change" and re-run the test.

When the assertion passes, any existing temporary files are removed.
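
Putting the last few slides together, the file lifecycle can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the plug-in's source: check_content is a hypothetical name, and it compares raw strings where the real assertion compares normalized DOMs.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical sketch of the expected/changed file lifecycle.
def check_content(dir, current)
  expected = File.join(dir, "expected.html")
  changed  = File.join(dir, "changed.html")

  if !File.exist?(expected)
    FileUtils.mkdir_p(dir)
    File.write(expected, current)   # first run: capture, and pass
    :captured
  elsif File.read(expected) == current
    FileUtils.rm_f(changed)         # pass: remove any stale temporary file
    :passed
  else
    File.write(changed, current)    # fail: keep the new content for review
    :failed
  end
end

Dir.mktmpdir do |dir|
  p check_content(dir, "<h3>Depot</h3>")  # no expectation yet
  p check_content(dir, "<h4>Depot</h4>")  # accidental change
  p check_content(dir, "<h3>Depot</h3>")  # change reverted
end
```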

Slide: 24
I'll go further into reviewing diffs and playing with these files in a bit.

First I want to mention a few probably unsurprising extensions to using assert_contentful.

If you want to check more than one view content in the same testcase - for example, in a story/integration test - you can supply a Symbol for assert_contentful to use as a label. If you use two assertions in the same test, Contentful will complain unless you supply labels to distinguish them.

Slide: 25
Labels are prefixed to the associated separate content files.
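
As a sketch of how such a convention might be coded, the following derives a file location from the test case name, the test method and an optional label. The directory layout, the "label_" prefix format, and the names content_path and ContentPaths are all assumptions made for illustration; the plug-in's real layout may differ.

```ruby
# Hypothetical convention: <root>/<test_case>/<test_method>/[<label>_]<kind>.html
def content_path(root, test_case, test_method, kind, label = nil)
  case_dir  = test_case.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
  file_name = [label, "#{kind}.html"].compact.join("_")
  File.join(root, case_dir, test_method.to_s, file_name)
end

# Complains when two assertions in one test would share the same file.
class ContentPaths
  def initialize(root)
    @root = root
    @claimed = {}
  end

  def claim(test_case, test_method, label = nil)
    path = content_path(@root, test_case, test_method, :expected, label)
    raise ArgumentError, "#{path} is already in use; supply a label" if @claimed[path]
    @claimed[path] = true
    path
  end
end

paths = ContentPaths.new("test/content")
puts paths.claim("StoreControllerTest", :test_index)
puts paths.claim("StoreControllerTest", :test_index, :cart)
```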

Slide: 26
By default contentful is looking at the entire @response.body. We can focus on a particular part of the response through the select_contentful assertion. So to avoid duplication across a set of contentful tests, you can focus on a particular subset of the content using a CSS selector. This allows us to ignore components that are common across all our views, such as navigation sidebars and the like.

I'll use this test as an example, selecting only the part of the content within the "div#main".

Slide: 27
You can run a suite of all the tests for which Contentful has expected content using a rake task.

Slide: 28
So that covers using Contentful to make assertions and notice regressions.

The next thing is support for reviewing and accepting intended and exploratory changes - the bit that hurts if you use too much assert_select.  So suppose we make some change that will show up as multiple areas of discontent. (In the demo I'll only break a couple of tests, but I hope you can see that the process presented scales fine and would work just as well if it were twenty changes.)

As well as reading test output, we can use the filesystem to get an overview of what's broken.
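
For instance, if failing tests leave a changed.html next to their expected.html (the fixed file names here are an assumption for the sketch, and labeled files are ignored), a few lines of Ruby are enough to list what is currently broken. The changed_tests helper is hypothetical.

```ruby
require "fileutils"
require "tmpdir"

# List the directories under +root+ that currently hold changed content.
def changed_tests(root)
  Dir.glob("**/changed.html", base: root).map { |path| File.dirname(path) }.sort
end

Dir.mktmpdir do |root|
  broken = File.join(root, "store_controller_test", "test_index")
  fine   = File.join(root, "store_controller_test", "test_add_to_cart")
  FileUtils.mkdir_p([broken, fine])
  File.write(File.join(broken, "expected.html"), "<h3>Depot</h3>")
  File.write(File.join(broken, "changed.html"),  "<h4>Depot</h4>")
  File.write(File.join(fine,   "expected.html"), "<p>Cart</p>")

  puts changed_tests(root)
end
```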

Slide: 29
We can also see all the current changes using a rake task:

Sometimes the text output from a command-line diff is just what you want. If a set of changes is good, then often we can tell this by casting an eye over the diffs and observing a plausible pattern.

Slide: 30
If the number of changes is sufficiently large and their nature is sufficiently similar, you can cast something more thorough over them, such as a grep.

I find that sometimes a graphical diff/merge utility works well: the visualization can help in detecting reassuring patterns, and the tools usually offer ways to only accept selective diffs.

Slide: 31
Usually though once you're happy the changes are good, you can use another rake task to accept them.

You can make use of some other shortcuts too. For example, if you run them down in a particular subdirectory, then the Contentful rake tasks will limit themselves to what's under there.
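
The accept step itself is tiny, which is rather the point. Here is a minimal sketch, assuming the changed.html / expected.html naming used in the demo and a hypothetical accept_changes helper (this is not the plug-in's rake task): it promotes every changed file under a directory to be the new expectation, so pointing it at a subdirectory limits what gets accepted.

```ruby
require "fileutils"
require "tmpdir"

# Promote every changed.html under +root+ to be the new expected.html.
# Returns the list of expectation files that were updated.
def accept_changes(root)
  Dir.glob(File.join(root, "**", "changed.html")).sort.map do |changed|
    expected = File.join(File.dirname(changed), "expected.html")
    FileUtils.mv(changed, expected)   # replaces the old expectation
    expected
  end
end

Dir.mktmpdir do |root|
  dir = File.join(root, "store_controller_test", "test_index")
  FileUtils.mkdir_p(dir)
  File.write(File.join(dir, "expected.html"), "<h3>Depot</h3>")
  File.write(File.join(dir, "changed.html"),  "<h3>The Depot</h3>")

  accept_changes(root)
  puts File.read(File.join(dir, "expected.html"))
end
```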

Or, since expected content is generated when absent, we can alternatively use the file system to our advantage, and update expectations by removing particular subdirectories and then re-running tests.

Slide: 32
The particulars of how Contentful automates common workflows made for a good example, but I'd prefer to sell you on the approach than the plug-in.

My central points are that you should capture some suitably normalized text representation of what you've built, and implement a little tooling to make it easy to manage the common activities: this makes all the difference to the viability of content testing.

I can imagine many alternative implementations of the pattern even in Rails.

Outside of our favorite framework, even more so. In a previous life on C++ projects we made some use of make to run tests and accept changes in batch; and I've also gotten good results from more visual approaches: by writing tiny shell and Perl scripts that run tests, and that diff/accept changes, and then associating these with the expected/changed content files, you can integrate these activities into your regular file explorer view - multiple select is a good way to quickly select a particular batch of changed content to re-run, review or accept.

Slide: 33
A few words on pushing contentful testing further - both in the sense of selling you some more of its benefits, and that of playing with the boundaries of where it can be used.

I started using content tests as an experiment. They've come to be a tool I find useful to wield in increasingly more places.

Now, even when you invest in making them easier to use, there is definitely a trade off here between coverage and maintenance: not every spot is sweet enough to make the deal worthwhile. But because I've become interested in how often they can be useful, I tend to push content tests quite hard (both at play and at work) in order to see where they break. And my experience has been, and continues to be, that they break rather less soon than I expect. And I find them creeping into areas where I'd previously happily just use assertion-based tests and adding value where they weren't supposed to.

The dense coverage is a large part of this: contentful testing allows for the serendipitous discovery of changes I didn't think to consider. I find this has happened a lot, and in all the different domains in which I've applied it.

As mentioned previously, contentful tests work well at a time when assertion-based ones are expensive: when you're exploring your domain in ways where each rule or policy change can cause lots of ripples in the expected output.

Recently I was hacking around on a hobby project to summarize news feeds that I like to keep an eye on but find are too busy: the app pulls in a crowd of items and squashes them into one daily summary, a bit like an old-school e-mail digest.

Now when I started, I wasn't sure what role each RSS tag would play, what the summarizing policy should be, and I completely punted on deciding what the key concepts were and how they should split between M, V and C. So I had a mess of code and rapidly changing fine details in the structure of the output feed. I found that contentful tests assisted in the exploration of the space, and then once I was ready to firm up the concepts and factor out proper models from fat controllers, they kept the gestalt system behavior unbroken whilst I did the large refactorings.

I think another reason I find contentful tests leak value into places I didn't expect to be using them is the fact that it's very easy to supplement an existing test with a single assert_contentful and get lots of bonus coverage. They're quick to put in; and if they stop adding value, they're easy enough to take out again. This breaks test independence somewhat, but nothing makes me happier than an end-to-end test that takes in real inputs, has total coverage of real outputs, and is relatively cheap to maintain.

Slide: 34
So I really like my content tests to be a permanent part of the test suite. I hope I have interested some of you enough to embrace this approach full-on.

However, I'm going to end by changing tack and showing a final usecase that hits enough of the sweet spots that I think it's hard to argue against employing contentful tests: using them as a temporary testing vise when you're doing a big content refactor.

The concept of a coding vise is due to Michael Feathers. You temporarily add machinery (such as sensing variables) to lock down a behaviour that you wish to preserve, perform a pervasive/risky refactoring, and then remove the machinery before checking your changes in.

The Contentful plugin in particular can serve as a very powerful vise for changes such as extracting common partials and helpers from views, or refactoring your form builders, or moving from one templating framework to another. These kinds of change can break your content in a variety of unexpected ways.

You can protect yourself very cheaply, pinning down every aspect of the content generated in all the existing tests by adding the following to your config/environment.rb. This is basically equivalent to adding an assert_contentful to every single one of your functional tests.

So you then run tests once to capture all the existing content, make your changes, observe, fix or accept the consequences, and remove the Contentful vise when you're done. The expected content never needs to be checked in.
