April 8 2011

The Case Against Behaviour-Driven Development

Test-driven development is now an accepted good practice in software development, particularly for teams practising some form of Agile methodology. However, it doesn’t provide any framework for business stakeholders and developers to communicate requirements using a shared vocabulary. Agile methodologies do provide this, but they sit somewhat outside the workflow of development itself. Behaviour-driven development was devised to fill this gap, centred on the idea that the functionality you are building in your software is actually behaviour.

In a Rails environment, behaviour-driven development typically consists of using a tool like Cucumber, which requires that features be described in business-friendly English. To bind this feature description to real functionality, step definitions have to be written in Ruby. This is the real meat of any test, and where the real problems start.
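
To make that binding concrete, here is a minimal sketch in plain Ruby of how a step definition might tie a line of feature text to code. It does not use Cucumber itself: the StepRegistry class, its methods and the basket steps are all hypothetical stand-ins, loosely modelled on the pattern-plus-block style of step definitions.

```ruby
# A toy step registry. Hypothetical: this is not Cucumber's implementation,
# only an illustration of binding English sentences to Ruby blocks.
class StepRegistry
  def initialize
    @steps = []
  end

  # Register a pattern and the block to run when a feature line matches it.
  def define(pattern, &block)
    @steps << [pattern, block]
  end

  # Find the first pattern that matches the line and call its block,
  # passing the captured groups as arguments.
  def run(line)
    @steps.each do |pattern, block|
      match = pattern.match(line)
      return block.call(*match.captures) if match
    end
    raise ArgumentError, "Undefined step: #{line}"
  end
end

registry = StepRegistry.new
basket = [] # stand-in for application state

registry.define(/^I add "(.+)" to my basket$/) { |item| basket << item }
registry.define(/^my basket should contain (\d+) items?$/) do |count|
  raise "expected #{count}, got #{basket.size}" unless basket.size == count.to_i
  true
end

# Each line of the business-friendly feature is matched against the patterns.
registry.run('I add "tea" to my basket')
registry.run('my basket should contain 1 item')
```

Even in this toy version, the English sentence is only as meaningful as the Ruby block behind it, which is where the coupling discussed in the rest of this post creeps in.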

Like test-driven development, BDD requires that tests are written before any other code and are never left broken. This gives us two choices: either the design of the application must be complete (and never change) before BDD tests can be written, or the tests will need to be constantly rewritten as the design changes. The latter choice potentially leads to tests which describe functionality never intended to be released. This really ought to be a warning sign for anyone familiar with TDD, as the only valid reason for changing a test is that the purpose or function of the code under test has changed.

Test-driven development treats individual testable units of code as black boxes: you don’t care how they are implemented internally, provided they meet the criteria set out in the test, which describe how the unit should function and respond to certain conditions. By contrast, BDD tests can’t magically understand a human-readable action such as “click the button associated with the fifth row in this list”, so they need some knowledge of the structure of the page they pertain to; in other words, they violate the Law of Demeter.

Typically pattern matching is used to find the appropriate element to interact with — be that a text input, a link or a submit button. Again there are two choices for how to do this matching, both of which have negative implications. One approach is to have very tight coupling within your test to the specific markup the test expects. Naturally this means that even small changes to that markup will break the test. The other approach is to use extraneous markup (using HTML5 data attributes, for example) which can be unambiguously bound to the test, but this means that you’re essentially building software whose only purpose is to satisfy tests.
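
The trade-off between the two approaches can be sketched in a few lines of Ruby. This is only an illustration: real tools would normally use CSS or XPath selectors rather than regular expressions over a string, and the markup, the `data-test` attribute name and both helper methods are assumptions made up for the example.

```ruby
# The same form before and after a harmless-looking markup refactor.
# Both snippets are invented for this example.
ORIGINAL_MARKUP =
  '<div class="actions"><input type="submit" data-test="signup-submit" value="Sign up"></div>'
REFACTORED_MARKUP =
  '<p class="buttons"><button type="submit" data-test="signup-submit">Sign up</button></p>'

# Approach 1: the matcher encodes the exact structure the test expects.
def tightly_coupled_match?(markup)
  /<div class="actions"><input type="submit"/.match?(markup)
end

# Approach 2: the matcher relies on an extra attribute that exists
# only so that the test can find the element.
def hook_match?(markup)
  /data-test="signup-submit"/.match?(markup)
end

puts tightly_coupled_match?(ORIGINAL_MARKUP)   # true
puts tightly_coupled_match?(REFACTORED_MARKUP) # false: the refactor broke the test
puts hook_match?(ORIGINAL_MARKUP)              # true
puts hook_match?(REFACTORED_MARKUP)            # true, at the cost of test-only markup
```

The first matcher fails as soon as the wrapper element changes; the second survives, but only because the page now carries an attribute that serves no purpose for its users.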

In addition to the coupling problem, the content of pages can itself become a cause of test failure; something as innocuous as changing the capitalisation of a submit button can cause tests to fail. I’ve tried to find examples of how you incorporate BDD into an internationalised site and have yet to find anything, suggesting that in this scenario you would need to duplicate tests for each locale where actions and expected results are delivered in multiple languages.
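
A small Ruby sketch shows how fragile a copy-based match is. Everything here is hypothetical: the labels, the `render_button` stand-in for a view template, and the `button_found?` helper, which plays the role of a step such as When I press "Sign up".

```ruby
# Hypothetical copy for one button: the original English label,
# the same label after a copy edit, and a French translation.
BUTTON_LABELS = {
  "en"          => "Sign up",
  "en (edited)" => "Sign Up", # only the capitalisation has changed
  "fr"          => "Inscription"
}

# Stand-in for a view template.
def render_button(label)
  %(<button type="submit">#{label}</button>)
end

# Stand-in for a step that looks a button up by its visible text.
def button_found?(markup, label)
  markup.include?(">#{label}</button>")
end

BUTTON_LABELS.each do |variant, label|
  found = button_found?(render_button(label), "Sign up")
  puts "#{variant}: #{found ? 'step passes' : 'step fails'}"
end
```

Only the first variant passes. The copy edit and the French locale both fail, even though the behaviour of the button is identical in all three cases.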

In a typical development workflow with BDD it is not unusual for these kinds of issues to result in temporarily broken tests, as markup is refactored or copy is changed. It would be highly unusual to start out with an XPath selector for your intended markup and then work backwards to the code you end up using. Given the vagaries of cross-browser testing, not to mention a design that may still be in the process of being refined, some experimentation and iteration are likely. If tests are regularly broken (and then changed to suit the code they’re testing) then it raises the question of what purpose they really serve.

One of the principal motivations for test-driven development is the confidence it gives a developer in the reliability of the application. BDD should theoretically give you a similar level of confidence when it comes to user interactions. Unfortunately, current tools for BDD only support RESTful actions. That might have been appropriate for web applications in the previous decade, but the landscape has changed, and rich interactions are now commonplace yet wholly unsupported by the toolchain. If your tests only cover the RESTful actions that would be performed by a visitor with a JavaScript-disabled browser, then how can you have any confidence that your progressively enhanced interactions are functional and reliable?

To be blunt, you can’t.

My takeaway from all this is that behaviour-driven development is not fit for purpose when developing web applications. In attempting to solve a problem subject to all the fuzzy logic that human interactions encapsulate, it instead creates more developer overhead without actually increasing confidence and reliability. As an alternative I would propose a far more pragmatic approach: replace BDD with a looser, human-driven process where features remain described in business-friendly language, but are tested by a fallible human, not an inflexible machine.