"Out of band" work in an agile environment

Recently, I was talking to someone about how to prioritise and assign resource for out-of-band work: work that isn’t planned and managed according to your regular process, but rather appears out of the blue, whether a problem caught by monitoring or a misfeature reported by a user.

How do you make sure there’s someone available to do the out-of-band work? Some out-of-band work needs dealing with immediately, but some can wait. How do you decide which?

Why is this a problem?

It’s perhaps worth briefly discussing why this can be a significant problem for agile teams. Surely an agile team can cope with work that comes in, prioritising it against business needs (perhaps with the help of a business-side specialist)? Can’t self-organising teams figure this out?

The trouble is that many high-performing agile teams rely on the ability to focus, whether on a single piece of work for the few hours or days it takes, or on a broader feature that chains multiple pieces of work together to deliver some concrete improvement to end users. Out-of-band work pulls that focus away.

However, a high-performing team can indeed figure this out, coming up with practices that allow it to respond to out-of-band work without significantly damaging its focus. Such teams may like to start from these suggestions, and perhaps these ideas can also help teams that are still developing their abilities.

Who should do out-of-band work?

There seem to be two options here: either you keep the out-of-band work within the team, or you put it in a different team.

With the rise of Lean Software Development, and particularly related movements like DevOps, it feels strange to separate out anything that is maintenance work on your team’s product from the team itself; it runs against a number of the lean principles. However, we’ll consider the implications of doing just that, as well as a halfway house where some subset of the team is always split off to work on out-of-band work.

Out-of-band as a separate team

You can have a team alongside the product team whose job it is to handle out-of-band work. This means that out-of-band issues don’t impact the product team’s velocity. However, there are a number of significant drawbacks:

  • it splits knowledge of the product between the team responsible for that product and a separate team; in particular, Conway’s Law may start to apply, and your software may stop being the shape you need it to be

  • if an issue requires knowledge that the out-of-band team doesn’t have, it will still impact the product team

  • career progression for the out-of-band team may be harder to manage

  • it may create conflict, counter to the DevOps idea of keeping operational responsibility within the product team

There’s an exception where most of the out-of-band work is actually product support (for example helping configure a user’s system, or setting up a customisation). In that case it may be worth having a tier of support to handle this. I’d still advocate that bugfixes to the product itself go to the product team (even if the upstream support tier can propose those fixes, much as people outside an open source project can contribute proposed changes).

Out-of-band rotas

Each day, or week, or iteration, you can split off one or more team members to handle out-of-band issues.

This can work well, although many people don’t like doing it. In particular, teams or individuals that are worried about their velocity, rather than using it to guide them, may react against this way of working because it’s “not what we’re here for”. (This can be countered by providing a better idea of what the team actually is there for, encompassing the customer needs that drive doing out-of-band work.)

However, explicitly and constantly changing the “shape” of the team in this fashion may well reduce cohesion and capability, particularly in a team with less experienced members, or where knowledge of some parts of the system is concentrated in specific members. While it’s possible for someone on out-of-band duty to help and support those doing product development, this starts to look and feel a lot more like the next option.

Out-of-band agile practices

At the other end of the spectrum from a separate out-of-band team, all out-of-band work is handled directly by the product team. While this may seem counter to some agile methodologies, which often strongly advise against introducing work mid-sprint, I’ve always viewed specific methodologies as a particular checkpoint on the path to ideal (although likely idealised and unattainable) agile working, where work flows through the team on a just-in-time basis. Providing each work item is also kept small, the team will have significant flexibility to pick up out-of-band work promptly.

But how quickly is prompt in this case? Should people drop what they’re doing as soon as a bug comes in?

How should we prioritise out-of-band work?

Some bugs and issues need addressing as soon as they happen. If the database behind your main product goes down, someone needs to jump on that. Other issues aren’t so urgent. For instance, if a report is sent to your finance team every Monday, and one week they notice that some of the subtotals aren’t correct, you have most of a week to fix it. (Providing the core figures are correct, finance folk are pretty nifty with Excel.)

When a new piece of out-of-band work is identified, whether by an alert from monitoring or exception tracking, a bug coming in from support, a regression against a preview version of an important web browser, or the disclosure of a security vulnerability from a dependency project or supplier, that work must be prioritised in order to help determine when it should be worked on.

In an ideal world, this would work the same way that any piece of planned work is prioritised: it depends on a view from whoever is responsible for product management decisions, guided by advice from other team members who help assess the work. For anything that isn’t drop-everything urgent, you can manage that after raising it at daily standup, via a daily live bug triage session, or something similar that fits with your existing practices.

The aim is to give the product manager any information required to decide which out-of-band requests become work items that should be picked up in preference to planned work. Other issues that emerge can be planned in the usual fashion. (Of course, an issue may appear one day and not be considered a priority, then rise a day or two later if more, or more important, users run foul of it. You can still manage things in the same fashion, providing you keep on reviewing issues that haven’t been resolved.)

Some things that can influence priorities

The following are probably all relevant:

  • who does it affect? (stakeholders)

  • how much does it affect them? (stakeholder pain)

  • how important are they? (stakeholder power)

  • how long would it take to do the work? (work cost)

The first three are basically an approach to stakeholder analysis, which should be unsurprising given its importance in product management. Taken together with the last, these can then provide a way of determining both relative priorities within out-of-band work, and when compared to planned work.
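
As a rough illustration only, the four factors could be folded into a single comparable score. The sketch below is in Python; the 1 to 5 scales, the `WorkItem` fields and the "pain × power × reach, divided by cost" formula are all assumptions made for this example rather than a recommended method, and the decision itself still rests with whoever owns product management.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class WorkItem:
    """A candidate piece of work, planned or out-of-band (hypothetical shape)."""
    title: str
    stakeholders: int  # roughly how many people or groups are affected
    pain: int          # assumed scale: 1 (minor annoyance) to 5 (blocked)
    power: int         # assumed scale: 1 (low influence) to 5 (critical)
    cost_days: float   # estimated effort to do the work


def priority_score(item: WorkItem) -> float:
    """Benefit of doing the work divided by its cost (an assumed formula)."""
    value = item.stakeholders * item.pain * item.power
    # Floor the cost at half a day so tiny estimates don't dominate.
    return value / max(item.cost_days, 0.5)


def rank(items: list[WorkItem]) -> list[WorkItem]:
    """Order planned and out-of-band items together, highest score first."""
    return sorted(items, key=priority_score, reverse=True)


if __name__ == "__main__":
    backlog = [
        WorkItem("Planned: export feature", 20, pain=2, power=3, cost_days=5),
        WorkItem("Bug: wrong subtotals in finance report", 3, pain=2, power=4, cost_days=1),
        WorkItem("Bug: checkout fails in one browser", 50, pain=5, power=3, cost_days=2),
    ]
    for item in rank(backlog):
        print(f"{priority_score(item):7.1f}  {item.title}")
```

A score like this is only a starting point for the conversation; it puts out-of-band and planned items on one list, but it cannot capture the judgement calls discussed next.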

Note that specialist knowledge in the team may be required to come to a good idea of stakeholder pain as well as work cost. For instance, a bug that has been reported against one browser but is not an issue on another may require a developer or QA engineer to evaluate against the range of browsers and your current user base to determine the level of stakeholder pain it is causing. (On the other hand, if the report came from a very important stakeholder, such as a potential new investor, it may be clear without digging so deep. There are no hard and fast rules as soon as people are involved.)

Note that you may need to consider other dimensions; for instance a security vulnerability that does not have a known practical exploit may require a consideration of risk. A scaling issue surfaced by monitoring may not (indeed, should not) be causing you problems today, and so its importance may be dependent on your growth forecasts for product usage.

Tracking out-of-band work

One of the general problems with out-of-band work is that while you’re likely to have reasonable practices for measuring things like the velocity of product work, you’ll have to put some effort into measuring the out-of-band stuff. However, it’s important to do this, because you want to ensure two things: that you can act to reduce variance, and that your out-of-band work costs scale reasonably as your product usage grows.

A common approach is to track the work done after the fact. You could log time spent, estimate complexity or risk in story points, or just file a ticket for every piece of work. They don’t all give you the same visibility into the out-of-band work being done, but all are better than measuring nothing.
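
For example, if every piece of work ends up as a ticket, a few lines of scripting can show how much of each iteration went on out-of-band work. This is a minimal sketch in Python; the `Ticket` fields (`iteration`, `points`, `out_of_band`) are hypothetical and would need mapping onto whatever your tracker actually exports.

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple


class Ticket(NamedTuple):
    iteration: str     # label for the iteration (hypothetical, e.g. "sprint-12")
    points: float      # story points, or hours if you log time instead
    out_of_band: bool  # True if the work was not planned


def out_of_band_share(tickets: list[Ticket]) -> dict[str, float]:
    """Fraction of each iteration's completed work that was out-of-band."""
    total: dict[str, float] = defaultdict(float)
    unplanned: dict[str, float] = defaultdict(float)
    for ticket in tickets:
        total[ticket.iteration] += ticket.points
        if ticket.out_of_band:
            unplanned[ticket.iteration] += ticket.points
    # Skip iterations with no recorded work to avoid dividing by zero.
    return {name: unplanned[name] / total[name] for name in total if total[name] > 0}
```

Watching this share over several iterations is one way to see whether out-of-band costs are growing faster than product usage.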

If you don’t have reliable and consistent tracking of out-of-band work, you can approximate it by looking at something like velocity per team member. The trouble is that although it will indeed be affected by the out-of-band work, that impact is entangled with a number of others related to team efficiency:

  • team size can impact communication efficiency, which will have knock-on effects on velocity per team member

  • in a small team, the calendrical variance in vacation taken can cause the same effects as changing team size (you can roughly control for the direct impact of size changes by averaging over team strength in days instead of all team members, but that doesn’t take into account the communication impacts)

  • environmental factors (noise and seating arrangements) can affect both individual performance, and communication within the team

  • distributed and home working, particularly if rare, can have both environmental and communication impacts
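
The adjustment for team strength mentioned in the list above amounts to dividing velocity by the person-days actually worked rather than by headcount. Here is a small sketch in Python, with made-up figures and hypothetical function names; it controls only for availability, not for the communication or environmental effects.

```python
from __future__ import annotations


def velocity_per_member(points: float, team_size: int) -> float:
    """Naive measure: ignores who was actually available."""
    return points / team_size


def velocity_per_person_day(points: float, days_worked: list[float]) -> float:
    """Velocity averaged over team strength in days, not headcount."""
    strength = sum(days_worked)
    if strength <= 0:
        raise ValueError("no person-days worked in this iteration")
    return points / strength


if __name__ == "__main__":
    # Made-up figures: four people, ten-day iterations.
    full = velocity_per_person_day(30, [10, 10, 10, 10])
    # One person takes five days of vacation in the next iteration.
    short = velocity_per_person_day(26, [10, 10, 10, 5])
    print(f"per member:     {velocity_per_member(30, 4):.2f} vs {velocity_per_member(26, 4):.2f}")
    print(f"per person-day: {full:.2f} vs {short:.2f}")
```

With these invented numbers, velocity per member appears to fall from 7.5 to 6.5, while velocity per person-day barely moves (0.75 against roughly 0.74), so most of the apparent drop is explained by vacation rather than by out-of-band work.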

Summary

There is a range of approaches to tackling out-of-band work, which can be considered along an axis of integration with the product team. If you’re aiming for your team to release more frequently, to take complete ownership of its work, and to operate as a largely autonomous, self-organising unit, then you will want to aim for out-of-band work to be accepted and managed by the product team it relates to.

However, it may not be possible to do that from where you are now, so some combination of the other approaches may be helpful.

Whoever ends up doing the work, it needs prioritisation like anything else. The authority for this rests in the same place as priorities for planned work, and indeed a lot of the same tools and approaches can be used to make priority decisions.

No matter how you choose to approach things, you should track the work so you can measure things that are important to you. You should also, of course, aim to review and improve your practices over time, both through regular internal retrospectives, and periodic independent assessments.