Feature flag piloting & how to ruin it

There is a semi-sophisticated thing we do when managing SaaS in a production environment that hasn't been given a proper name yet. For lack of a better word, I'll call it feature flag piloting. I reserve the right to rename it in the future.

It's a great solution to a very specific set of problems. I'm here to explore what happens when you take it too far.

What is feature flag piloting

Say you have a spanking new feature that's going to impact everyone when deployed. This is potentially risky and dangerously irreversible.

What you can do instead is turn the feature on selectively. This is done by implementing feature flags within your code logic. When the flag is on, the new feature executes. A feature flag can be global or user-specific; a global one affects everyone and is not selective.
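As a minimal sketch of the idea, here is what a user-specific flag check might look like in Python. The names (FeatureFlag, is_enabled_for, render_profile_page, the "new_profile_page" flag) are made up for illustration; a real system would likely load flags from a database or a flag service rather than hard-code them.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureFlag:
    """Hypothetical flag: on for everyone, or only for listed pilot users."""
    name: str
    enabled_globally: bool = False
    pilot_user_ids: set = field(default_factory=set)

    def is_enabled_for(self, user_id: str) -> bool:
        # A global flag is not selective; otherwise only pilot users get it.
        return self.enabled_globally or user_id in self.pilot_user_ids


def render_profile_page(user_id: str, flag: FeatureFlag) -> str:
    # The flag check lives inside the code path it guards.
    if flag.is_enabled_for(user_id):
        return "new profile page"
    return "old profile page"


# Assumed example data: "alice" is the only pilot user.
new_profile = FeatureFlag("new_profile_page", pilot_user_ids={"alice"})
```

With this setup, render_profile_page("alice", new_profile) takes the new code path while every other user stays on the old one.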

Advantages

With this mechanism you can enlist a handful of beta users to test out the feature. If mistakes happen, damage is contained and fixes can take place before a wider rollout. Existing non-beta users keep happily cruising in your SaaS without feeling any impact.

This is effectively changing the aircraft engine in mid air. Whatever you can do to cut down your risk, you should do it and feature-flagging is a great way forward.

Doing this is about more than safety too. It allows feature development to be agile, giving you the room to develop quickly, release for beta-testing, get feedback, make corrections and repeat the loop rapidly without the downside of receiving widespread complaints.

But what if you take it too far?

In what ways can this be taken too far? Here's a few:

Beta-testing keeps running and never finishes
Feature flags are never removed
Too many feature flags are running concurrently
Inter-dependent feature flags (god no don't do that!)

While you're reaping the benefits of this approach, debt accrues. Over time the price comes in these forms:

Involving many humans in operations

Sometimes feature flags are self-managed: a simple checkbox for the user to turn on. Sometimes that's too dangerous to even expose. In that case, turning on a feature flag for a pilot user involves getting in touch with support staff, backend engineers, maybe even product managers.

Along the chain of operations multiple emails get passed around and unnecessary human mistakes get made. Multiply that by the number of beta users over time, and the cost in man-hours adds up.

Software is about taking out humans from the loop in the first place. This is a complete failure in that regard.

The UI is no longer the source of truth

Very often feature-flags are used for presenting different user interfaces to beta users.

A beta user sees something different from other users. Over time there are two sets of realities (or more) for different sets of people. They may all be looking at the user-profile page (for instance) but are experiencing something entirely different.

Now you can no longer count on the UI alone to tell you what is supposed to happen. There needs to be another parameter (the feature flag) to tell you the expected behavior. This becomes a problem that needs additional product-management manpower to manage.

Imagine a sole pilot user who is the only one getting this new feature. He's been left alone using it, and six months later he contacts support for help. The support agent looks at his screenshot and is shocked: the user-profile page (for example) is nothing like anything the agent has seen before.

Difficult to debug

Effectively you have expanded the product's surface area. Double the area, double the potential for bugs.

When the old version and the new version (via feature flag) are running at the same time, you have doubled the maintenance cost while serving the same number of users.

The default bug report is no longer enough to tell you if this new feature is generating the problem.
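One way to soften this is to record which flags were on for the user at the moment the report is filed. The sketch below assumes a simple in-memory mapping from user to active flags; BugReport, file_bug_report and the flag names are hypothetical, not part of any particular bug tracker.

```python
from dataclasses import dataclass, field


@dataclass
class BugReport:
    """Hypothetical bug report that carries a snapshot of the user's flags."""
    user_id: str
    description: str
    active_flags: list = field(default_factory=list)


def file_bug_report(user_id: str, description: str, flags_by_user: dict) -> BugReport:
    # Snapshot the flags at filing time, so whoever debugs this later knows
    # which version of the product the user was actually looking at.
    active = sorted(flags_by_user.get(user_id, set()))
    return BugReport(user_id, description, active)


# Assumed example data: only "alice" is in any pilot.
flags_by_user = {"alice": {"new_profile_page", "fast_search"}}
report = file_bug_report("alice", "Save button does nothing", flags_by_user)
```

Here report.active_flags lists both of alice's pilot flags, while a report from a non-pilot user would carry an empty list.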

Costs spent are wasted

A new feature is only as impactful as its exposure. By limiting it to only beta users, you may never make back the cost of development.

Secret internal knowledge

This whole approach is like ordering off-menu in a high-class restaurant. It's cool for the few people involved, but it benefits only them.

Most internal staff don't know about features that have long been hidden in the beta phase. When users request one, some staff won't know to ask for it to be turned on.

How to manage it better

Here's a few suggestions on what you can do if you have to have feature flags:

Limited beta time window

Don't let it run forever. Set a time limit, maybe three months, maybe two weeks.

When the time is up, evaluate, release it widely and deprecate accordingly.
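The time limit can live next to the flag itself, so that overdue pilots surface on their own. A rough sketch, assuming each flag is registered with a review deadline (the flag names and dates below are invented for the example):

```python
from datetime import date

# Hypothetical registry: flag name -> date by which the pilot must be reviewed.
PILOT_DEADLINES = {
    "new_profile_page": date(2024, 3, 31),
    "fast_search": date(2024, 6, 30),
}


def flags_due_for_review(deadlines: dict, today: date) -> list:
    """Return the flags whose pilot window has run out as of `today`."""
    return sorted(name for name, deadline in deadlines.items() if today >= deadline)


overdue = flags_due_for_review(PILOT_DEADLINES, today=date(2024, 4, 15))
```

Run from a scheduled job or a CI check, a function like this could nag the team (or fail the build) until the overdue flag is released widely or removed.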

Set conditions to exit pilot phase

Perhaps a fixed time window isn't good enough. Define your own criteria for what counts as satisfactory before you feel safe releasing the feature to everyone.

Maybe it's a collection of pilot users of diverse profiles. Define their characteristics, let them beta-test for a set amount of time (just not forever) then evaluate.

Maybe you want to define it statistically. For example: once at least 85% of pilot users go a full week without reporting a bug, exit the pilot phase.
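The statistical condition can be read in more than one way. The sketch below takes one reading: the share of pilot users who reported no bugs during the past week must reach the threshold. The function name, its parameters and that interpretation are my assumptions for illustration.

```python
def ready_to_exit_pilot(pilot_users: set, users_with_bugs_this_week: set,
                        threshold: float = 0.85) -> bool:
    """True when enough pilot users had a bug-free week (assumed definition)."""
    if not pilot_users:
        # No pilot users means no evidence; stay in the pilot phase.
        return False
    bug_free = len(pilot_users - users_with_bugs_this_week)
    return bug_free / len(pilot_users) >= threshold


# Assumed example data: 20 pilot users, 2 of whom reported bugs this week.
pilot_users = {f"user{i}" for i in range(20)}
reporters = {"user3", "user7"}
```

With 18 of 20 users bug-free (90%), ready_to_exit_pilot(pilot_users, reporters) returns True; with four reporters (80%) it would return False.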

Wider release but partial

To further hedge the bet, you may opt to exit the pilot phase by releasing to only half of the remaining users (those who didn't opt in as beta users).

The engineering cost of this is highly subjective, just know that it's an option.

But the same point applies: eventually this has to end and a complete release ought to happen.
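A common way to do a partial release like this is to hash each user into a stable bucket and compare the bucket to a rollout percentage. This is a sketch of that general technique, not a description of any specific flag service; rollout_bucket and in_partial_rollout are names I made up.

```python
import hashlib


def rollout_bucket(flag_name: str, user_id: str) -> int:
    """Map a user to a stable bucket from 0 to 99 for the given flag."""
    # Hashing flag name and user id together keeps the bucket the same across
    # requests, while different flags split the user base differently.
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def in_partial_rollout(flag_name: str, user_id: str, percent: int,
                       pilot_user_ids: frozenset = frozenset()) -> bool:
    # Pilot users keep the feature; everyone else gets it by bucket.
    if user_id in pilot_user_ids:
        return True
    return rollout_bucket(flag_name, user_id) < percent
```

Setting percent to 50 releases to roughly half of the non-pilot users. Raising it to 100 finishes the release, at which point the flag and the old code path can be deleted.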

Conclusion

There may be better ways to manage a pilot phase while being in the middle of it, but that's outside the scope of my concern here. In the long run it's better to get out of it than to have two versions of the same product running at the same time (A/B tests don't count).

As far as management tactics go, it's a simple matter of setting an alarm to go off on a certain date, telling you to re-evaluate a specific feature.

Any good thing can be taken too far. It doesn't invalidate the approach, just be mindful of the limits.
