Why persisting incremental user states is hard

karthikramen

Karthik Sethuraman

Posted on March 30, 2023

This post, A reflection on building client-server state machines, was originally published on Dopt's blog.

Stateful user flows

Almost every modern SaaS product is made up of many experiences where users enter flows and take progressive, discrete steps which need to be saved and updated across multiple sessions.

For example, consider a user signing into a Slack workspace for the first time. To start, they see contextual multi-part onboarding about how that workspace functions; when they dismiss or complete certain guides, they don’t see them again; and when they encounter new areas of the workspace, they enter into new flows.

When the user’s (or the workspace’s) attributes change, they may now qualify for new flows: if I add more than 20 channels, Slack exposes helpers which let me organize and sort my channels; alternatively, once I upgrade my workspace to paid, Slack’s upsell empty states are dismissed.

Slack onboarding

Across all of these progressive user flows, a product needs to be able to persist, retrieve, and react to user state — when clients ask for flow updates and state transitions, servers need to respond by triggering, persisting, and returning state changes. When servers push state changes, clients need to reactively update to show the latest and most relevant steps in the user’s flow.

These experiences aren’t just designed and implemented once — they’re consistently added to, updated, migrated, and removed. Slack might change which users should see its channel organizers, or they may change their upsell states. They might add new contextual callouts as features are added and removed. All of these axes add additional dimensionality to an already multi-dimensional problem: now engineers need to worry not just about persisting incremental state once (or a few times), but also about persisting and managing a matrix of incremental states across users (and their workspaces) and experiences (and their versions).

These challenges may seem easy to solve on their own, but as I discovered when I was tasked with building this, solving them all together introduces entirely new scales of difficulty and complexity.

So why build these progressive discovery experiences if they’re so challenging to create and maintain? Because they’re very helpful for users. The research (1, 2) indicates that they are extremely successful at helping users navigate complexity and complete their tasks, and at helping you activate and convert more of those users.

Let’s dive into how we might create and evolve these incremental and stateful experiences as we’re building out a product.

Persisting incremental client-server state

About a decade ago at my first job, we launched a free product to improve demand and generate leads for our paid products. A week before the launch, my product manager hastily pulled me into a conference room and asked me how long it would take to build a multi-step tour that would help new users better understand the product. Let’s try to build that here.

What we want: a multi-step tour which users can start when they land in our product. Users should be able to leave the tour if they’re not interested. Users should also be able to reset the tour at any point from an entry point on our settings page.

Tour

I created a Tour client-side class consisting of simple Step objects (adapted to TypeScript below for easier reading). I specified the Step objects statically in my code (in a JSON file listing the titles, descriptions, and image URLs). When saving state, I persisted only a few key variables per user: the activeIndex corresponding to the active step of the tour, whether the tour was started, and whether the tour was completed. Voilà, I had shipped a simple, fully incremental linear tour for our product.

interface Step {
  title: string;
  description: string;
  imageUrl: string; // sometimes a gif 🎉
}

// Model (not shown) is the base class that provides fetch and save
class Tour extends Model {
  // defined statically in code, the list of steps in this tour
  // I defined this in a JSON file and used the JSON file when initializing the tour instance
  steps: Step[];

  started: boolean;
  completed: boolean;

  // index of the active step, or -1 when no step is active
  activeIndex: number;

  get active() {
    // index 0 is the first step, so only a negative index means "no active step"
    return this.activeIndex >= 0 ? this.steps[this.activeIndex] : null;
  }

  next() {
    this.activeIndex++;
    if (this.activeIndex >= this.steps.length) {
      // walked past the last step: the tour is finished
      this.activeIndex = -1;
      this.completed = true;
    }
    return this.save();
  }

  start() {
    this.activeIndex = 0;
    this.started = true;
    return this.save();
  }

  stop() {
    this.completed = true;
    this.activeIndex = -1;
    return this.save();
  }

  reset() {
    this.completed = false;
    return this.start();
  }

  async initialize(userId: number) {
    // fetch the tour state from our server
    const tour = await super.fetch('tour', { userId });

    this.activeIndex = tour.activeIndex;
    this.completed = tour.completed;
    this.started = tour.started;

    // initialize the steps from our statically defined JSON file
    this.steps = parseSteps('./tours.json');
  }

  save() {
    return super.save(
      'tour',
      {
        userId: getUserId(),
        started: this.started,
        completed: this.completed,
        activeIndex: this.activeIndex
      }
    );
  }
}

Choosing how to persist incremental state

The save function in the Tour model above could push tour state to many different places. I seriously considered two options: keeping those values client-side in localStorage, or sending them back to our web server and persisting them in our backend store, Postgres.

At first, localStorage seemed pretty enticing: I wouldn’t need to manage extra tables in our database or add new APIs. However, it came with major drawbacks as well — I knew that we would eventually have to iterate on our tours. Shipping a localStorage module client-side and subsequently managing and migrating stale local tour state on the fly as tours evolved seemed way more challenging than managing those things server-side where we already had established and well-tested patterns for updating and migrating state. Taking the localStorage direction also meant that a user clearing their browser data or changing computers would lead to them encountering a tour multiple times.
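To make the localStorage trade-off concrete, here is a minimal sketch of what client-only persistence might have looked like. None of this is the code I shipped: the `KeyValueStore` interface, the `TOUR_VERSION` constant, and the storage key are hypothetical names for illustration. In a browser, `window.localStorage` satisfies the interface.

```typescript
// Hypothetical sketch: persisting tour state client-side with a version stamp.
// KeyValueStore mirrors the getItem/setItem subset of the Web Storage API.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

interface StoredTourState {
  version: number; // bumped whenever the tour's steps change
  started: boolean;
  completed: boolean;
  activeIndex: number;
}

const TOUR_VERSION = 2; // assumed current version of the tour definition
const STORAGE_KEY = "tour-state"; // assumed key name

function freshState(): StoredTourState {
  return { version: TOUR_VERSION, started: false, completed: false, activeIndex: -1 };
}

function loadTourState(store: KeyValueStore): StoredTourState {
  const raw = store.getItem(STORAGE_KEY);
  if (raw === null) return freshState();
  try {
    const parsed = JSON.parse(raw) as StoredTourState;
    // State written by an older tour definition may point at different
    // steps, so the only safe move in this sketch is to start over.
    if (parsed.version !== TOUR_VERSION) return freshState();
    return parsed;
  } catch {
    return freshState(); // corrupted or unexpected JSON is treated like missing state
  }
}

function saveTourState(store: KeyValueStore, state: StoredTourState): void {
  store.setItem(STORAGE_KEY, JSON.stringify(state));
}

// In-memory stand-in for localStorage, useful outside a browser.
function createMemoryStore(): KeyValueStore {
  const data = new Map<string, string>();
  return {
    getItem: (key) => data.get(key) ?? null,
    setItem: (key, value) => {
      data.set(key, value);
    },
  };
}
```

Even in this small form, every tour change needs a version bump and a decision about what to do with old state, and the state still disappears when the user clears their browser or switches machines.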

So, I ended up creating a simple Postgres table, tours, and I added a few wrapper CRUD APIs which would read from and write to this table. I then wired them into our client-side Tour model. Ultimately, both the short- and long-term costs of creating these server-side endpoints and updating our database models were lower than the costs of shipping a more complex, localStorage-backed client-side persistence module.

CREATE TABLE IF NOT EXISTS tours (
   id SERIAL PRIMARY KEY,
   user_id INTEGER NOT NULL,
   started BOOL NOT NULL,
   completed BOOL NOT NULL,
   activeIndex INTEGER,
   FOREIGN KEY (user_id) REFERENCES users (id)
);

Serving multiple incremental experiences simultaneously

As predicted, by week two, fresh off our successful launch, I already had a request to add a second tour, independently accessed from the first tour. This new tour would help users connect the product with their S3 storage.

Okay. I had to make these tours indexable by name so that the client could fetch multiple different experiences and render them as necessary. I added a name column to the Postgres table, named all existing tours getting-started-tour, and named the new tour s3-tour.

ALTER TABLE tours ADD COLUMN name VARCHAR;
UPDATE tours SET name = 'getting-started-tour';
ALTER TABLE tours ALTER COLUMN name SET NOT NULL;

I also added the ability to fetch the tour by name and started maintaining multiple tour topologies statically on the client.

class Tour extends Model {
  name: string;

  async initialize(userId: number, name: string) {
    // fetch the tour state from our server
    const tour = await super.fetch('tour', { userId, name });

    this.name = tour.name;
    this.activeIndex = tour.activeIndex;
    this.completed = tour.completed;
    this.started = tour.started;

    // use the tour corresponding to this name
    this.steps = parseSteps('./tours.json')[name];
  }
}

Not bad, I’d managed to spin up two different stateful experiences, and my model seemed extensible, at least for disjoint, linear flows.
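On the server, supporting named tours mostly comes down to keying state by the pair of user and tour name. Here is a minimal sketch of that idea, using an in-memory map as a stand-in for the tours table. `TourStore`, `TourRow`, and the default row returned for a tour the user has never seen are assumptions for illustration, not our original endpoints.

```typescript
// Hypothetical sketch: tour state keyed by (userId, tourName).
interface TourRow {
  userId: number;
  tourName: string;
  started: boolean;
  completed: boolean;
  activeIndex: number;
}

class TourStore {
  private rows = new Map<string, TourRow>();

  private static key(userId: number, tourName: string): string {
    return `${userId}:${tourName}`;
  }

  // Returns a copy of the saved row, or an untouched default row
  // if this user has never interacted with this tour.
  fetch(userId: number, tourName: string): TourRow {
    const row = this.rows.get(TourStore.key(userId, tourName));
    if (row) return { ...row };
    return { userId, tourName, started: false, completed: false, activeIndex: -1 };
  }

  // Inserts or overwrites the row for this user and tour.
  save(row: TourRow): TourRow {
    this.rows.set(TourStore.key(row.userId, row.tourName), { ...row });
    return { ...row };
  }
}
```

Because each (user, name) pair has its own row, progressing one tour never touches another, which is what kept the two tours disjoint.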

Back then, most of our client was built with a mixture of Backbone, jQuery, and vanilla JavaScript, and we structured it fairly imperatively. When a user completed an action (like clicking a button), a jQuery handler would trigger an event that progressed the tour (tour.next). When the promise resolved, it would trigger the render of the newly active step (tour.active) within the tour. As long as I kept our tour namespaces distinct and progressed them linearly, I could access, update, and manage multiple states successfully across the client and server.

Building non-linear, non-imperative state machines

The following quarter, we inked a partnership with Google Cloud, and I had new requirements to customize our connections tour (the s3-tour) to allow configuration of Google Cloud Storage (GCS).

Suddenly, I could no longer rely on a simple tour.next() to trigger an appropriate state transition — the transitions would now be driven via user choice. Also, the steps in each branch of the flow were radically different; for example, our GCS configuration required SSO, while our S3 configuration asked for secrets and keys.

I settled on a pattern that ultimately resembled something like a reducer:

function transitionTour({ tour, transition }) {
  switch (transition) {
    case '<transition-one>':
      // do something
      break;
    case '<transition-two>':
      // do something
      break;
    default:
      // do something
  }

  // persist first, then announce the change to any listening views
  tour.save().then(() => {
    tour.trigger('updated');
  });
}

$('#configuration-chooser').on('click', (e) => {
  // wrap the DOM node so that jQuery's attr() is available
  const $element = $(e.currentTarget);

  transitionTour({
    tour,
    transition: $element.attr('data-selected')
  });
});

Depending on a user’s choice, specific transitions were performed in the code. Once the transition was persisted, a state update event was triggered on the broader tour. Instead of immediately imperatively rendering a step, I wrapped our S3 and GCS configuration components in conditional renderers:

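class ConditionalRenderer extends View {
  initialize({ id, tour, renderComponent }) {
    this.id = id;
    this.tour = tour; // keep a reference so render() can read the active step
    this.renderComponent = renderComponent;
    this.listenTo(tour, 'updated', () => {
      this.render();
    });
  }

  render() {
    // active is a getter on Tour and is null when no step is active
    const active = this.tour.active;
    if (active && active.id === this.id) {
      this.renderComponent();
    }
  }
}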

Abstractly, the tours now functioned pseudo-reactively: user changes caused transitions and state persistence, and after those transitions, state changes were pushed to views which used them to evaluate whether they should render or not.
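Stripped of Backbone and jQuery, the pattern can be sketched as a pure transition function plus a list of subscribers. The step ids, the transition names, and the `TourMachine` class below are hypothetical; they only illustrate the shape of a branching flow, not our actual S3 and GCS steps.

```typescript
// Hypothetical step ids and transitions for a branching storage-connection tour.
type StepId = "choose-provider" | "s3-keys" | "gcs-sso" | "done";
type TourTransition = "choose-s3" | "choose-gcs" | "submit" | "exit";

interface BranchingTourState {
  active: StepId;
  completed: boolean;
}

// Pure transition function: transitions that do not apply return the same object.
function transitionTour(state: BranchingTourState, transition: TourTransition): BranchingTourState {
  if (state.completed) return state;
  if (transition === "exit") return { active: "done", completed: true };
  switch (state.active) {
    case "choose-provider":
      if (transition === "choose-s3") return { ...state, active: "s3-keys" };
      if (transition === "choose-gcs") return { ...state, active: "gcs-sso" };
      return state;
    case "s3-keys":
    case "gcs-sso":
      return transition === "submit" ? { active: "done", completed: true } : state;
    default:
      return state;
  }
}

type TourListener = (state: BranchingTourState) => void;

class TourMachine {
  private state: BranchingTourState;
  private listeners: TourListener[] = [];

  constructor(initial: BranchingTourState) {
    this.state = initial;
  }

  getState(): BranchingTourState {
    return this.state;
  }

  subscribe(listener: TourListener): void {
    this.listeners.push(listener);
  }

  dispatch(transition: TourTransition): void {
    const next = transitionTour(this.state, transition);
    if (next === this.state) return; // nothing changed, so nothing to announce
    this.state = next;
    // This sketch skips persistence; a real version would save before notifying.
    this.listeners.forEach((listener) => listener(next));
  }
}

// A view renders only when its own step is the active one.
function conditionalRenderer(id: StepId, render: () => void): TourListener {
  return (state) => {
    if (state.active === id) render();
  };
}
```

Views never ask which button was clicked. They only check whether the state that arrived makes them the active step, which is why adding a branch means adding a case to the transition function rather than rewiring click handlers.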

As my flows and business logic became non-linear, moving away from a click-this-render-that model to a click-this-cascade-changes model made building flexible views and consistent data models so much easier. We soon migrated all of our tours to behave in this state- and change-driven pattern. When we later migrated to React, this style of pseudo-reactive programming was extremely easy to transfer.

Decoupling client-side business logic from server-side database models

After I shipped the s3-tour (actually, while I was trying to get my pull request merged), a few other engineers wondered why I would want to build a state layer on top of backend properties that were already being stored in our server-side models. As a user configured their S3 connection, we were already storing values in a connectionconfigurations table which tracked the bucket’s name, their AWS key, etc. They argued that I should just have built a wrapper around these properties to judge how far along a user was in completing their configuration task.

Here’s the thing: it’s much easier to iterate on client-side business logic — that is, how a client-side experience functions and how it fetches and persists data with a server — when it isn’t tied at the hip with server-side properties. These two layers change at fundamentally different rates, and they have fundamentally different expectations — we expect the bucket name and AWS key to be fairly stable, and we use them repeatedly in our downstream connections API to send data over to a user’s S3 bucket. We don’t care too much about which step in the s3-tour a user is in — we can always reset or modify this state. Similarly, the s3-tour changes pretty quickly: we add states, remove states, combine configuration requirements into a single step, and so on, but our server-side model will always need to store the bucket name and AWS key.

This is not to say that a client-side state management layer couldn’t be derived from a set of static server-side properties. I actually believe that most incremental state layers begin this way. It’s just that this coupling hampers iteration: when I got around to adding GCS configuration steps to the s3-tour, most of the engineers I’d argued with saw the advantages of the decoupling we’d done. We did carry the cost of maintaining two sets of states, but this cost was worthwhile: the needs and expectations placed on those two sets of states were very different.
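For contrast, here is roughly what the derived approach my colleagues suggested could look like. The field names on `ConnectionConfiguration` and the step names are hypothetical; the point is only that every step is defined by which server-side values happen to be filled in.

```typescript
// Hypothetical shape of a stored connection configuration.
interface ConnectionConfiguration {
  bucketName: string | null;
  awsAccessKey: string | null;
  verifiedAt: Date | null;
}

type DerivedStep = "enter-bucket" | "enter-keys" | "verify" | "done";

// Progress is inferred from which server-side fields are present,
// so there is no separate tour state to persist.
function deriveS3TourStep(config: ConnectionConfiguration): DerivedStep {
  if (!config.bucketName) return "enter-bucket";
  if (!config.awsAccessKey) return "enter-keys";
  if (!config.verifiedAt) return "verify";
  return "done";
}
```

This works as long as steps map one-to-one onto stored fields. A step with no backing field (an introduction, a provider choice), a dismissed tour, or a reset that should not wipe credentials has nowhere to live, and that is where the derived approach starts to strain.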

Iterating and versioning product experiences

Each time we updated or restructured a tour, as we’d done with the s3-tour, I’d manually go in and delete the rows associated with each user who had experienced the tour:

DELETE FROM tours WHERE name = '<tour-name>';

We updated the tours with such frequency that I created a wrapper function around this concept and checked in a data migration whenever a new version of a tour went out.

For some tours, I also received specific requests that we shouldn’t re-expose users but rather exit them from the tour. Rather than deleting those rows, I marked those rows as done:

UPDATE tours SET completed = true WHERE name = '<tour-name>';

I created a wrapper function around this concept as well. Within a few quarters, we had migrated our tours table upward of ten times as we iterated on our setup and onboarding, and our migration log was littered with instances of s3-tour-*-exit-users.js and onboarding-tour-*-reset-users.js. A few times, I even manually updated our production database to delete and update tour states for specific subsets of users and workspaces. Migrating, resetting, and managing these experiences soon became a part-time job.
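The two wrappers amounted to something like the sketch below. The function names and the `ParameterizedQuery` shape are assumptions, not our original migration helpers; the SQL is the same pair of statements shown above, with the tour name passed as a `$1` parameter instead of being pasted into the string.

```typescript
// Hypothetical sketch of the reset and exit migration helpers.
interface ParameterizedQuery {
  text: string; // SQL using Postgres-style $1 placeholders
  values: string[];
}

type MigrationStrategy = "reset" | "exit";

// Re-expose users: delete their rows so the tour starts from scratch.
function resetTourQuery(tourName: string): ParameterizedQuery {
  return { text: "DELETE FROM tours WHERE name = $1", values: [tourName] };
}

// Exit users: keep their rows but mark the tour as completed.
function exitTourQuery(tourName: string): ParameterizedQuery {
  return { text: "UPDATE tours SET completed = true WHERE name = $1", values: [tourName] };
}

function tourMigration(tourName: string, strategy: MigrationStrategy): ParameterizedQuery {
  return strategy === "reset" ? resetTourQuery(tourName) : exitTourQuery(tourName);
}
```

Each checked-in migration then reduced to one call such as `tourMigration("s3-tour", "exit")`, which is convenient but still leaves a human deciding, per release, which strategy applies to which users.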

Monitoring and analyzing state machines

I had been nervously waiting for the day when someone would ask me about how our tours were performing. At which steps did people get stuck? Did someone leave and never come back? Which new workspaces were getting set up correctly?

I had a pretty simple solution: whenever our tour’s state was updated on the server, I sent a track event to Segment. The body of the event was pretty flat:

{
  "tourName": "s3-tour",
  "index": 3,
  "title": "configure IAM access limitations",
  "completed": true,
  "userId": 1,
  "email": "karthik@dopt.com",
  "workspaceId": 10,
  "company": "Dopt"
}

From Segment, we forwarded these events to Amplitude. After confirming the events were in Amplitude, I usually directed all further conversation to our product team, who helped with analyses (and who sometimes bothered me to add more fields into our Segment events).
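A minimal sketch of the server-side hook that produced these events might look like the following. The event name and the `TrackFn` callback are assumptions; in our case the callback would have wrapped the Segment client, which expects a user id, an event name, and a properties object for a track call.

```typescript
// Flat event body, matching the fields in the example payload above.
interface TourUpdate {
  tourName: string;
  index: number;
  title: string;
  completed: boolean;
  userId: number;
  email: string;
  workspaceId: number;
  company: string;
}

interface TrackMessage {
  userId: string;
  event: string;
  properties: TourUpdate;
}

// Hypothetical callback that hands the message to an analytics client.
type TrackFn = (message: TrackMessage) => void;

const TOUR_UPDATED_EVENT = "Tour Updated"; // assumed event name

// Called whenever a tour's state is updated on the server.
function trackTourUpdate(track: TrackFn, update: TourUpdate): TrackMessage {
  const message: TrackMessage = {
    userId: String(update.userId),
    event: TOUR_UPDATED_EVENT,
    properties: { ...update },
  };
  track(message);
  return message;
}
```

Keeping the body flat made it easy to filter in Amplitude, at the cost of re-sending user and workspace attributes with every event.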

Admittedly, I had become fairly frustrated with building and maintaining these tour state machines, but these events were useful for answering the questions raised earlier. We were usually able to gauge individual and workspace-level success, understand the performance of individual steps, and see where we might need to intervene, either by changing our experiences or by talking to customers directly. The biggest pain point in analyzing the tour events and user funnels was constructing and maintaining reports as the tours evolved: we had to manually keep multiple systems in sync. On several occasions we forgot to do so, leaving both the data and our core growth metrics stale and unusable.
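The step-level analysis itself was simple; the sketch below shows the kind of computation a funnel report performs on these events. It is an illustration in TypeScript with hypothetical names, not how Amplitude computes funnels: it counts the distinct users who reached each step index of one tour.

```typescript
// The subset of the event body that a step funnel needs.
interface TourStepEvent {
  tourName: string;
  index: number;
  userId: number;
}

interface StepReach {
  index: number;
  users: number; // distinct users seen at this step
}

// Counts distinct users per step index for a single tour, ordered by index.
function stepReach(tourEvents: TourStepEvent[], tourName: string): StepReach[] {
  const usersByStep = new Map<number, Set<number>>();
  tourEvents.forEach((tourEvent) => {
    if (tourEvent.tourName !== tourName) return;
    const users = usersByStep.get(tourEvent.index) ?? new Set<number>();
    users.add(tourEvent.userId);
    usersByStep.set(tourEvent.index, users);
  });
  return Array.from(usersByStep.entries())
    .map(([index, users]) => ({ index, users: users.size }))
    .sort((a, b) => a.index - b.index);
}
```

Note that the report is keyed by step index. As soon as a tour is reordered or a step is inserted, index 3 means something different from what it meant the week before, which is exactly how our reports went stale.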

On the engineering front, I usually debugged our tours manually, not by using Segment or Amplitude, but by creating test users and directly monitoring and resetting their state in our postgres instance. This was a pretty crude and time-consuming way of iterating on our experiences, but I didn’t have any time or capacity to create more robust internal tooling around working with these experiences.

How Dopt can help

Would you be surprised if I told you we built Dopt to solve these sets of problems?

This is going to sound like a pitch, but we really did encounter all of these problems at Trifacta in 2014 when we were rushing to ship our first freemium product. I joined Trifacta right around their Series B and was tasked with building Trifacta’s initial experiences in a rush to get things out the door. Meanwhile, my PM was pestering me about extending those experiences, making them richer and more complex, and tracking user success and early activation.

Every SaaS product can benefit from introducing progressive product flows that educate and orient users. Every SaaS product has to tackle this problem of persisting and managing incremental user state. State inevitably gets split and decoupled between clients and servers, and flows invariably grow and become more unwieldy as the product evolves.

We built Dopt to give you a toolkit that lets you build and iterate on these flows faster. You model your user flows in Dopt, and you access and progress their states directly in your app using our SDK (without ever touching your database models). You don’t need to figure out how far along a user’s setup is by querying some ever-changing set of tables — you can just use a Dopt hook instead (1, 2).

Dopt lets you manage, monitor, and debug these flows too: you can test against sandboxed environments, track how users are progressing through flows, migrate users between versions, and reset flows—all without needing to run ad-hoc queries against a production database.

If you’re currently building flows in your SaaS product that could benefit from more context about your users and what they’ve already seen and done (like a setup wizard, new user onboarding, feature callouts, in-product announcements, checklists, or product walkthroughs), sign up for our beta or check out our docs.
