Shining some light on timing out tests

A couple of days ago we had an interesting case of our CI constantly failing on one particular branch pipeline. The code change was trivial. So trivial that we eventually replaced it with an empty commit. The pipeline was still failing.

We use ember-exam to shuffle the test order with pseudorandom seed. The idea is:

branch_name=`git rev-parse --abbrev-ref HEAD`
ember exam --random=${branch_name}

If you never encountered this, very good reasons for doing so are:

Tests running in always the same order will probably not reveal potential race conditions.
Tests running in always the same order will probably not reveal possible leaking state between test runs.
By using seed generated from a branch name, we can probably replicate problematic scenarios locally.

Why did I emphasise probably so much? Because as one great Dilbert comics says:

👨🏼‍💻 Are you sure that's random?
😈 That's the problem with randomness, you can never be sure.
-Scott Adams-

Situations like ours might signal that there is registered test waiter that does not get cleaned up after the component lifecycle ends.

This is hard to debug because the leak can originate in any tests that preceded the one that timed out. So how to find the offending code?

We had luck with using Chrome -> Dev Tools -> Performance tool. We simply let the tests run till they started timing out. At that point, we hit "record" for few seconds and tried to dig through the results. Few moments going through the stack tree and there is a hint pointing at ember-test-waiters.

So we dig into the waiter-manager::hasPendingWaiters code and found more details about which waiter, that was not expected, we were waiting for. After that, a careful examination of the relevant component revealed case when the waiter might not have been unregistered.

I like this technique because it records history and allows you to quickly jump between the things that already happened without worrying too much that you will miss something important. It's a great first approximation.

Code example

From the perf tool, one can simply jump to the relevant code line, put a breakpoint and check what is the code complaining about. The code we found could be simplified to:

export default Component.extend({
  actions: {
    buttonClicked() {
      this.animate();
    }
  }

  animate() {
    waitForPromise(new Promise((resolve) => {
      AnimationLibrary.animate({
        onComplete: () => {
          resolve();
        }
      });
    });
  }
});

There might be multiple problems with this. But for simplicity let's say that one test calls click('button') without await and immediately after that the test ends. There is a chance that component gets torn down while AnimationLibrary.animate is running. But since there is no DOM to interact with anymore, it will never call onComplete(). Thus leaving the next test hanging with a waiter that will never resolve.

Or there might be other situations happening. Does not matter. The important thing is that you make sure you cleaned up after yourself when your component lifecycle ends:

export default Component.extend({
  // Register fake cancel function
  _cancel() {},

  actions: {
    buttonClicked() {
      this.animate();
    }
  }

  animate() {
    waitForPromise(new Promise((resolve) => {
      // Override _cancel function
      this._cancel = resolve;
      AnimationLibrary.animate({
        onComplete: () => {
          resolve();
        }
      });
    });
  }

  willDestroyElement() {
    // Make sure that the promise is resolved
    this._cancel();
  }
});

Async is hard.
-every developer ever-

Blog

Shining some light on timing out tests

Michal Bryxí

Code example

Join Our Newsletter. No Spam, Only the good stuff.

Related