Meditations on concurrency
Sean Williams
Posted on July 26, 2022
My business is auto repair, which has been a bit of a journey from computer science. On the other hand, auto repair is a surprisingly software-reliant field. I don't just mean this in the sense that cars have more computational elements in them, though that's an issue too.
Many businesses could be run with just paper, burdensome as that would be. Auto repair, though, has a few special problems.
First, job times are standardized. Nearly every job that could be done to nearly every car has a benchmark time it should take a competent technician to complete. This means, if your shop is legitimate, the price of a job should be: the book time for the job * the shop's labor rate + the cost of parts required to do the job * (1 + the shop's parts markup) + nominal overhead.
Second, every car is different, often in bizarre ways. In the old days there were big service manuals for every year for every make and model—these still exist, but in practice they've been displaced by software. Even if you can figure out how to do a job on your own, you often need specs—how much to torque down a bolt, how much fluid to add, that sort of stuff.
Third, if you're good enough to do electrical diagnosis and repair, you need wiring diagrams.
Yes, auto repair predates computers, but cars were a lot simpler in those days. Not surprisingly, the software currently available for running an auto repair shop isn't great. Our biggest problem is one of concurrency: our current software can be run in a client/server mode over a local network, but when a work order is up on one computer, other computers are fully locked out.
This has led me to think a lot about how concurrency should be handled in this situation, since I'll soon start writing our own shop management software.
Race conditions
The principal issue is what we call "race conditions." Imagine you have a shared variable x, initially 0, and five threads all run x++. What happens? It's tricky because x++ actually expands to something more like this:
load x into register1
add 1 to register1
store register1 into x
The final value of x could be anything between 1 and 5, depending on how the instructions interleave. For example, just looking at two threads, you could have:
thread 1: load x [r = 0]
thread 1: add 1 [r = 1]
thread 1: store x [x = 1]
thread 2: load x [r = 1]
thread 2: add 1 [r = 2]
thread 2: store x [x = 2]
In this case, x is 2. The instructions could also play out like this:
thread 1: load x [r = 0]
thread 2: load x [r = 0]
thread 1: add 1 [r = 1]
thread 2: add 1 [r = 1]
thread 1: store x [x = 1]
thread 2: store x [x = 1]
In this case, x is 1.
This opens up into a whole lot of other stuff: mutual exclusion (mutex) locks, semaphores, transactions... You introduce special processor instructions, like "atomic check and set," which can do
if(x == 0) {
    x = 1;
    return true;
}
else {
    return false;
}
atomically, as a single indivisible step. This lets you start doing stuff like:
int lock = 0;

void foo() {
    while(!atomic_check_and_set(&lock)) {
        sleep(100);
    }
    // Do some stuff, knowing that other threads won't interfere
    lock = 0;
}
This requires a lot of diligence to get right; it's very easy to hang your program by improperly handling locks. This is also why focus has shifted to "safer" things like promises and work queues.
There's one other problem with locking that's worth mentioning. Locking should have some level of granularity, where there are different locks for different data. What if a thread needs three locks? More importantly, what if multiple threads need overlapping subsets of locks? One thing you absolutely should not do is:
int lock1 = 0, lock2 = 0, lock3 = 0;

void foo() {
    while(!atomic_check_and_set(&lock1)) sleep(100);
    while(!atomic_check_and_set(&lock2)) sleep(100);
    while(!atomic_check_and_set(&lock3)) sleep(100);
    // do stuff
    lock1 = lock2 = lock3 = 0;
}
If there's another function that looks like this:
void bar() {
    while(!atomic_check_and_set(&lock3)) sleep(100);
    while(!atomic_check_and_set(&lock2)) sleep(100);
    while(!atomic_check_and_set(&lock1)) sleep(100);
    // do stuff
    lock1 = lock2 = lock3 = 0;
}
you can run into a situation where a thread executing foo takes lock1 while a thread executing bar takes lock3. One of them will then take lock2, but at that point both threads will hang indefinitely: each is waiting on a lock the other holds. This is the classic deadlock.
Time scales
The issue exposed by our shop management software is the problem of time scales. Holding a lock for 200 ms is fine in smaller-scale environments. Holding a lock for eight hours will piss people off. This is the difference between "batch processing" time scales and user-interface time scales.
The obvious lesson is that you should avoid holding locks while waiting for user input. Among other things, you don't know if the user is even at the computer!
The above discussion about locks is apparently nowadays called "pessimistic concurrency control." The alternative that gets suggested is "optimistic concurrency control," in which you allow concurrent memory access to proceed but roll back or merge in case of conflicts.
This approach makes sense in situations with very high utilization. At the smaller scale, especially a smaller scale where the users are auto mechanics, I think it's a bad idea to drag in merge semantics. I mean, programmers themselves already have significant heartburn with Git merge—subjecting auto mechanics to merging sounds inhuman.
Live updates
I decided to approach this problem, not as a programmer, but as a normal person. Nearly everything in computing is some sort of metaphor for real life: there's a reason the base of most graphical operating systems is called a "desktop," and that "files" are stored in "folders." Even the older term "directory" is—well, for the youngsters out there, you used to use directories to look up phone numbers.
The main reason for this is that you ought to try to make computing intuitive. That means having a handle on intuition: what will people find intuitive? Conflict merging in the real world is incredibly tedious, made up of judgment calls and "who has final authority?"
What about the more basic question: what are real-world situations in which we have concurrent access to a shared medium? I realized that this happens all the time; it's so common that it's nearly invisible. Namely: having a conversation. People manage conversations not through complicated technical rules but through, for lack of better words, courtesy and willpower. You and I talk at the same time. Do I stop speaking, or do you? If we both stop, who starts talking again? If we both keep talking, who blinks first?
For the purposes of concurrent access, these questions don't need to be answered; it suffices that people do manage conversations, and we call that management "social convention." This made me realize that smaller-scale concurrent access can be handled with live updates, like Google Docs.
That is, commit changes as they're made, and have those changes immediately reflected to other users. Yes, users can create race conditions, but this isn't a technical problem—it's a social problem. Users will instinctively see the possibility of race conditions (without knowing that term) because they'll see what other users are working on. Prosocial users will avoid conflicting changes; antisocial users can be disciplined.
There are some algorithmic problems with doing this well, and I'll post about them later. You still need to do three-way merging, for example, and you need to keep the text input cursor in a reasonable place. But this solution does bring locking back to the batch processing timescale, in a way that ought to be intuitive to normal people.
This is a work in progress, of course, but I think it's interesting enough to be worth posting about.