Volatile, DCL, and synchronization pitfalls in Java
Anastasiia Vorobeva
Posted on June 5, 2024
Author:Konstantin Volohovsky
What if common knowledge is actually more nuanced, and old familiar things like Double-checked locking are quite controversial? Examining the code of real projects gives this kind of thought. In this article, we'll discuss the results of the examination.
Introduction
Not so long ago, as part of my work routine related to checking open-source projects using PVS-Studio, I checked the newly released 24th version of the well-known DBeaver project. I was pleasantly surprised by the quality of its code — the fact that developers use static analysis tools doesn't go to waste. However, I kept digging and found some suspicious code fragments that caught my eye. They were so conspicuous that I've decided to dedicate an article to each of them. So, welcome to the first part of the series.
Thread-unsafe programming
What am I talking about?
Parallel programming has many pitfalls. Of course, not only in Java, as its evil (or good :) ) twin brother C# can easily make you stumble when using events in parallel scenarios. Asynchronous and multithreaded errors are often the most difficult to catch.
Below, we'll look at three types of such errors. The first one is the non-atomic change of volatile variables.
What's volatile again?
The documentation tells what the volatile keyword does:
- It prevents each cache from being used by different threads. This ensures that changes to a variable are immediately visible to all threads.
- It sets a happens-before relationship between a variable and read/write, thus prohibiting optimizations based on changing the order of operations.
You can see how this keyword works with a simple example of a thread waiting for a variable to change:
private static volatile String str;
public static void main() throws InterruptedException {
new Thread(() -> {
while (str == null) { }
System.out.println(str);
}).start();
Thread.sleep(1);
str = "Hello World!";
}
If we remove volatile, the loop becomes infinite. At least in my case. The result can be highly dependent on the JVM and hardware.
If we change the example a bit by adding a bound variable, the happens-before mechanism becomes clearer:
private static String str;
private static volatile boolean flag = false;
public static void main() throws InterruptedException {
new Thread(() -> {
while (!flag) { }
System.out.println(str);
}).start();
Thread.sleep(1);
str = "Hello World!";
flag = true;
}
If we had only a visibility guarantee, the compiler or processor could reverse the order of assigning operations to the str and flag variables. Then we'd have a chance to get into System.out.println(str) when str is still null.
However, the happens-before principle ensures that the write that precedes the write in the volatile variable remains where it's supposed to be.
Race
Even though we have variable visibility, it's certainly not enough to get parallel threads to operate properly:
private static volatile int counter = 0;
private static void increment() {
Thread.sleep(1);
counter++;
}
public static void main() {
var threads = Stream.generate(() -> new Thread(Main::increment))
.limit(100)
.toList();
threads.forEach(Thread::start);
while (threads.stream().anyMatch(Thread::isAlive)) {
Thread.yield();
}
System.out.println(counter);
}
Such code always returns a number less than 100. This isn't surprising, since the increment operation isn't atomic, and one thread can squeeze through the other. This can happen between the operations of taking the old counter value and incrementing it by 1. There can be two fixes, one easier than the other:
Let's use special data types that provide a ready-made interface for working from multiple threads. In this case, it's AtomicInteger.
private static final AtomicInteger counter = new AtomicInteger(0);
private static void increment() {
Thread.sleep(1);
counter.incrementAndGet();
}
Let's just add the synchronized keyword (or the synchronized block). This ensures that only one thread can perform operations in this method.
private synchronized static void increment() {
Thread.sleep(1);
counter++;
}
Case study
If you thought that this is basic knowledge and nobody would make the above mistake in real projects, then... you're almost right. I found three fragments in DBeaver where the volatile variables of primitive types are handled. And, as it often happens in real life, the cases aren't so simple. A dangerous situation may occur in some of them, but for the most part nothing terrible can happen. Nevertheless, here are these three fragments:
public class MultiPageWizardDialog extends .... {
....
private volatile int runningOperations = 0;
....
@Override
public void run(....) {
....
try {
runningOperations++; // <=
ModalContext.run(
runnable,
true,
monitorPart,
getShell().getDisplay()
);
} finally {
runningOperations--; // <=
....
}
}
}
The PVS-Studio warnings:
- V6074 Non-atomic modification of volatile variable. Inspect 'runningOperations'. MultiPageWizardDialog.java(590)
- V6074 Non-atomic modification of volatile variable. Inspect 'runningOperations'. MultiPageWizardDialog.java(593)
We have two warnings for the price of one, and it's also the only fragment of code where there's a tangible risk. It would seem that we have a reference case that is almost identical to the artificially created example above. However, if we take a closer look at the name of the class and the method where everything happens (MultiPageWizardDialog and run, respectively), it becomes clear that we're dealing with UI. It's very unlikely, if at all possible, that a real race condition can occur here.
However, if we imagine that the run method would be executed simultaneously by multiple threads, the value of runningOperations could easily be incorrect.
private volatile int drawCount = 0;
....
private void showProgress() {
if (loadStartTime == 0) {
return;
}
if (progressOverlay == null) {
....
painListener = e -> {
....
Image image = DBeaverIcons.getImage(
PROGRESS_IMAGES[drawCount % PROGRESS_IMAGES.length]
);
....
};
....
}
drawCount++;
....
}
The PVS-Studio warning:
V6074 Non-atomic modification of volatile variable. Inspect 'drawCount'. ProgressLoaderVisualizer.java(192)
Looks like the least dangerous code snippet: drawCount is used as a progress image selector, providing a cyclic scrolling index. Firstly, it seems that the cost of making a mistake in such a code fragment isn't the highest. Secondly, the very fact of accessing drawing from multiple parallel threads sounds questionable. If the latter assumption is correct, then the volatile modifier is redundant here.
The analyzer issues a warning for the following code:
private volatile int initializedCount = 0;
....
public CompareObjectsExecutor(CompareObjectsSettings settings)
{
....
initializeFinisher = new DBRProgressListener() {
@Override
public void onTaskFinished(IStatus status)
{
if (!status.isOK()) {
initializeError = status;
} else {
initializedCount++;
}
}
};
....
}
The PVS-Studio warning:
V6074 Non-atomic modification of volatile variable. Inspect 'initializedCount'. CompareObjectsExecutor.java(130)
This fragment looks more dangerous then the others. For now, however, we lack any context, so we need to look at where initializeFinisher and initializecCount are used:
this.initializedCount = 0;
for (DBNDatabaseNode node : nodes) {
node.initializeNode(null, initializeFinisher);
....
}
while (initializedCount != nodes.size()) {
Thread.sleep(50);
if (monitor.isCanceled()) {
throw new InterruptedException();
}
....
}
This is what we have:
- the counter is reset;
- its increment is triggered in node.initializeNode;
- next, all nodes are checked for being initialized. Otherwise, the thread sleep occurs for 50 milliseconds.
It would seem to be a reference case: if two threads increment a counter at the same time, all logic breaks. Right? No, there are no threads in node.initializeNode. There's nothing at all in there :)
public boolean initializeNode(
DBRProgressMonitor monitor,
DBRProgressListener onFinish
) throws DBException {
if (onFinish != null) {
onFinish.onTaskFinished(Status.OK_STATUS);
}
return true;
}
Anyway, while DBRProgressListener is indeed used in parallel scenarios, this isn't the case. For initializedCount, volatile just seems redundant, as does the loop that checks the number of initializations.
Unexpected twist
Earlier, we looked at simple examples with primitive types. Is it enough to mark the reference type field as volatile to ensure similar behavior? Nope. You can learn more here. By modifying the previous example a bit, we can easily get an infinite loop again:
private static class Proxy {
public String str;
}
private static volatile Proxy proxy = new Proxy();
public static void main() throws InterruptedException {
var proxy = Main.proxy;
new Thread(() -> {
while (proxy.str == null) { }
System.out.println(proxy.str);
}).start();
Thread.sleep(1);
proxy.str = "Hello World!";
}
One possible fix is to mark the str field as volatile, and then everything will work as before. As for the proxy field, there's no point in marking it in any way.
private static class Proxy {
public volatile String str;
}
private static Proxy proxy = new Proxy();
public static void main() throws InterruptedException {
var proxy = TaskRunner.proxy;
new Thread(() -> {
while (proxy.str == null) { }
System.out.println(proxy.str);
}).start();
Thread.sleep(1);
proxy.str = "Hello World!";
}
So, is there a point in marking the reference field as volatile? This is where we finally come to the Double-checked locking pattern.
Double-checked locking
Why?
This pattern is used as a high-performance way to implement lazy initialization in a multi-thread environment. The idea is quite simple: a normal check is performed before the "heavy" check in the synchronized block. The control flow continues only if the requested resource hasn't been created yet. The implementation seems really simple:
public class HolderThreadSafe {
private volatile ResourceClass resource;
public ResourceClass getResource() {
if (resource == null) {
synchronized (this) {
if (resource == null) {
resource = new ResourceClass();
}
}
}
return resource;
}
}
The pattern is most often used to create singletons and for delayed initialization of heavy objects. Is it really that simple, though?
Is it an anti-pattern?
This pattern has a rather amusing feature. It doesn't work. At least if you're about the same age as dinosaurs, and you're using a Java version lower than 5. Stay tuned, though! In fact, this is a key point in our story :)
So, what has changed in JDK 5? This is when developers added the volatile keyword. Now, you can immediately catch me out and say that in the example above, the resource is marked as volatile. However, if I were you, I'd take a step back and look at the code. What exactly is the issue here? Here's the order of actions:
- resource is checked for null. This point can be traversed by multiple threads simultaneously;
- The synchronized block is accessed. Only one thread can get in there:
- in the second check, the first thread that did this makes sure that resource isn't created yet and then creates it;
- The other threads that passed the first check access the synchronized block one at a time:
- they see that resource is already created, and they go out of there;
- Everybody's satisfied and happy.
Why do we need volatile if we use synchronization to control the access to the field? The thread value visibility isn't an obstacle here, really, but we have two issues:
- the operation of creating an object isn't atomic (there are no issues with primitive types, except for the 64-bit ones);
- by default, both the compiler and lower-level systems are allowed to override the operation order.
This means that the consequences of such optimizations can be unpredictable. For example, the compiler can inline the constructor and change the order of field initialization in any way possible. As a result, an object may be marked as created. Another thread sees this, starts using it, and finds uninitialized fields there. The happens-before principle described above saves us from this. Thanks to the volatile variables, double-check locking works since the fifth version of Java.
So, is it a pattern after all?
What made me delve so deeply into the theory? It was another analyzer warning that has been issued for the next method of the DBeaver project:
private List<DBTTaskRun> runs;
....
private void loadRunsIfNeeded() {
if (runs == null) {
synchronized (this) {
if (runs == null) {
runs = new ArrayList<>(loadRunStatistics());
}
}
}
}
V6082 Unsafe double-checked locking. The field should be declared as volatile. TaskImpl.java(59), TaskImpl.java(317)
It's funny that the pattern collection I mentioned earlier lists the following drawback:
Complex implementation can lead to mistakes, such as incorrect publishing of objects due to memory visibility issues.
Obviously, this is exactly what happened in this case :)
Let's return to the question I asked in the theoretical part of the article:
So, is there a point in marking the reference field as volatile?
Back then, I cheated and deliberately omitted the obvious thing. When we change an object field, we don't touch the variable containing the object reference in any way. However, when the variable is overwritten, all information specified in that section becomes relevant again. So, even with the aforementioned limitations (a volatile field doesn't guarantee the safe publication for the members of the object it references), there are use cases, and this is one of them.
Or is it, though?!
It seems that the fix in this case is obvious: just mark the runs list as the volatile variable. This resolves the issue. However, I suggest that you remember the purpose of the pattern:
The Double-Checked Locking pattern aims to reduce the overhead of acquiring a lock by first testing the locking criterion (the 'lock hint') without actually acquiring the lock. Only if the locking criterion check indicates that locking is necessary does the actual locking logic proceed.
So, we have a pattern that's designed for optimization. What could be the issue here? Here's a hint: do you remember my comment about dinosaurs? :)
This pattern was born a long time ago, when synchronized methods were very slow. Today, both JVM and hardware performance have advanced significantly. So, the risk of such errors makes it expensive to use this (micro)optimization. If we talk only about singletons, initializing them as a class field is sufficient for the thread safety:
class Holder {
static SomeSingleton singleton = new SomeSingleton();
}
In other cases, we can just use synchronized methods, unless everything else is already optimized.
By the way, since I've mentioned C# before, I'll do it again. Funny enough, the exact same issue exists there too, as we've written about before.
Another unexpected twist
Should we use synchronized methods? What if the developers are already doing this? In that case, the volatile field would simply be redundant. It's used in 20 code fragments, let's look at a random three:
void addNewRun(@NotNull DBTTaskRun taskRun) {
synchronized (this) {
loadRunsIfNeeded();
runs.add(taskRun);
while (runs.size() > MAX_RUNS_IN_STATS) {
runs.remove(0);
}
flushRunStatistics(runs);
}
TaskRegistry.getInstance().notifyTaskListeners(....);
}
Okay, we immediately found a fragment where it's used four times, but there's also a synchronized block here...
private void loadRunsIfNeeded() {
if (runs == null) {
synchronized (this) {
if (runs == null) {
runs = new ArrayList<>(loadRunStatistics());
}
}
}
}
Here we have three more for the price of one and synchronized again. Have I really just been messing with your heads all this time?
@Override
public void cleanRunStatistics() {
Path statsFolder = getTaskStatsFolder(false);
if (Files.exists(statsFolder)) {
....
}
if (runs != null) {
runs.clear();
}
flushRunStatistics(List.of());
TaskRegistry.getInstance().notifyTaskListeners(....);
}
Gotcha! If we follow the uses of this method, there's no synchronization anywhere above. So, we've found what we've been looking for. By the way, this fragment isn't the only one.
What was the need for this performance if I could just show you a snippet without synchronization? This is just another potential vulnerability. Its presence alone can cause a race condition in addition to making possible the consequences of a double-check locking error.
So, it'll take a little more effort to fix that code fragment, because while we were analyzing one bug, another bug was found.
Conclusion
I hope you enjoyed diving into the wonderful world of parallel programming pitfalls with me. Many people claim that if you've ever had to use volatile variables or the double-checked locking pattern, you've already done something wrong. Or you're an expert and you knew exactly what you were doing :)
Still, I think this note sums up the content of the whole article pretty well: the rabbit hole is deep, indeed. At least it's deep enough to be twice as careful when dealing with the topic of the article.
If you'd like to search for those or other errors in your project, you may try PVS-Studio for free by following this link.
I'd also like to remind you that these aren't all the interesting things you can find in DBeaver, — more articles are coming soon. I'll add links here as soon as they're released. In the meantime, I'd recommend subscribing to my X-twitter, so you don't miss their release.
Posted on June 5, 2024
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.
Related
November 29, 2024