How to speed up your code with Multithreading


Sun-Li Beatteay

Posted on February 3, 2020


If you prefer visual learning, here's a presentation I gave at a local meetup about multithreading (skip to 6:10). Otherwise, ignore the video and continue on with the post.

There comes a time in a programmer's career when they realize that their code simply needs to run faster. Whether it's creating low-latency APIs or programs that parse through billions of data points, speed is a huge factor.

What can you do when your code simply runs too slow?

One viable option is multithreading. Before we can talk about multithreading, though, it's important to discuss what processes and threads are. In the simplest terms, you can think of a process as an executing program.

For example, run ps aux in a terminal to see all the processes currently running on your computer. All of those processes correspond to a program or application. The web browser that you are reading this on is using one or more processes.

Processes running on my computer
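If you run it yourself, you'll see one row per process. The output looks roughly like this (illustrative and trimmed; the exact columns vary by operating system):

$ ps aux | head -3

USER       PID %CPU %MEM    VSZ   RSS TTY   STAT START   TIME COMMAND
root         1  0.0  0.1 168123  9023 ?     Ss   09:00   0:03 /sbin/init
...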

You can think of a thread as a worker for the process. If the process is the boss, then the threads are the faithful employees. Each process starts with a single thread but can create more if needed.

All threads within a process share the same heap memory but have their own execution stacks. This means threads can share data, but each keeps track of its own function calls.

Credit: [https://workerholic.github.io](https://workerholic.github.io)
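Here's a minimal sketch of that sharing in action: both threads push into the same array, which lives on the shared heap, and the main thread sees both writes.

# Threads share heap memory, so both can mutate the same array
shared = []

t1 = Thread.new { shared << "from thread 1" }
t2 = Thread.new { shared << "from thread 2" }

[t1, t2].each(&:join)

puts shared.inspect # => ["from thread 1", "from thread 2"] (order may vary)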

The reason why multithreading is useful is that an executing program can delegate tasks to many different threads. It would be equivalent to an employer hiring fifty programmers to build an entire SaaS product instead of just one. The best way to show this is with an example.

Note: while the examples in this post use Ruby, multithreading is language agnostic. I just like Ruby.

Let's say that you have a program that needs to loop through a list of ten tasks and complete them, where each task takes about 1 second to execute. If you relied on a single thread to perform this loop, it would take 10 seconds to complete.

# 01_serial_loop.rb

def output(num)
  sleep 1
  puts num
end

start = Time.now

(1..10).each { |num| output(num) }

puts "#{Time.now.to_f-start.to_f} seconds to complete."
$ ruby 01_serial_loop.rb

1
2
3
4
5
6
7
8
9
10

10.02504587173462 seconds to complete.

However, if we were to give each task to a different worker, it would take only 1 second, as all of the workers would be working concurrently. In Ruby, this is as simple as calling Thread.new and passing it a block to execute. Thread.new returns a new Thread instance.

It's not enough, though, to just spin up new threads. The program has to make sure those threads finish before it exits. If a program doesn't explicitly wait for its threads to finish, it can exit too early and kill them mid-work.
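Here's a minimal sketch of that failure mode:

# Without waiting, the main thread exits immediately and the process
# dies, killing the worker before it ever gets to print
Thread.new { sleep 1; puts "this may never print" }

puts "main thread exiting"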

In Ruby, the join method has to be called on each thread instance so that each thread joins back with the main thread.

# 02_multithread_loop.rb

def output(num)
  sleep 1
  puts num
end

start = Time.now

# Create all of the separate threads
threads = (1..10).map { |num| Thread.new { output(num) } }

# Call "join" on each thread instance so it merges back with main thread
threads.each(&:join)

puts "#{Time.now.to_f-start.to_f} seconds to complete."
$ ruby 02_multithread_loop.rb

5
4
8
6
10
2
1
9
3
7

1.0051839351654053 seconds to complete.

Concurrency and Parallelism

One thing you may have noticed from the multithreaded example is that the numbers printed in a random order. Threads don't finish in the same order in which they were started; they are asynchronous. This is the main concept behind concurrency and parallelism: each thread starts and finishes at its own pace, without regard for the other threads.

While concurrency and parallelism aren’t the same, they are similar. The difference between each concept is outside the scope of this article but can be read about here.

One important thing to note is that due to Ruby language constraints, notably the Matz Ruby Interpreter (MRI) and its Global Interpreter Lock (GIL), the standard version of Ruby is incapable of true parallelism. However, it can do concurrency just fine.

Note: The MRI was Ruby's default interpreter up until version 1.8. Since version 1.9, the default interpreter has been YARV (Yet Another Ruby VM), which still enforces a global lock (now called the Global VM Lock, or GVL).
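If you aren't sure which interpreter you're running, the built-in RUBY_ENGINE constant will tell you:

$ ruby -e 'puts RUBY_ENGINE'
ruby

# After switching to JRuby (e.g. via rbenv)
$ ruby -e 'puts RUBY_ENGINE'
jruby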

The MRI and GIL prevent parallelism in standard Ruby versions. Credit: [https://workerholic.github.io](https://workerholic.github.io)

Race Conditions

Concurrency is important to understand as it can lead to hard-to-fix bugs if you aren’t careful. One such bug that can arise is a race condition.

The most common form of race condition occurs when multiple threads try to access the same piece of memory at the same time. The problem with this scenario is that updates can be lost or, worse, the memory can be corrupted.

Let’s look at an example where many threads are trying to increment a global counter. We start off the counter at 0 and create 1000 threads, all with the task to increment the counter by one.

# 03_global_counter.rb

count = 0
threads = []

1000.times do |i|
  threads[i] = Thread.new do
    count += 1
  end
end

threads.each(&:join)

puts "count = #{count}"

If all goes well, the counter should end up at 1000. But remember, all of these threads are operating on the same variable, the same piece of memory. With so many operations occurring on a single piece of state, data corruption can occur. Let's see what happens in this scenario.

$ ruby 03_global_counter.rb

count = 1000

Voila! It worked! But why? So many operations occurring on a single piece of state will surely corrupt it, right? Not always.

In this case, the operation each thread had to execute was so fast that by the time the next thread was created, the previous one had already finished. There simply wasn't enough time for multiple threads to overwrite each other.

But what if that wasn’t the case? What if each thread had to do some work, such as an API call, before operating on a piece of state? We can simulate that extra work by having each thread sleep for a random amount of time before incrementing the counter. This will prevent the threads from finishing instantaneously.

# 04_sleeping_threads.rb

count = 0
threads = []

1000.times do |i|
  threads[i] = Thread.new do
    # Each thread sleeps for a random amount of time
    # This simulates extra work (e.g. an API request)
    sleep(rand)
    count += 1
  end
end

threads.each(&:join)

puts "count = #{count}"
$ ruby 04_sleeping_threads.rb

count = 1000

“Hm,” you may be thinking. “Looks like it still worked. I don’t think this guy knows what he’s talking about.”

Don't get too comfortable. The reason this example worked is that the standard Ruby runtime isn't capable of parallelism. Under the GIL, only one thread can execute Ruby code at a time, so the threads never truly overlapped while updating the global variable. With the standard Ruby interpreter, you don't have to worry too much about race conditions.

But what if you weren’t using the standard Ruby runtime? What if you weren't using Ruby at all but a language capable of parallelism, like Java? Luckily, we can coerce Ruby into allowing parallelism if we use a different runtime. In this case, JRuby.

# Change Ruby version to JRuby
$ rbenv local jruby-9.1.10.0
$ ruby 04_sleeping_threads.rb

count = 864

Yikes. Looks like we’re missing 136 counts from our counter. That means that at one or more points, multiple threads overwrote each other.

Since JRuby runs on the Java Virtual Machine and is capable of parallelism, our program is vulnerable to race conditions. In terms of thread safety, it is very unsafe.
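The overwrites happen because count += 1 isn't a single atomic operation. It's a read, an increment, and a write, and two parallel threads can interleave between those steps:

# How two parallel threads can lose an update on count += 1:
#
#   Thread A: reads count       # => 500
#   Thread B: reads count       # => 500
#   Thread A: writes 500 + 1    # count is now 501
#   Thread B: writes 500 + 1    # count is still 501 -- one increment lost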

Preventing Race Conditions

For those of you unfamiliar with race conditions, you may wonder how to prevent them. “There has to be a way of making our programs deterministic and thread-safe, right?” you may ask. Fortunately, there is. We can use something called a Mutual Exclusion Object. Mutex, for short.

A mutex will ensure that only one thread can access a piece of memory at a time. Using a mutex in Ruby is very easy. All you need to do is create a new instance of the Mutex class and wrap the vulnerable code in a synchronize block.

# 07_thread_safe.rb

count = 0
threads = []
mutex = Mutex.new

1000.times do |i|
  threads[i] = Thread.new do
    sleep(rand)

    # Use the mutex so only one thread can update the counter at a time
    mutex.synchronize { count += 1 }
  end
end

threads.each(&:join)

puts "count = #{count}"
$ ruby 07_thread_safe.rb

count = 1000

Multithreading isn’t always the best solution

The trade-off of using a mutex is that your program will run slower than it would without one, since threads have to wait their turn. While a longer-running program is better than an inaccurate one, heavy use of a mutex can defeat the purpose of multithreading. Programs that rely heavily on accessing global variables may not benefit from multithreading at all; they may be better off running serially.
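Here's a rough sketch of that worst case: when all of the real work happens inside the synchronize block, the threads simply take turns and the runtime degrades back to serial speed.

# If the slow work sits inside the mutex, threads run one at a time
mutex = Mutex.new
start = Time.now

threads = (1..5).map do
  Thread.new do
    mutex.synchronize { sleep 1 } # each thread waits for its turn
  end
end

threads.each(&:join)

puts "#{Time.now.to_f - start.to_f} seconds to complete." # ~5 seconds, same as a serial loop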

Too much context switching can lead to poorer performance. Credit: [https://workerholic.github.io](https://workerholic.github.io)

Also, tasks that must be executed in a specific order would not be suited for multithreading. As we saw earlier in this article, threads do not finish in a deterministic order.

An Ideal Use Case

While multithreading may not be appropriate for every situation, there are many where it's perfect. One example is when your program has to make multiple requests to fetch data, whether from an internal service or a third party.

Imagine you were creating an API endpoint that returned data about a user’s account. In the response is data that your company neither owns nor manages, such as GitHub Repos. To get a user’s GitHub information, the program will need to issue many requests to GitHub.

The requirements of this API are that the data must be accurate and the latency has to be less than a second. The faster, the better.

In the following example API, the program pulls all the repository names from a file (to simulate a request to a database). The program then loops over each name and makes a request to the GitHub API to retrieve more data about it.
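The exact contents of data/repos.json aren't shown here, but since the code splits each entry on a slash, it's presumably just an array of "owner/name" strings, something like:

# data/repos.json (assumed format)
["rails/rails", "sinatra/sinatra", "jruby/jruby"]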

require 'sinatra'
require 'httparty'
require 'json'

get '/slow_response' do
  start_time = Time.now
  token = ENV["GITHUB_API_TOKEN"]
  repos = JSON.parse(IO.read("data/repos.json"))

  res = repos.map do |repo|
    owner, repo = repo.split("/")

    response = HTTParty.get(
      "https://api.github.com/repos/#{owner}/#{repo}",
      headers: {
        "Authorization" => "Bearer #{token}",
        "User-Agent" => "sunny-b"
      }
    )

    data = JSON.parse(response.body)

    {
      owner: owner,
      repository: repo,
      stars: data["stargazers_count"],
      forks: data["forks_count"],
      watchers: data["watchers_count"],
      language: data["language"]
    }
  end

  JSON.generate({
    response: res,
    time: Time.now.to_f-start_time.to_f
  })
end

As you might imagine, this can lead to long response times for the user.

$ curl localhost:4567/slow_response

{
  "response": [
    …
  ],
  "time": 7.479549884796143
}

In this example, it takes more than 7 seconds to retrieve the user's data. That isn't acceptable. We need to get it under a second so that the endpoint can scale and provide a better user experience.

One possible way of shortening the response time is to cache all the data in a database. The problem with that solution is that this data changes often: the title, the number of watchers, stars, and forks are all subject to change, which would make the cached results inaccurate. Incorrect results aren't allowable.

Another tool to consider is multithreading. Instead of making API requests to GitHub one at a time, a separate thread could be spun up for each request.

# sinatra_demo_fast_response.rb

require 'sinatra'
require 'httparty'
require 'json'

get '/fast_response' do
  start_time = Time.now
  token = ENV["GITHUB_API_TOKEN"]
  repos = JSON.parse(IO.read("data/repos.json"))

  res = repos.map do |repo|
    # Create a separate thread for each API request 
    Thread.new do
      owner, repo = repo.split("/")

      response = HTTParty.get(
        "https://api.github.com/repos/#{owner}/#{repo}",
        headers: {
          "Authorization" => "Bearer #{token}",
          "User-Agent" => "sunny-b"
        }
      )

      data = JSON.parse(response.body)

      {
        owner: owner,
        repository: repo,
        stars: data["stargazers_count"],
        forks: data["forks_count"],
        watchers: data["watchers_count"],
        language: data["language"]
      }
    end
  end

  JSON.generate({
    # Retrieve return values of each thread
    response: res.map(&:value),
    time: Time.now.to_f-start_time.to_f
  })
end
$ curl localhost:4567/fast_response

{
  "response": [
    …
  ],
  "time": 0.5799121856689453
}

By changing 3 lines, we dropped the response time from over 7 seconds to under a second. It’s possible to make this code even more performant by enabling parallelism.

$ rbenv local jruby-9.1.10.0
$ ruby sinatra_demo_fast_response.rb

# In a new terminal tab
$ curl localhost:4567/fast_response

{
  "response": [
    …
  ],
  "time": 0.3163459300994873
}

With parallelism in place, the response time can be cut to a third of a second. Our users would be quite happy with that performance.

I hope that you gleaned some new insights about multithreading. Concurrency is a language-agnostic tool that's excellent for optimizing your code. Most languages support multithreading or multiprocessing in some fashion.

The next time you find yourself needing to create a fast program, consider multithreading as a possible solution.
