How To Use DispatchGroups To Gather Information From Multiple Sources At Once
Kenny Dubroff (froggomad)
Posted on October 23, 2020
Before We Begin
It's helpful, but not required to know:
- Basic Swift
- Object-Oriented Programming (OOP)
- Asynchronous programming
- URLSession
What We'll Be Doing
Let's say you're creating a simple catalog of Car Brands. For each brand, you want to know what makes and models the brand has available. You have an API with the information and it's in JSON. In Swift, we might model something like that in this fashion:
struct CarBrand: Codable {
    let id: Int
    var makes: [CarMake]
}

struct CarMake: Codable {
    let id: Int
    let name: String
    var models: [String]
}
Then if we were so fortunate, we'd make one API call, get an array or two of associated data, and get all of the information we want. Ah, bliss...
This time, things wouldn't be so easy...
Linking information from decoupled sources.
I recently ran into a problem where I had to do something similar in a totally unexpected way. I was downloading information from a single source, but for some reason, relationships couldn't be established properly at the parent endpoint and we had to access a pivot table where the relationship between the two objects existed. We could then go and retrieve information about the related object from another endpoint.
In other words, parents had no information about their children. So in order to link properties, it was necessary to initiate one network call, wait for several other network calls to come back, then complete our method. The JSON from the base URL (/brand) looked something like this:
[{
"id":1,
"name":"AmazingCarBrand"
},
{
"id":2,
"name":"AnotherAmazingBrand"
}]
As I described, all we get is information about the brand itself. It was then necessary to find the linked pieces at their own endpoints, and use a middleman (pivot table) endpoint to figure out which children to retrieve. Here's sample JSON from a pivot table (/brand/details/id) endpoint:
{
"brandid":1,
"makeids":[1,2,3,4]
}
We could then go to the /make/id endpoint to get the following JSON:
{
"id":1,
"name":"Junker"
}
... then repeat the middleman/pivot table process to get information on models for each make. But for "simplicity's" sake I'll stop this tutorial one level deep.
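As a rough sketch of how the three sample payloads above could be decoded, here is one small Decodable type per endpoint. The type names (BrandPayload, BrandDetailsPayload, MakePayload) are made up for this example and are not from the original project. They mirror the JSON exactly, because a synthesized Codable type such as the CarMake shown earlier would likely fail to decode from a payload that is missing one of its non-optional properties (models, in that case).

```swift
import Foundation

// Hypothetical payload types, one per endpoint; names are not from the article.
struct BrandPayload: Decodable {
    let id: Int
    let name: String
}

struct BrandDetailsPayload: Decodable {
    let brandid: Int
    let makeids: [Int]
}

struct MakePayload: Decodable {
    let id: Int
    let name: String
}

// The sample JSON from above, written as raw string literals.
let brandsJSON = Data(#"[{"id":1,"name":"AmazingCarBrand"},{"id":2,"name":"AnotherAmazingBrand"}]"#.utf8)
let detailsJSON = Data(#"{"brandid":1,"makeids":[1,2,3,4]}"#.utf8)
let makeJSON = Data(#"{"id":1,"name":"Junker"}"#.utf8)

let decoder = JSONDecoder()
// try? turns a decoding failure into nil instead of crashing the sketch.
let brands = (try? decoder.decode([BrandPayload].self, from: brandsJSON)) ?? []
let details = try? decoder.decode(BrandDetailsPayload.self, from: detailsJSON)
let firstMake = try? decoder.decode(MakePayload.self, from: makeJSON)

print(brands.map { $0.name })
print(details?.makeids ?? [])
print(firstMake?.name ?? "decoding failed")
```

Once the pieces are decoded separately like this, they can be stitched into whatever richer model you prefer (a CarBrand holding its CarMakes, for instance).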
This doesn't easily facilitate OOP
My team's goal was to do things in an Object-Oriented way, so instead of coming away from a method with a bunch of separate arrays at different times, we wanted to just have one object that had at least the top-level child objects as properties so we could use what we needed when we needed, and keep the data coupled where it needs to be... the way we typically expect things.
No big deal right, just use a loop...
Well, yeah - a loop is definitely part of what's necessary here... a couple of them actually (if we needed another property, or to go another level deep such as getting models). But when we're working in an asynchronous environment (which Networking automatically is) we don't know when we'll be getting the information we need back from the remote source. In our case, this means our Car Brand will have an incomplete list of Makes and/or Models - even though we made requests to get that information through those loops... unless we had a way of knowing when those requests were also complete.
I was using a URLSession Wrapper so it was a little cleaner, and I promise (🤞) I knew it wouldn't work... but my initial implementation looked something like this after making the initial call to get brands:
import Foundation

/// "https://www.example.com/base/brands"
let baseURL = URL(string: "https://www.example.com/base/brands")!
/// "https://www.example.com/base/brands/details"
let baseDetailURL = baseURL.appendingPathComponent("details")

// get Make information for one CarBrand
func getMakeInformation(for brand: CarBrand, complete: @escaping ([CarMake]?) -> ()) {
    getMakeModelIds(for: brand) { ids in
        // the ids that come back are optional, so we need to unwrap them
        guard let ids = ids else {
            print("no ids came back from makeModelId request")
            complete(nil)
            return
        }
        // initialize an empty array of CarMake
        // to hold our objects when they come back
        var makes: [CarMake] = []
        for id in ids {
            // this URL has information about which makes a brand has
            let brandMakeDetailURL = baseDetailURL.appendingPathComponent("\(id)")
            let brandMakeDetailRequest = URLRequest(url: brandMakeDetailURL)
            URLSession.shared.dataTask(with: brandMakeDetailRequest) { (data, _, error) in
                if let error = error {
                    print("error getting make details: \(error)")
                    // Normally I would `continue` here, but the compiler
                    // complains that we're outside of a loop. This closure
                    // is its own scope, and it runs later on a background
                    // queue, after the loop has already moved on.
                }
                if let data = data {
                    let decoder = JSONDecoder()
                    do {
                        let carMake = try decoder.decode(CarMake.self, from: data)
                        makes.append(carMake)
                    } catch {
                        print("Error decoding CarMake: \(error)")
                    }
                }
            }.resume()
        }
        // this runs as soon as the loop has started every request,
        // not when the responses have arrived
        complete(makes)
    }
}

func getMakeModelIds(for brand: CarBrand, complete: @escaping ([Int]?) -> Void) {
    ... // implementation here
    complete(ids)
}
Why didn't this work?
Because the requests are asynchronous, the loop fires each one and moves straight on to the next without waiting for a response. By the time the loop finishes and we call complete(makes), we may have one or two responses back, but the rest are still out in the ether, so the array we hand back is incomplete (and often empty).
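To see the timing problem without a real API, here is a minimal sketch that swaps the network call for a fake one. Everything in it (ResultBox, simulateRequest, naiveGather, the 0.1 second delay) is invented for illustration. It only mirrors the shape of the attempt above: start every request in a loop, then complete right after the loop.

```swift
import Foundation

// Thread-safe holder for results (a helper invented for this sketch).
final class ResultBox {
    private let lock = NSLock()
    private var names: [String] = []
    func add(_ name: String) { lock.lock(); names.append(name); lock.unlock() }
    func all() -> [String] { lock.lock(); defer { lock.unlock() }; return names }
}

// Stand-in for a network request: answers 0.1s later on a background queue.
func simulateRequest(id: Int, completion: @escaping (String) -> Void) {
    DispatchQueue.global().asyncAfter(deadline: .now() + 0.1) {
        completion("make-\(id)")
    }
}

// Mirrors the shape of the first attempt: start everything, then complete.
func naiveGather(ids: [Int], complete: ([String]) -> Void) {
    let box = ResultBox()
    for id in ids {
        simulateRequest(id: id) { name in
            box.add(name) // runs later, after naiveGather has already returned
        }
    }
    complete(box.all()) // runs immediately: no response has arrived yet
}

naiveGather(ids: [1, 2, 3]) { names in
    print("completed with \(names.count) of 3 results")
}
```

In this sketch the completion sees zero results because every fake response is delayed. With real requests a fast response might occasionally sneak in, which makes the bug look intermittent.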
What we need is a signal to know when all of the makes and models have come back and have been linked to the brand... what we need is a basic implementation to make sure I'm right before we get into this any deeper!
My initial idea was to keep count using a placeholder Int variable that I would increment as requests completed. A conditional check would then complete the method with the entire array once every request had come back with a response.
And it worked!
...
var i = 0
for id in ids {
    ...
        // we're in the completion here, meaning
        // this call has come back, so increment
        i += 1
        ...
        if let data = data {
            let decoder = JSONDecoder()
            do {
                let carMake = try decoder.decode(CarMake.self, from: data)
                makes.append(carMake)
            } catch {
                print("Error decoding CarMake: \(error)")
            }
            if i == ids.count {
                complete(makes)
            }
        }
While it worked, it just didn't feel... right. I'm sure this is fine, though it could probably use more error handling and the code could be more reusable. But those weren't the issues I was having. I just knew there was a built-in way to do this because I had used different methods to do things like this before; just not often enough to remember.
Did I Need A Semaphore?
A semaphore was my initial thought, and some preliminary research seemed to check out: as I understood it, a semaphore lets you start n tasks and signal when they're complete. As I went through implementations though, the logic just wasn't tracking for me. So I started looking into other methods.
Was using a DispatchGroup the answer I was seeking?
A DispatchGroup (group) is an object that tracks a group of n tasks. Each task enters the group when it begins and leaves the group when it finishes.
DispatchGroup also has a notify method: you hand it a block, and once the last task has left and the group is empty, that block gets executed on the queue you choose. Perfect!
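Here is a small, self-contained sketch of enter, leave, and notify working together. The helper names (ResultBox, simulateRequest, gather) and the fake delay are assumptions made for this example, and the fake request stands in for a real URLSession call.

```swift
import Foundation

// Thread-safe holder for results (a helper invented for this sketch).
final class ResultBox {
    private let lock = NSLock()
    private var names: [String] = []
    func add(_ name: String) { lock.lock(); names.append(name); lock.unlock() }
    func all() -> [String] { lock.lock(); defer { lock.unlock() }; return names }
}

// Stand-in for a network request: answers shortly after on a background queue.
func simulateRequest(id: Int, completion: @escaping (String) -> Void) {
    DispatchQueue.global().asyncAfter(deadline: .now() + 0.05) {
        completion("make-\(id)")
    }
}

func gather(ids: [Int], queue: DispatchQueue, complete: @escaping ([String]) -> Void) {
    let group = DispatchGroup()
    let box = ResultBox()
    for id in ids {
        group.enter()              // one enter per task...
        simulateRequest(id: id) { name in
            box.add(name)
            group.leave()          // ...balanced by exactly one leave
        }
    }
    // Runs once on `queue` after every enter has been matched by a leave.
    group.notify(queue: queue) {
        complete(box.all())
    }
}

// Command-line usage: notify on a background queue and block until it fires.
// In an app you would more likely pass .main and not block at all.
let finished = DispatchSemaphore(value: 0)
gather(ids: [1, 2, 3], queue: .global()) { names in
    print(names.sorted())
    finished.signal()
}
finished.wait()
```

The results are sorted before printing because the fake responses can arrive in any order, just like real ones.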
But doesn't a Semaphore work the same?
Yes, and no. A Semaphore probably also would've worked here, but a Semaphore is more for limiting the number of tasks that can run concurrently. For instance, only allowing one login from one device at a time to prevent DDoS attacks on your login server.
A DispatchGroup is for letting n tasks run and finding out when they have all completed.
While similar, a DispatchGroup is really what's appropriate here.
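For contrast, here is a rough sketch of the job a semaphore is better suited to: capping how many tasks run at the same time. The names (ConcurrencyMeter, runLimited) and the sleep that stands in for work are all made up for this example. A DispatchGroup still shows up, but only to wait for the tasks to finish.

```swift
import Foundation

// Records how many tasks were running at once (invented for this sketch).
final class ConcurrencyMeter {
    private let lock = NSLock()
    private var running = 0
    private(set) var peak = 0
    func begin() { lock.lock(); running += 1; peak = max(peak, running); lock.unlock() }
    func end() { lock.lock(); running -= 1; lock.unlock() }
}

// Starts `taskCount` tasks but lets at most `limit` of them work at a time.
// Returns the highest number of tasks observed running simultaneously.
func runLimited(taskCount: Int, limit: Int) -> Int {
    let semaphore = DispatchSemaphore(value: limit)
    let group = DispatchGroup()
    let meter = ConcurrencyMeter()
    for _ in 0..<taskCount {
        DispatchQueue.global().async(group: group) {
            semaphore.wait()                      // take a slot, or block until one frees up
            meter.begin()
            Thread.sleep(forTimeInterval: 0.02)   // pretend to do some work
            meter.end()
            semaphore.signal()                    // hand the slot back
        }
    }
    group.wait()                                  // block until every task has finished
    return meter.peak
}

let peak = runLimited(taskCount: 6, limit: 2)
print("peak concurrency: \(peak)")
```

The semaphore never tells you when everything is done. It only keeps the number of simultaneous tasks at or below the limit, which is why the group is the better fit for the problem in this article.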
Finally, let's implement our DispatchGroup
// get Make information for one CarBrand
func getMakeInformation(for brand: CarBrand, complete: @escaping ([CarMake]?) -> ()) {
    getMakeModelIds(for: brand) { ids in
        // the ids that come back are optional, so we need to unwrap them
        guard let ids = ids else {
            print("no ids came back from makeModelId request")
            complete(nil)
            return
        }
        // one group per call, so overlapping calls for different
        // brands don't end up waiting on each other's requests
        let group = DispatchGroup()
        // initialize an empty array of CarMake
        // to hold our objects when they come back
        var makes: [CarMake] = []
        for id in ids {
            // each task enters the group at the beginning of our loop
            group.enter()
            // this URL has information about which makes a brand has
            let brandMakeDetailURL = baseDetailURL.appendingPathComponent("\(id)")
            let brandMakeDetailRequest = URLRequest(url: brandMakeDetailURL)
            URLSession.shared.dataTask(with: brandMakeDetailRequest) { (data, _, error) in
                if let error = error {
                    print("error getting make details: \(error)")
                    // Normally I would `continue` here, but the compiler
                    // complains that we're outside of a loop. This closure
                    // is its own scope, and it runs later on a background
                    // queue, after the loop has already moved on.
                }
                if let data = data {
                    let decoder = JSONDecoder()
                    do {
                        let carMake = try decoder.decode(CarMake.self, from: data)
                        makes.append(carMake)
                    } catch {
                        print("Error decoding CarMake: \(error)")
                    }
                }
                // each task leaves the group when it's finished
                // you want to make sure if you're exiting the closure
                // anywhere else to also leave the group
                group.leave()
            }.resume()
        }
        // This will complete on the main thread
        // when all tasks complete.
        // If you want to complete on a background
        // queue instead, pass a different `DispatchQueue`
        // here, such as `DispatchQueue.global()`
        group.notify(queue: .main) {
            complete(makes)
        }
    }
}
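The code comment about leaving the group on every exit path is easy to get wrong once early returns creep in. One possible way to handle it, sketched here with made-up helpers (NameStore, simulatedFetch, loadNames) and a fake request in which odd ids fail, is to put the leave in a defer at the top of the completion closure.

```swift
import Foundation

// Thread-safe holder for results (a helper invented for this sketch).
final class NameStore {
    private let lock = NSLock()
    private var names: [String] = []
    func add(_ name: String) { lock.lock(); names.append(name); lock.unlock() }
    func all() -> [String] { lock.lock(); defer { lock.unlock() }; return names }
}

// Hypothetical request: even ids succeed, odd ids fail and hand back nil.
func simulatedFetch(id: Int, completion: @escaping (String?) -> Void) {
    DispatchQueue.global().asyncAfter(deadline: .now() + 0.05) {
        completion(id % 2 == 0 ? "make-\(id)" : nil)
    }
}

func loadNames(ids: [Int], queue: DispatchQueue, complete: @escaping ([String]) -> Void) {
    let group = DispatchGroup()
    let store = NameStore()
    for id in ids {
        group.enter()
        simulatedFetch(id: id) { name in
            defer { group.leave() }   // runs on every exit path below
            guard let name = name else {
                print("request \(id) failed")
                return                // early exit, the defer still leaves the group
            }
            store.add(name)
        }
    }
    group.notify(queue: queue) {
        complete(store.all())
    }
}

// Command-line usage; in an app you would likely notify on .main instead.
let allDone = DispatchSemaphore(value: 0)
loadNames(ids: [1, 2, 3, 4], queue: .global()) { names in
    print(names.sorted())
    allDone.signal()
}
allDone.wait()
```

If a leave is ever skipped, notify never fires and the caller waits forever, so the defer is cheap insurance.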
Wrap-Up
What we've done here is linked multiple dependencies to one object using multiple asynchronous tasks, and we're only completing the top level object when all asynchronous tasks related to it are complete.
We did this using a DispatchGroup, which allows us to begin multiple asynchronous tasks and notifies us when they've all finished.
If we wanted to limit the number of calls we could make at once, we could use a Semaphore.
Where to go from here
There are certainly a lot of improvements that can be made to this code.
- For one, it could be more modular.
  - As I mentioned earlier, I normally use a URLSession Wrapper
- Second, error handling could be improved. Look out for an article from me that links that URLSession Wrapper to an Error Handler using the Result Type
  - In addition, some of this is assuming "The Happy Path". I believe there are one or 2 places where I'm force-unwrapping Optionals for instance.
- Third, we didn't touch on how JSON Decoding is working using the Codable protocol
- Lastly, if you tried to run any of this code, you got a bunch of decoding errors - this is because www.example.com isn't hosting JSON for us!
  - Please go forth and find an API that's structured in a similar way, or string several together... heck - you could roll your own using Firebase or something similar. Play around with DispatchGroups and Queues and see what you can come up with. It's probably the best way to learn.