Resumable multi-chunk upload to GCP Bucket
eao
Posted on January 11, 2021
Introduction
Collecting uploaded files in a bucket makes sense for many web applications. Directing the upload straight to the bucket, and cutting out the server as an unnecessary middleman, makes even more sense.
I am going to give you a quick overview of how you can use resumable upload sessions in GCP to achieve a secure upload from the browser straight into a bucket, without having to manage bucket authentication for each individual user.
Skipping authentication
If you are planning to let users upload files, your backend most likely already has some type of authentication implemented to let users log in and to coordinate which data and functionality they are authorized for. Propagating this authorization to your buckets in Google Cloud Storage would be tedious. Instead, we will use a service account, which is authorized on the buckets, to generate the URI of a resumable upload session.
This URI acts as a signed URL that gives time-limited access to a requested resource. Google describes a resumable upload session as follows:
A resumable upload allows you to resume data transfer operations to Cloud Storage after a communication failure has interrupted the flow of data. Resumable uploads work by sending multiple requests, each of which contains a portion of the object you're uploading. This is different from a simple upload, which contains all of the object's data in a single request and must restart from the beginning if it fails part way through.
In order to generate this URI, an authenticated API call has to be made that specifies the length of the content to be sent and the bucket the file should be saved in.
curl -i -X POST --data-binary @METADATA_LOCATION \
-H "Authorization: Bearer OAUTH2_TOKEN" \
-H "Content-Type: application/json" \
-H "Content-Length: INITIAL_REQUEST_LENGTH" \
"https://storage.googleapis.com/upload/storage/v1/b/BUCKET_NAME/o?uploadType=resumable&name=OBJECT_NAME"
If authenticated users in the front-end were authorized directly for the respective buckets, this call could be made there. As specified earlier, we only want to authorize a service account for our bucket. Therefore we need to add a new endpoint to our own API. The controller for this endpoint is authenticated as the service account and retrieves and returns the resumable session URI.
While the API call could be made directly using any HTTP module, using a Google Cloud client library, which offers wrappers for these functions, can come in handy. As our backend was implemented in Python, we decided to use the google.cloud.storage library.
import logging

from google.cloud import storage
from google.oauth2 import service_account
Initializing the storage client and authenticating it with the service account is rather trivial.
def __init__(self):
    logging.info("Initializing Storage client...")
    credentials = service_account.Credentials.from_service_account_file(
        CREDENTIALS)
    self.storage_client = storage.Client(credentials=credentials)
    logging.info("Successfully initialized Storage client!")
Now we only need to call create_resumable_upload_session() on the bucket we want the file to be uploaded to, and serve this URI to authorized users who request it.
def initiate_upload(self, bucket_id: str, file_name: str):
    bucket = self.storage_client.get_bucket(bucket_id)
    blob = bucket.blob(f'{INPUT_PATH}{file_name}')
    uri = blob.create_resumable_upload_session(
        origin="http://example.com"
    )
    return uri
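To tie this together, the endpoint mentioned earlier simply wraps initiate_upload(). A minimal sketch, assuming a Flask backend and that the methods above live in a class we call StorageController (the framework, the class name and the BUCKET_ID constant are our assumptions, not part of the original setup):
from flask import Flask, jsonify, request

app = Flask(__name__)
# the class holding __init__() and initiate_upload() from above
storage_controller = StorageController()

@app.route("/api/resumableupload")
def resumable_upload():
    # authenticate the user and check their authorization here,
    # before handing out a session URI
    file_name = request.args.get("name")
    # the size parameter sent by the front-end could also be validated here
    uri = storage_controller.initiate_upload(BUCKET_ID, file_name)
    return jsonify({"sessionUri": uri})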
Adding the origin is very important, as it tells GCP to append the correct Access-Control-Allow-Origin headers to preflight requests from the browser on the resource. Without it, you will most definitely run into CORS issues.
Additional CORS settings at a bucket level can also be made using the client library. Make sure to read up on the headers and their implications before making changes to your buckets, though (see Google's documentation on configuring CORS in buckets).
bucket.cors = [
    {
        "origin": ["*"],
        "responseHeader": [
            "Content-Type",
            "Access-Control-Allow-Origin",
            "x-goog-resumable"],
        "method": ["GET", "HEAD", "DELETE", "POST", "OPTIONS"],
        "maxAgeSeconds": 3600
    }]
bucket.patch()
Uploading the file
Lots of setting up and no file upload in sight. Let's change that.
We implemented our front-end in Angular v11 using the standard HttpClientModule and rxjs for the observables.
Let's outline the steps required for chunking and uploading the file:
1. select a file
2. request a resumable upload URI (providing file name and size)
3. upload a chunk (the chunk size must be a multiple of 256 KiB)
4. if the response is 200, the upload is complete; if it is 308, the chunk was successfully uploaded, but the upload is incomplete. The range header contains the last uploaded byte. Go back to step 3.
We created an interface that contains all information relevant for the upload of one file and allows us to limit the calls we need to make to the HTML5 File API.
import { Subscription } from 'rxjs';

export interface chunkUploadingSession {
file: File; // the File to upload
fileSize: number; // saved, because file.size can be expensive
chunkSize: number; // the size of the chunks for us set to 8388608 (8MiB) as best-practice suggests
uploadProgress: number; // bytes transmitted (used for progress bar)
uploadStarted: boolean; // indication whether the upload has started
uploadPaused: boolean; // indication whether the upload was paused
uploadComplete?: boolean; // indication whether the upload is complete
uploadUri?: string; // the infamous resumable upload uri
successfullyUploaded: number; // bytes successfully transmitted (as confirmed in response from gcp)
currentRequest?: Subscription; // subscription to the current chunk upload, to allow cancelling mid transmission
}
We initialize this session whenever a file is added in our upload.component.ts. In our case, only one file had to be uploaded at a time; multiple files would, however, work analogously.
uploadSession: chunkUploadingSession;
handleFileInput(files: FileList) {
this.uploadSession = {
file: files.item(0),
fileSize: files.item(0).slice().size,
chunkSize: 8388608,
successfullyUploaded: 0,
uploadProgress: 0,
uploadStarted: false,
uploadPaused: false,
uploadUri: undefined,
};
}
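For completeness, a minimal sketch of a template binding that could feed this handler (our own addition; the $any() cast is only there to satisfy strict template type checking):
<input type="file" (change)="handleFileInput($any($event.target).files)" />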
In order to implement functionality for resuming an upload, we need to be able to specify at which byte the upload should start. We make use of toPromise() so that we can await the URI if it doesn't exist yet, and only then commence the upload.
/**
* Commences/resumes the upload of the current file.
* @param firstChunkIndex byte index, at which the upload should start/continue
*/
async upload(firstChunkIndex: number = 0) {
// Tell the frontend, that the upload has started. E.g. to disable upload button.
this.uploadSession.uploadStarted = true;
  // Check whether a resumable upload uri has already been generated
  if (!this.uploadSession.uploadUri) {
    const res = await this.http
      .get<{ sessionUri: string }>(`${BASE_URL}/api/resumableupload`, {
        // pass the file name and total size as query parameters
        params: {
          name: this.uploadSession.file.name,
          size: this.uploadSession.fileSize.toString(),
        },
      })
      .toPromise();
    this.uploadSession.uploadUri = res.sessionUri;
  }
  // Start the upload (uploadChunk is implemented below)
  this.uploadService.uploadChunk(this.uploadSession, firstChunkIndex);
}
Cool, but we still haven't uploaded the file, have we?
Nope. Let's dive straight into the upload.service.ts. In order to determine the range of bytes that should be uploaded, a helper method getChunkEnd() might come in handy.
/**
* Determines whether the file ends within the next chunk and returns
* either the end of the file or end of chunk based on the starting byte.
* @param start starting byte of chunk
* @param session uploadSession
*/
getChunkEnd(start: number, session: chunkUploadingSession): number {
if (start + session.chunkSize > session.fileSize) {
return session.fileSize;
} else {
return start + session.chunkSize;
}
}
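As a side note, GCP only accepts chunk sizes that are a multiple of 256 KiB (for every chunk except the last). Our fixed 8 MiB chunk size satisfies this by construction, but if you want to make it configurable, a small guard along these lines could help (toValidChunkSize is our own hypothetical addition, not part of the service above):
// 256 KiB, the granularity GCP requires for resumable upload chunks
readonly CHUNK_BASE = 262144;

/**
 * Rounds a desired chunk size down to the nearest multiple of 256 KiB,
 * never going below a single 256 KiB unit.
 * @param desired desired chunk size in bytes
 */
toValidChunkSize(desired: number): number {
  return Math.max(this.CHUNK_BASE, Math.floor(desired / this.CHUNK_BASE) * this.CHUNK_BASE);
}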
With this out of the way we can finally get to the part you have all been waiting for. The chunk upload.
/**
* Uploads a chunk based on the starting byte and calls itself,
* if the file upload is incomplete.
* @param session current session
* @param start starting byte
*/
uploadChunk(session: chunkUploadingSession, start: number) {
// calculate the end of the byte range
let end = this.getChunkEnd(start, session);
// print the range to the console
console.debug(
`Uploading file [${session.file.name}]. Starting byte ${start} to ${
end - 1
} of ${session.fileSize} to ${session.uploadUri}`
);
// call http put on the session uri
// append the blob of the file chunk as the body
session.currentRequest = this.http
.put(session.uploadUri, session.file.slice(start, end), {
// let the observable respond with all events, so that it can report on the upload progress
observe: 'events',
reportProgress: true,
// set the content range header to let gcp know which part of the file is sent
headers: {
'Content-Range': `bytes ${start}-${end - 1}/${session.fileSize}`,
},
})
.subscribe(
// because we are observing 'events' the response is an HttpEvent
(res: HttpEvent<any>) => {
// If the response is an HttpResponse event and the status code is 200, the file upload has completed in its entirety.
if (res.type === HttpEventType.Response && res.status == 200) {
// wow you actually did it. If you want to trigger a confetti rain method, here is the spot.
this.message('Upload complete!', '');
}
// If the type is upload progress, we can use it for showing a pretty progress bar.
else if (res.type === HttpEventType.UploadProgress) {
session.uploadProgress = start + res.loaded;
}
},
// GCP responds with 308 if a chunk was uploaded, but the file is incomplete.
// For the Angular http module any non-2xx code is an error, therefore we need to use the error callback to continue.
(res: HttpErrorResponse) => {
if (res.status == 308) {
// the range header contains the confirmation by google which bytes have actually been written to the bucket
const range = res.headers.get('range');
end = +range.substring(range.indexOf('-') + 1, range.length);
session.successfullyUploaded = end;
//Check, whether the upload is paused, otherwise make a recursive call to upload the next chunk.
if (!session.uploadPaused) {
this.uploadChunk(session, end);
}
} else {
// if the code is not 308 you need to handle the error and inform the users.
}
}
);
}
With this recursive call most of the work for uploading files in chunks is already done!
Now we only need to wrap the service's functions in our upload.component.ts. For initializing the upload, we can simply bind upload() directly to an element.
<div (click)="upload()">Start Upload</div>
For pausing the upload, we simply set uploadPaused to true. This means, however, that the chunk that is currently uploading will still finish. If you would rather pause immediately and restart the current chunk after unpausing, unsubscribe from the observable in the session, as sketched below.
pauseUpload() {
this.uploadSession.uploadPaused = true;
}
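A minimal sketch of such an immediate pause, based on the session shown above (the method name pauseUploadImmediately is our own):
pauseUploadImmediately() {
  this.uploadSession.uploadPaused = true;
  // cancel the in-flight chunk request; the unconfirmed bytes of this chunk are discarded
  this.uploadSession.currentRequest?.unsubscribe();
  // roll the progress indicator back to the last byte confirmed by GCP
  this.uploadSession.uploadProgress = this.uploadSession.successfullyUploaded;
}
Resuming afterwards works exactly like resumeUpload() below, as successfullyUploaded still points at the last byte confirmed by GCP.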
Resuming is pretty much a combination of unpausing and starting the upload at the last position.
resumeUpload() {
this.uploadSession.uploadPaused = false;
this.upload(this.uploadSession.successfullyUploaded);
}
For cancelling, we will need to pause the upload, unsubscribe from the observable, reset the session and delete the session URI, so it can't be used anymore. In the upload.service.ts we therefore create a new method:
/**
* Delete the current session to cancel it.
* @param session
*/
deleteSession(session: chunkUploadingSession) {
this.http.delete(session.uploadUri).subscribe(
      // Instead of a 200, GCP returns a 499 if the session/uri was successfully deleted.
      // As http in Angular interprets every non-2xx code as an error,
      // the success callback will never occur.
      (res) => this.message('This will never happen.', ''),
      (err: HttpErrorResponse) => {
        if (err.status == 499) {
          // cancel the chunk upload, if there is one currently running
          session.currentRequest.unsubscribe();
          // inform the user that the cancellation was successful
        } else {
          // inform the user that an error occurred
        }
}
}
);
}
With this implemented, we can just call it from the upload.component.ts
and are nearly done!
cancelUpload() {
this.pauseUpload();
this.uploadService.deleteSession(this.uploadSession);
}
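The steps above also mention resetting the session; a minimal sketch of what that could look like in the component (resetSession is our own addition; selecting a new file via handleFileInput would achieve the same):
resetSession() {
  // drop all state of the cancelled upload, so a fresh session can be created
  this.uploadSession = undefined;
}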
Showing progress
With the upload functionality in place, we can now focus on the user experience. Letting users know how far their upload has progressed is a great way to show them that something is actually happening.
Implementing a status text or progress bar is really simple, as we already have all the information we need stored in the session.
For a status text e.g.:
{{uploadSession.uploadProgress}}/{{uploadSession.fileSize}}
will print how many bytes of the total have already been uploaded. I suggest using a pipe to convert the bytes to a more human-readable format, for example this well-known formatBytes snippet (unminified and ES6'ed by the community):
function formatBytes(bytes, decimals = 2) {
  if (!+bytes) return '0 Bytes'
  const k = 1024
  const dm = decimals < 0 ? 0 : decimals
  const sizes = ['Bytes', 'KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']
  const i = Math.floor(Math.log(bytes) / Math.log(k))
  return `${parseFloat((bytes / Math.pow(k, i)).toFixed(dm))} ${sizes[i]}`
}
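A minimal sketch of such a pipe, assuming the formatBytes helper above is in scope (the pipe name formatBytes is our choice):
import { Pipe, PipeTransform } from '@angular/core';

@Pipe({ name: 'formatBytes' })
export class FormatBytesPipe implements PipeTransform {
  // delegate to the formatBytes helper shown above
  transform(bytes: number, decimals: number = 2): string {
    return formatBytes(bytes, decimals);
  }
}
It can then be used in the template as {{uploadSession.uploadProgress | formatBytes}}/{{uploadSession.fileSize | formatBytes}}.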
For a progress bar or spinner, just calculate the percentage (100 * uploadSession.uploadProgress) / uploadSession.fileSize, or omit the 100 if the component requires a value between 0 and 1. In Angular Material e.g.:
<mat-progress-bar mode="determinate"
[value]="(100 * uploadSession.uploadProgress) /uploadSession.fileSize"
>
</mat-progress-bar>
Summary
I hope I was able to show you how you can use resumable session URIs and the HTML5 File API to let users upload files directly from their browser to a Google Cloud Storage bucket in an efficient and secure manner.
While implementing this, I learned a lot about CORS, the HTML5 File API, the Angular HTTP module and RxJS observables. And I am still wondering why Google Cloud Storage would return an HTTP status code of 499 or 308 when my request was processed exactly as planned. If it had at least been a 418, we could have sat down for a cup of tea.
I wish you all a happy new year and hope you found this contribution helpful!