Upload a huge file with little RAM & space in Go

This post updates my previous implementation for uploading a file larger than the available RAM in Go. If you have not read that post, you can find it at the following link.

Upload a file larger than RAM size in Go

Toby Chui ・ Jan 15 '21


In my last blog post, I tackled uploading in Go with a WebSocket file chunking implementation that handles cases where the uploaded file is much larger than the RAM available on the device. This implementation is really helpful when you are developing applications for cheap SBCs with as little as 512MB of RAM.

Recently, I ran into another issue while trying to migrate my whole Google Drive to my own ARM-powered DIY NAS. The NAS only has 512MB of RAM and a 32GB microSD card as its OS drive, with 2 x 512GB HDDs attached to the SBC to store files. Uploading a file larger than 32GB causes the system to run out of space and crashes my ArozOS NAS OS.

In the previous implementation, uploading a 1GB file requires 1GB of space in your tmp folder (i.e. on the SD card) to buffer the file chunks received via WebSocket. The latest implementation adds a new "huge file mode" for uploads that are larger than the space left in the tmp folder: it writes the uploaded file chunks directly to the target disk while keeping the peak space required on every disk in the system as low as possible. Before I show you the code, here is the logic I use to decide when to enter "huge file mode".

Logic for Optimizing Both Upload Space & Time Occupancy

If a file is smaller than 4MB, upload with FORM POST (to reduce overhead, fastest)
Else if the file is smaller than "The remaining space on tmp" / 16 - 1KB, the file is buffered into the tmp folder (the tmp folder should be on a fast medium such as an NVMe SSD or a RAM disk; slower than FORM POST but still fast)
Otherwise, the file chunks are buffered directly on the destination disk (slowest, but gives us the most space to work with)
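
As a rough sketch, the three rules above could be written as a single Go function. In my setup the cutoff is applied on the front end, so treat this only as an illustration: chooseUploadMode, UploadMode and the constant names are made up for this example and are not ArozOS APIs, and I am assuming the second rule reads as (free tmp space / 16) - 1KB.

```go
package main

import "fmt"

// UploadMode names the three upload strategies listed above.
// The type and its values are hypothetical, for illustration only.
type UploadMode int

const (
	ModeFormPost  UploadMode = iota // plain FORM POST, lowest overhead
	ModeTmpBuffer                   // WebSocket chunks buffered in the tmp folder
	ModeHugeFile                    // WebSocket chunks buffered next to the destination
)

const (
	formPostLimit = 4 << 20 // 4MB cutoff for FORM POST
	tmpDivisor    = 16      // only 1/16 of the free tmp space is considered usable
	tmpMargin     = 1 << 10 // 1KB safety margin
)

// chooseUploadMode picks a strategy from the file size and the free space
// in the tmp folder, both given in bytes.
func chooseUploadMode(fileSize, tmpFree int64) UploadMode {
	if fileSize < formPostLimit {
		return ModeFormPost
	}
	// Assumed reading of the rule: (free tmp space / 16) - 1KB.
	if fileSize < tmpFree/tmpDivisor-tmpMargin {
		return ModeTmpBuffer
	}
	return ModeHugeFile
}

// String makes the mode readable when printed.
func (m UploadMode) String() string {
	switch m {
	case ModeFormPost:
		return "FORM POST"
	case ModeTmpBuffer:
		return "tmp buffer"
	default:
		return "huge file"
	}
}

func main() {
	const gb = int64(1) << 30
	tmpFree := 32 * gb // e.g. an empty 32GB tmp folder

	fmt.Println(chooseUploadMode(1<<20, tmpFree)) // 1MB file
	fmt.Println(chooseUploadMode(1*gb, tmpFree))  // 1GB file
	fmt.Println(chooseUploadMode(40*gb, tmpFree)) // 40GB file
}
```

With 32GB free in tmp, the tmp buffer is only used for files below roughly 2GB, which leaves room for several uploads running at the same time.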

File Merging Procedures

In the previous implementation, the file merging procedure worked like this:

Create the destination file and open it
Iterate through the chunks, appending each one to the opened destination file
Delete all the chunk files

However, this would take 2x the space of the file being uploaded. It works fine for medium-sized files, but not for huge files. To solve this, I changed the implementation to the following:

Create the destination file and open it
Iterate through the chunks, appending each one to the opened destination file; once the copy is confirmed successful, remove that source chunk

Put simply, by deleting chunk files on the fly, the new upload logic takes up only (x + c) bytes, where x is the file size and c is the chunk size. In my design, c is 512KB.

The Code

There is no change to the front-end code, except for an extra GET parameter when opening the WebSocket that defines whether the current upload is a huge file upload. The following is an example of how the WebSocket object is created:

let hugeFileMode = "";
if (file.size > largeFileCutoffSize) {
    //File size over cutoff line. Use huge file mode
    hugeFileMode = "&hugefile=true";
}

let socket = new WebSocket(protocol + window.location.hostname + ":" + port + "/system/file_system/lowmemUpload?filename=" + encodeURIComponent(filename) + "&path=" + encodeURIComponent(uploadDir) + hugeFileMode);

And here is the Go backend implementation. Note the isHugeFile flag and the //Merge the file section.

targetUploadLocation := filepath.Join(uploadPath, filename)
if !fs.FileExists(uploadPath) {
    os.MkdirAll(uploadPath, 0755)
}

//Generate an UUID for this upload
uploadUUID := uuid.NewV4().String()
uploadFolder := filepath.Join(*tmp_directory, "uploads", uploadUUID)
if isHugeFile {
    //Upload to the same directory as the target location.
    uploadFolder = filepath.Join(uploadPath, ".metadata/.upload", uploadUUID)
}
os.MkdirAll(uploadFolder, 0700)

//Start websocket connection
var upgrader = websocket.Upgrader{}
upgrader.CheckOrigin = func(r *http.Request) bool { return true }
c, err := upgrader.Upgrade(w, r, nil)
if err != nil {
    log.Println("Failed to upgrade websocket connection: ", err.Error())
    w.WriteHeader(http.StatusInternalServerError)
    w.Write([]byte("500 WebSocket upgrade failed"))
    return
}
defer c.Close()

//Handle WebSocket upload
blockCounter := 0
chunkName := []string{}
lastChunkArrivalTime := time.Now().Unix()

//Setup a timeout listener, check if connection still active every 1 minute
ticker := time.NewTicker(60 * time.Second)
done := make(chan bool)
go func() {
    for {
        select {
        case <-done:
            return
        case <-ticker.C:
            if time.Now().Unix()-lastChunkArrivalTime > 300 {
                //Already 5 minutes without new data arrival. Stop connection
                log.Println("Upload WebSocket connection timeout. Disconnecting.")
                c.WriteControl(8, []byte{}, time.Now().Add(time.Second))
                time.Sleep(1 * time.Second)
                c.Close()
                return
            }
        }
    }
}()

totalFileSize := int64(0)
for {
    mt, message, err := c.ReadMessage()
    if err != nil {
        //Connection closed by client. Clear the tmp folder and exit
        log.Println("Upload terminated by client. Cleaning tmp folder.")
        //Clear the tmp folder
        time.Sleep(1 * time.Second)
        os.RemoveAll(uploadFolder)
        return
    }
    //The mt should be 2 = binary for file upload and 1 for control syntax
    if mt == 1 {
        msg := strings.TrimSpace(string(message))
        if msg == "done" {
            //Start the merging process
            break
        } else {
            //Unknown operations

        }
    } else if mt == 2 {
        //File block. Save it to tmp folder
        chunkFilepath := filepath.Join(uploadFolder, "upld_"+strconv.Itoa(blockCounter))
        chunkName = append(chunkName, chunkFilepath)
        writeErr := ioutil.WriteFile(chunkFilepath, message, 0700)

        if writeErr != nil {
            //Unable to write block. Is the tmp folder full?
            log.Println("[Upload] Upload chunk write failed: " + writeErr.Error())
            c.WriteMessage(1, []byte(`{"error":"Write file chunk to disk failed"}`))

            //Close the connection
            c.WriteControl(8, []byte{}, time.Now().Add(time.Second))
            time.Sleep(1 * time.Second)
            c.Close()

            //Clear the tmp files
            os.RemoveAll(uploadFolder)
            return
        }

        //Update the last upload chunk time
        lastChunkArrivalTime = time.Now().Unix()

        //Check if the file size is too big
        totalFileSize += fs.GetFileSize(chunkFilepath)
        if totalFileSize > max_upload_size {
            //File too big
            c.WriteMessage(1, []byte(`{"error":"File size too large"}`))

            //Close the connection
            c.WriteControl(8, []byte{}, time.Now().Add(time.Second))
            time.Sleep(1 * time.Second)
            c.Close()

            //Clear the tmp files
            os.RemoveAll(uploadFolder)
            return
        }
        blockCounter++

        //Request client to send the next chunk
        c.WriteMessage(1, []byte("next"))

    }
}

//Try to decode the location if possible
decodedUploadLocation, err := url.QueryUnescape(targetUploadLocation)
if err != nil {
    decodedUploadLocation = targetUploadLocation
}

//Do not allow % sign in filename. Replace all with underscore
decodedUploadLocation = strings.ReplaceAll(decodedUploadLocation, "%", "_")

//Merge the file
out, err := os.OpenFile(decodedUploadLocation, os.O_CREATE|os.O_WRONLY, 0755)
if err != nil {
    log.Println("Failed to open file:", err)
    c.WriteMessage(1, []byte(`{"error":"Failed to open destination file"}`))
    c.WriteControl(8, []byte{}, time.Now().Add(time.Second))
    time.Sleep(1 * time.Second)
    c.Close()
    return
}

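for _, filesrc := range chunkName {
    srcChunkReader, err := os.Open(filesrc)
    if err != nil {
        log.Println("Failed to open Source Chunk", filesrc, " with error ", err.Error())
        c.WriteMessage(1, []byte(`{"error":"Failed to open Source Chunk"}`))
        out.Close()
        return
    }
    _, err = io.Copy(out, srcChunkReader)
    srcChunkReader.Close()
    if err != nil {
        //Copy failed. Keep the remaining chunks and abort the merge
        log.Println("Failed to merge Source Chunk", filesrc, " with error ", err.Error())
        c.WriteMessage(1, []byte(`{"error":"Failed to merge Source Chunk"}`))
        out.Close()
        return
    }

    //Copy confirmed. Delete the chunk immediately to save space
    os.Remove(filesrc)
}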

out.Close()

//Return complete signal
c.WriteMessage(1, []byte("OK"))

//Stop the timeout listener
done <- true

//Clear the tmp folder
time.Sleep(300 * time.Millisecond)
err = os.RemoveAll(uploadFolder)
if err != nil {
    log.Println(err)
}

//Close WebSocket connection after finished
c.WriteControl(8, []byte{}, time.Now().Add(time.Second))
time.Sleep(300 * time.Millisecond)
c.Close()
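
The handler above repeats the same "send an error, send a close frame, close the connection" sequence several times. As a possible clean-up, that sequence could live in one helper that also builds the JSON payload with encoding/json instead of a hand-written string. This is only a sketch: closeWithError, wsConn and fakeConn are hypothetical names, the interface lists the three connection methods the handler already calls, and the message type values 1 and 8 are the WebSocket text and close opcodes used as plain numbers in the code above. The one-second pause before Close() from the original is left out here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// WebSocket opcodes, matching the plain numbers used in the handler.
const (
	textMessage  = 1
	closeMessage = 8
)

// wsConn is the subset of connection methods this helper needs.
type wsConn interface {
	WriteMessage(messageType int, data []byte) error
	WriteControl(messageType int, data []byte, deadline time.Time) error
	Close() error
}

// closeWithError reports reason to the client as {"error": reason},
// sends a close frame and closes the connection.
func closeWithError(c wsConn, reason string) error {
	payload, err := json.Marshal(map[string]string{"error": reason})
	if err != nil {
		c.Close()
		return err
	}
	if err := c.WriteMessage(textMessage, payload); err != nil {
		c.Close()
		return err
	}
	// Best effort: the connection is closed even if the close frame fails.
	c.WriteControl(closeMessage, []byte{}, time.Now().Add(time.Second))
	return c.Close()
}

// fakeConn records what was sent, so the helper can be tried without a network.
type fakeConn struct {
	sent        []string
	closeFrames int
	closed      bool
}

func (f *fakeConn) WriteMessage(messageType int, data []byte) error {
	f.sent = append(f.sent, string(data))
	return nil
}

func (f *fakeConn) WriteControl(messageType int, data []byte, deadline time.Time) error {
	if messageType == closeMessage {
		f.closeFrames++
	}
	return nil
}

func (f *fakeConn) Close() error {
	f.closed = true
	return nil
}

func main() {
	c := &fakeConn{}
	if err := closeWithError(c, "File size too large"); err != nil {
		panic(err)
	}
	fmt.Println("sent:", c.sent[0])
	fmt.Println("closed:", c.closed)
}
```

Building the payload with json.Marshal also takes care of escaping if the error text ever contains quotes.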

And now you can have infinite file upload size

So there you have it. Now you can upload an infinitely large file to your system as long as you have enough disk space to store it. Note that this upload method is very slow: merging the file takes more than twice as long as with the previous method, because the file chunks are read from, and the destination file is written to, the same disk. But for my use case, it works well enough for files that are too large to fit into the system RAM or the tmp/ folder.

I have no idea who will ever find this useful other than me, working on the ArozOS project. People with these issues nowadays usually just dump the file to AWS or whatever cloud service provider offers large file storage. But if you find it useful, or you have an even better implementation, feel free to let me know so we can further improve the design :)

tobychui / arozos

Web Desktop Operating System for low power platforms, Now written in Go!
