Upload a huge file with little RAM & space in Go

tobychui

Toby Chui

Posted on June 22, 2022


This is an updated version of my previous implementation for uploading a file larger than the available RAM in Go. If you haven't read it yet, you can check it out at the following link.

In the last blog post, I tackled uploading in Go with a WebSocket file-chunking implementation, to handle cases where the uploaded file is much larger than the available RAM on the device. This implementation is really helpful when you are developing applications for cheap SBCs with as little as 512MB of RAM.

Recently, I ran into another issue while trying to migrate my whole Google Drive to my own ARM-powered DIY NAS. My NAS only has 512MB of RAM and a 32GB microSD card as its OS drive, while files are stored on 2 x 512GB HDDs attached to the SBC. Uploading a file larger than 32GB causes the system to run out of space and crashes my ArozOS NAS OS.

In the previous implementation, uploading a 1GB file required 1GB of free space in the tmp folder (i.e. on the SD card) to buffer the file chunks received via WebSocket. The latest implementation adds a new "huge file mode" for uploads larger than the free space in the tmp folder: the upload chunks are written directly to the target disk, minimizing the maximum space required on every system disk. Before I show you the code, here is the logic I use to decide when to enter "huge file mode":

Logic for Optimizing Both Upload Space & Time Occupancy

  1. If the file is smaller than 4MB, upload it with a FORM POST (lowest overhead, fastest)
  2. Else, if the file is smaller than (remaining space on tmp) / 16 - 1KB, buffer the file chunks in the tmp folder (the tmp folder should be on a fast medium such as an NVMe SSD or a RAM disk; slower than FORM POST but still fast)
  3. Otherwise, buffer the file chunks directly on the target disk (slowest, but gives us the most space to work with)
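The three rules above can be expressed as a small decision function. This is a hypothetical Go sketch, not ArozOS code: the `UploadMode` type and `chooseUploadMode` name are made up, and it assumes "4MB" means 4 × 1024 × 1024 bytes and "1KB" means 1024 bytes. It takes the free space on the tmp volume as a parameter instead of querying the OS for it.

```go
package main

import "fmt"

// UploadMode is a hypothetical enum naming the three upload strategies.
type UploadMode int

const (
	ModeFormPost  UploadMode = iota // small file: plain FORM POST
	ModeTmpBuffer                   // WebSocket chunks buffered in the tmp folder
	ModeHugeFile                    // WebSocket chunks written next to the target file
)

func (m UploadMode) String() string {
	switch m {
	case ModeFormPost:
		return "form-post"
	case ModeTmpBuffer:
		return "tmp-buffer"
	default:
		return "huge-file"
	}
}

const (
	formPostLimit = 4 << 20 // assumed: 4MB = 4 * 1024 * 1024 bytes
	tmpMargin     = 1 << 10 // assumed: 1KB = 1024 bytes
)

// chooseUploadMode applies the three cut-offs in order.
// tmpFree is the free space (in bytes) on the volume holding the tmp folder.
func chooseUploadMode(fileSize, tmpFree int64) UploadMode {
	if fileSize < formPostLimit {
		return ModeFormPost
	}
	if fileSize < tmpFree/16-tmpMargin {
		return ModeTmpBuffer
	}
	return ModeHugeFile
}

func main() {
	const gib = int64(1) << 30
	tmpFree := 16 * gib // e.g. 16GB still free on the SD card
	for _, size := range []int64{1 << 20, 512 << 20, 2 * gib} {
		fmt.Printf("%d bytes -> %s\n", size, chooseUploadMode(size, tmpFree))
	}
}
```

With 16GB free on the tmp volume, the tmp-buffer ceiling is 16GB / 16 - 1KB, just under 1GB, so a 512MB file is buffered in tmp while a 2GB file goes straight to huge file mode.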

File Merging Procedures

In the previous implementation, the file merging procedure worked like this:

  1. Create the destination file and open it
  2. Iterate through the chunks, appending each one to the opened destination file
  3. Delete all the chunk files

*However, this takes twice the space of the file being uploaded.* It works fine for medium-sized files, but not for huge ones. To solve this, I changed the implementation to the following:

  1. Create the destination file and open it
  2. Iterate through the chunks: append each chunk to the opened destination file, confirm the copy succeeded, then remove that source chunk

In simple terms, by deleting chunk files on the fly, the new upload logic only takes up (x + c) bytes, where x is the file size and c is the chunk size. In my design, c is 512KB.
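For a concrete sense of the savings, take an illustrative 40GB file (x) and the 512KB chunk size (c). The peak disk usage while merging is:

```latex
S_{\text{old}} = 2x = 80\,\text{GB}, \qquad
S_{\text{new}} = x + c = 40\,\text{GB} + 512\,\text{KB} \approx 40\,\text{GB}
\qquad (x = 40\,\text{GB},\ c = 512\,\text{KB})
```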

The Code

There is no change to the front-end code except for an extra GET parameter, added when opening the WebSocket, that marks the current upload as a huge file upload. The following is an example implementation of the WebSocket object:

// file, filename, uploadDir, protocol (presumably "ws://" or "wss://"), port and
// largeFileCutoffSize are defined earlier in the upload script
let hugeFileMode = "";
if (file.size > largeFileCutoffSize) {
    // File size is over the cutoff line. Use huge file mode
    hugeFileMode = "&hugefile=true";
}

let socket = new WebSocket(protocol + window.location.hostname + ":" + port + "/system/file_system/lowmemUpload?filename=" + encodeURIComponent(filename) + "&path=" + encodeURIComponent(uploadDir) + hugeFileMode);

And here is the Go backend implementation. Note the isHugeFile flag and the //Merge the file section.

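// Runs inside the upload handler (w http.ResponseWriter, r *http.Request).
// uploadPath, filename, isHugeFile, tmp_directory and max_upload_size are set
// earlier in the handler; fs is a filesystem helper package from ArozOS.
// Uses the standard library plus github.com/gorilla/websocket and a UUID package.
targetUploadLocation := filepath.Join(uploadPath, filename)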
if !fs.FileExists(uploadPath) {
    os.MkdirAll(uploadPath, 0755)
}

//Generate an UUID for this upload
uploadUUID := uuid.NewV4().String()
uploadFolder := filepath.Join(*tmp_directory, "uploads", uploadUUID)
if isHugeFile {
    //Upload to the same directory as the target location.
    uploadFolder = filepath.Join(uploadPath, ".metadata/.upload", uploadUUID)
}
os.MkdirAll(uploadFolder, 0700)

//Start websocket connection
var upgrader = websocket.Upgrader{}
upgrader.CheckOrigin = func(r *http.Request) bool { return true }
c, err := upgrader.Upgrade(w, r, nil)
if err != nil {
    log.Println("Failed to upgrade websocket connection: ", err.Error())
    w.WriteHeader(http.StatusInternalServerError)
    w.Write([]byte("500 WebSocket upgrade failed"))
    return
}
defer c.Close()

//Handle WebSocket upload
blockCounter := 0
chunkName := []string{}
lastChunkArrivalTime := time.Now().Unix()

//Setup a timeout listener, check if connection still active every 1 minute
ticker := time.NewTicker(60 * time.Second)
//Buffered so the final `done <- true` never blocks, even if this goroutine already exited
done := make(chan bool, 1)
go func() {
    defer ticker.Stop()
    for {
        select {
        case <-done:
            return
        case <-ticker.C:
            if time.Now().Unix()-lastChunkArrivalTime > 300 {
                //Already 5 minutes without new data arrival. Stop connection
                log.Println("Upload WebSocket connection timeout. Disconnecting.")
                c.WriteControl(websocket.CloseMessage, []byte{}, time.Now().Add(time.Second))
                time.Sleep(1 * time.Second)
                c.Close()
                return
            }
        }
    }
}()

totalFileSize := int64(0)
for {
    mt, message, err := c.ReadMessage()
    if err != nil {
        //Connection closed by client. Clear the tmp folder and exit
        log.Println("Upload terminated by client. Cleaning tmp folder.")
        //Clear the tmp folder
        time.Sleep(1 * time.Second)
        os.RemoveAll(uploadFolder)
        return
    }
    //mt is websocket.BinaryMessage (2) for file chunks and websocket.TextMessage (1) for control commands
    if mt == websocket.TextMessage {
        msg := strings.TrimSpace(string(message))
        if msg == "done" {
            //Start the merging process
            break
        } else {
            //Unknown operations

        }
    } else if mt == websocket.BinaryMessage {
        //File block. Save it to tmp folder
        chunkFilepath := filepath.Join(uploadFolder, "upld_"+strconv.Itoa(blockCounter))
        chunkName = append(chunkName, chunkFilepath)
        writeErr := os.WriteFile(chunkFilepath, message, 0600)

        if writeErr != nil {
            //Unable to write block. Is the tmp folder full?
            log.Println("[Upload] Upload chunk write failed: " + writeErr.Error())
            c.WriteMessage(websocket.TextMessage, []byte(`{"error":"Write file chunk to disk failed"}`))

            //Close the connection
            c.WriteControl(websocket.CloseMessage, []byte{}, time.Now().Add(time.Second))
            time.Sleep(1 * time.Second)
            c.Close()

            //Clear the tmp files
            os.RemoveAll(uploadFolder)
            return
        }

        //Update the last upload chunk time
        lastChunkArrivalTime = time.Now().Unix()

        //Check if the file size is too big (the chunk size is simply the message length)
        totalFileSize += int64(len(message))
        if totalFileSize > max_upload_size {
            //File too big
            c.WriteMessage(websocket.TextMessage, []byte(`{"error":"File size too large"}`))

            //Close the connection
            c.WriteControl(websocket.CloseMessage, []byte{}, time.Now().Add(time.Second))
            time.Sleep(1 * time.Second)
            c.Close()

            //Clear the tmp files
            os.RemoveAll(uploadFolder)
            return
        }
        blockCounter++

        //Request client to send the next chunk
        c.WriteMessage(websocket.TextMessage, []byte("next"))

    }
}

//Try to decode the location if possible
decodedUploadLocation, err := url.QueryUnescape(targetUploadLocation)
if err != nil {
    decodedUploadLocation = targetUploadLocation
}

//Do not allow % sign in filename. Replace all with underscore
decodedUploadLocation = strings.ReplaceAll(decodedUploadLocation, "%", "_")

//Merge the file
//O_TRUNC so an existing, longer file at this path does not keep stale trailing bytes
out, err := os.OpenFile(decodedUploadLocation, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0755)
if err != nil {
    log.Println("Failed to open file:", err)
    c.WriteMessage(websocket.TextMessage, []byte(`{"error":"Failed to open destination file"}`))
    c.WriteControl(websocket.CloseMessage, []byte{}, time.Now().Add(time.Second))
    time.Sleep(1 * time.Second)
    c.Close()
    return
}

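for _, filesrc := range chunkName {
    srcChunkReader, err := os.Open(filesrc)
    if err != nil {
        log.Println("Failed to open Source Chunk", filesrc, " with error ", err.Error())
        c.WriteMessage(websocket.TextMessage, []byte(`{"error":"Failed to open Source Chunk"}`))
        out.Close()
        return
    }
    _, copyErr := io.Copy(out, srcChunkReader)
    srcChunkReader.Close()
    if copyErr != nil {
        //Stop before deleting this chunk so its data is not lost
        log.Println("Failed to merge chunk", filesrc, "with error", copyErr.Error())
        c.WriteMessage(websocket.TextMessage, []byte(`{"error":"Failed to merge file chunk"}`))
        out.Close()
        return
    }

    //Copy confirmed. Delete the chunk immediately to save space
    os.Remove(filesrc)
}

out.Close()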

//Return complete signal
c.WriteMessage(websocket.TextMessage, []byte("OK"))

//Stop the timeout listener
done <- true

//Clear the tmp folder
time.Sleep(300 * time.Millisecond)
err = os.RemoveAll(uploadFolder)
if err != nil {
    log.Println(err)
}

//Close the WebSocket connection once finished
c.WriteControl(websocket.CloseMessage, []byte{}, time.Now().Add(time.Second))
time.Sleep(300 * time.Millisecond)
c.Close()



And now you can have infinite file upload size

So there you have it. Now you can upload arbitrarily large files to your system, as long as you have enough disk space to store them. Note that this upload method is very slow: merging the file takes more than twice as long as with the previous method, because reading the file chunks and writing the destination file both happen on the same disk. But for my use case, it works well enough for files that are too large to fit into the system RAM or the tmp/ folder.

I have no idea who, other than me working on the ArozOS project, will ever find this useful. People with these issues nowadays usually just dump the file on AWS or whichever cloud provider offers large file storage. But if you find it useful or have an even better implementation, feel free to let me know so we can further improve the design :)

tobychui / arozos (GitHub)

Web Desktop Operating System for low power platforms, Now written in Go!


Features

User Interface

  • Web Desktop Interface
  • Ubuntu remix Windows style startup menu and task bars
  • Clean and easy to use File Manager (Support drag drop, upload etc)
  • Simplistic System Setting Menu
  • No-bull-shit module naming scheme

Networking

  • Basic Realtime Network Statistic
  • Static Web Server (with built-in Web Editor!)
  • mDNS discovery + SSDP broadcast
  • UPnP Port Forwarding
  • WiFi Management (Support wpa_supplicant for Rpi or nmcli for Armbian)

File / Disk Management

  • Mount Disk Utilities

    • Local File Systems (ext4, NTFS, FAT etc)
    • Remote File Systems (WebDAV, SMB, SFTP etc)
  • Built-in Network File Sharing Servers

    • FTP, WebDAV, SFTP
    • Basic Auth based simple HTTP interface for legacy devices with outdated browser
  • Virtual File System + Sandbox Architecture

  • File Sharing (Similar to Google Drive)

  • Basic File Operations with Real-time Progress (Copy / Cut / Paste / New File or Folder etc)

Security

  • OAuth
  • LDAP
  • IP White / Blacklist
  • Exponential login timeout

Extensibility

  • ECMA5 (JavaScript…



