Python: open().read()->str, but for big files

taikedz

Tai Kedzierski

Posted on July 28, 2022

Image (C) Tai Kedzierski

We just had a use case where we needed to POST a file to a server. The naive implementation for posting with requests is:

with open("my_file.bin", 'rb') as fh:
    requests.post(url, data={"bytes": fh.read()})

Job done! Well, not quite. If the file is reeeally big, that .read() call will attempt to load the entire file into memory before passing the loaded bytes to requests.post(...).

Clearly, this is going to hurt. A lot.

Use mmap

A quick search yielded a solution using mmap to create a "memory mapped" object, which behaves much like a bytes string whilst being backed by a file whose contents only get read in as they are accessed.
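To make the "behaves like a string" part concrete, here is a minimal sketch using only the standard library. The helper name `demo_mmap_slicing` and the throwaway temp file are just for illustration; the point is that the mapped object supports `len()`, slicing and `find()` without an explicit `.read()` of the whole file.

```python
import mmap
import os
import tempfile


def demo_mmap_slicing():
    """Map a small temporary file and poke at it as if it were bytes."""
    # Write a throwaway file so the example is self-contained
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello memory-mapped world")
        path = tmp.name
    try:
        with open(path, "rb") as fh:
            # Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only
            with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                # len(), slicing and find() all work directly on the mapping
                return len(mm), mm[:5], mm.find(b"world")
    finally:
        os.remove(path)


size, head, index = demo_mmap_slicing()
print(size, head, index)
```

Slicing returns ordinary bytes objects, so only the slices you ask for are copied into Python-managed memory.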

As ever, I like making things re-usable and easy to slot in. I adapted the example into a context manager object that can be used in place of a normal call to open():

# It's a tiny snippet, but go on.
# Delivered to You under MIT Expat License, aka "Do what you want"
# I'm not even fussy about attribution.

import mmap

class StringyFileReader:

    def __init__(self, file_name, mode):
        if mode not in ("r", "rb"):
            raise ValueError(f"Invalid mode '{mode}'. Only read-modes are supported")

        self._fh = open(file_name, mode)
        # A length of 0 means "whatever the size of the file actually is" (on Unix and Windows alike)
        # Note that an empty file cannot be mapped: mmap raises an error in that case
        fsize = 0
        self._mmap = mmap.mmap(self._fh.fileno(), fsize, access=mmap.ACCESS_READ)


    def __enter__(self):
        return self


    def read(self):
        return self._mmap


    def __exit__(self, *args):
        self._mmap.close()
        self._fh.close()

Which then lets us simply tweak the original naive example to:

with StringyFileReader("my_file.bin", 'rb') as fh:
    requests.post(url, data={"bytes": fh.read()})

Job. Done.

EDIT: we've discovered through further use that requests is pretty stupid. It still tries to read the entire file into memory, possibly by making a copy of the "string" it receives during one of its internal operations. So this solution seems to only stand in limited cases...
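For those cases where the mmap trick does not hold up, one alternative (not what we used above, just a sketch) is to hand over the file in pieces instead of as one big "string". The generator below, with the hypothetical name `iter_file_chunks` and an arbitrary default chunk size, only ever holds one chunk in memory at a time.

```python
import os
import tempfile


def iter_file_chunks(path, chunk_size=64 * 1024):
    """Yield successive chunks of bytes from the file at `path`."""
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                # An empty read means end of file
                break
            yield chunk


def demo_chunks():
    """Write a 10-byte temp file and report the chunk sizes read back."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 10)
        path = tmp.name
    try:
        return [len(chunk) for chunk in iter_file_chunks(path, chunk_size=4)]
    finally:
        os.remove(path)


print(demo_chunks())  # [4, 4, 2]
```

The requests documentation describes passing a generator as `data=` to send a chunk-encoded request, so something like `requests.post(url, data=iter_file_chunks("my_file.bin"))` should stream the upload. Bear in mind that this sends the raw bytes as the request body rather than as a form field named "bytes", so it assumes the server accepts that shape of request.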
