Israël Hallé
Posted on July 21, 2022
I've been working with Python type annotations for the last few years as part of our main product at Flare Systems. I've found them to be a wonderful tool to support refactoring and make code more readable. Lately, I explored how we can make APIs safer with the use of types. Specifically, I will look at how we can use Python type annotations to make os.system foolproof.
As a starting point, the current type of system is:
def system(command: StrOrBytesPath) -> int: ...
This typing is correct. It has the benefit of catching any call that passes something other than a string or path, which would be bound to fail. But it doesn't check for misuse such as passing unsanitized user input. For example, someone might want to use ImageMagick to resize an image:
from os import system

def resize(size: str):
    # Vulnerable: `size` is interpolated into the shell command as-is.
    system(f"convert INPUT -resize {size} OUTPUT")

def api(request):
    resize(request.args["size"])
Unfortunately, this simple implementation introduces a critical flaw in our application. A user could send a malicious size such as $(echo hacked). The size would be inserted into the command, and the shell would execute: convert INPUT -resize $(echo hacked) OUTPUT. This exact vulnerability pattern is still very common to this day.
The fix is as simple as the mistake. shlex.quote can be used to ensure a string is treated as a single token in a command. Yet nothing in system explicitly checks that user input in the command has been escaped.
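As a minimal sketch of that fix, the command can be built with shlex.quote before it reaches the shell. The helper name build_resize_command is hypothetical and only used here for illustration; INPUT and OUTPUT are the same placeholders as in the example above.

```python
import shlex


def build_resize_command(size: str) -> str:
    # shlex.quote wraps the value in single quotes when it contains
    # characters the shell would otherwise interpret.
    return f"convert INPUT -resize {shlex.quote(size)} OUTPUT"


# A harmless value is left untouched.
print(build_resize_command("50%"))
# A malicious value becomes a single, inert shell token.
print(build_resize_command("$(echo hacked)"))
```

The resulting string can then be passed to system, but nothing forces a developer to remember the shlex.quote call, which is the gap the rest of this post tries to close.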
Fortunately, we can think of ways to improve this. First of all, we can split all values into two categories: safe and unsafe. As we have seen, using user input as the argument of system is unsafe. Passing a literal string, on the other hand, should be somewhat safer. At the very least, literals lead to predictable behavior. If you execute the literal rm --no-preserve-root -rf /, you can predict that it will wipe your disk.
Users of type annotations might already be familiar with the Literal type. As a quick reminder, Literal lets developers restrict a variable to specific literal values. This is useful for functions that accept only a finite set of known values. For example, system could have used it this way:
def system(command: Literal["ls"] | Literal["id"]): ...
system("ls") # ok
system("id") # ok
system("rm -rf") # error!
system(request.args["size"]) # error!
Note that in practice I usually favor Enum. This usually leads to safer code, since it also checks that the values are correct at runtime.
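To illustrate the runtime check that Enum adds, here is a small sketch. The Command enum and this variant of system are hypothetical names chosen for the example, not part of the standard library.

```python
import os
from enum import Enum


class Command(Enum):
    LS = "ls"
    ID = "id"


def system(command: Command) -> int:
    # Only members of Command can reach the shell.
    return os.system(command.value)


# Looking up a known value returns the matching member.
allowed = Command("ls")

# Looking up an unknown value fails at runtime, even without a type checker.
try:
    Command("rm -rf")
except ValueError as error:
    print(f"rejected: {error}")
```

Unlike Literal, which only exists for the type checker, the Enum lookup rejects unexpected values while the program runs.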
One nice thing about this concept is that the type checker will not allow passing a str when a Literal is expected. The big limitation is that Literal only works with concrete literal values. There's no way to declare a variable that accepts any literal of a given type:
def system(command: Literal) -> int: ... # error: Literal[...] must have at least one parameter
In our case this is quite restrictive, since we want system to run any safe command. Up to Python 3.10, there was no built-in way to write a function that accepts only literal arguments. Python 3.11 changed that by adding the LiteralString type, which allows a variable to accept only literal strings:
def system(command: LiteralString) -> int: ...
system("convert INPUT OUTPUT") # ok
system(f"convert INPUT -resize {size} OUTPUT") # error!
At the time of publishing, Mypy defines LiteralString as an alias of str. Thus, the latest version of Mypy with Python 3.11 won't catch any of the errors in the snippets above and below.
LiteralString still limits us to literal values. Going back to our use case, we want to be able to pass in the size of the image. It is actually possible to make size safe by sanitizing the value for shell use. This is the job of the usual quote or escape functions, which take user input and return strings that are safe to use. For shells, Python provides the shlex.quote function. The input and output of these functions have different safety properties. It would be interesting to reflect this difference in the types:
ShellQuotedString = NewType("ShellQuotedString", str)
def quote(value: str) -> ShellQuotedString: ...
Here we introduce a new type that carries the safety property. Python includes the NewType helper to easily create a new type from an existing one. The new type can be used wherever the base type is expected, but not the other way around:
safe: ShellQuotedString = ShellQuotedString("This string is safe")
unsafe: str = safe # ok
safe = unsafe # error: Incompatible types in assignment
Now we have both kinds of safe data: literal and quoted. For ease of use, we can alias both to a union:
ShellString = LiteralString | ShellQuotedString
This union covers all the safe types of command that system can execute. It ensures that a developer has to think about quoting user input before passing it to system.
def system(command: ShellString) -> int: ...
system("convert INPUT OUTPUT") # ok
system(f"convert INPUT -resize {quote(size)} OUTPUT") # error!
It's still not accepting our quoted size argument. An interesting property of NewType is that any operation done on a value converts it back to the base type. For example, concatenating a str and a ShellQuotedString returns a str. This puts the burden on the API designer to define the set of safe operations. If we want to provide operations that work on our safe strings, we have to implement them.
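The snippet below is a small sketch of that property. At runtime a NewType value is just an instance of its base type, so the distinction only exists for the type checker; the comments describe what a checker would be expected to infer.

```python
from typing import NewType

ShellQuotedString = NewType("ShellQuotedString", str)

quoted = ShellQuotedString("'50%'")

# For a type checker, `quoted` is a ShellQuotedString but `combined` is a
# plain str: str.__add__ is declared to return str, so the marker is lost.
combined = "convert INPUT -resize " + quoted

# At runtime there is no wrapper object at all, only a str.
print(type(quoted) is str)
print(combined)
```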
In our case, we know that concatenating shell-safe strings will create a new shell-safe string. So we can expose this operation as the safe way to mix user input and literal values.
def shell_format(
    format_string: str,
    *args: ShellString,
    **kwargs: ShellString,
) -> ShellQuotedString:
    # Only shell-safe values are substituted into the format string.
    return ShellQuotedString(format_string.format(*args, **kwargs))
output_path = "/tmp/out"
shell_format(
    "convert INPUT -resize {} {}",
    quote(size),
    output_path,
)  # ok
shell_format(
    "convert INPUT -resize {} {}",
    size,
    output_path,
)  # error!
Note that this implementation might still leave room for security vulnerabilities. Calling the function like shell_format("convert -resize '{}'", size) would leave size effectively unquoted. It would be possible to add more checks to ensure that no {} placeholder is surrounded by quotes. This is a great example of why regular string operations might lead to unsafe behavior if applied to our new types.
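One possible shape for such a check is sketched below. It walks the format string with string.Formatter and rejects any replacement field that appears while a shell quote is open. The function name check_placeholders_unquoted is hypothetical, and the quote tracking is deliberately simplified (it ignores backslash escapes), so treat it as an illustration rather than a complete shell parser.

```python
from string import Formatter


def check_placeholders_unquoted(format_string: str) -> None:
    """Raise ValueError if a replacement field sits inside shell quotes."""
    open_quote = None  # the quote character currently open, if any
    for literal_text, field_name, _spec, _conversion in Formatter().parse(format_string):
        # Update the quote state with the literal text preceding the field.
        for char in literal_text:
            if open_quote is None and char in "'\"":
                open_quote = char
            elif char == open_quote:
                open_quote = None
        # field_name is None for trailing literal text without a placeholder.
        if field_name is not None and open_quote is not None:
            raise ValueError(
                f"placeholder inside {open_quote} quotes in {format_string!r}"
            )


check_placeholders_unquoted("convert INPUT -resize {} {}")  # passes silently

try:
    check_placeholders_unquoted("convert -resize '{}'")
except ValueError as error:
    print(f"rejected: {error}")
```

A function like shell_format could call this check before formatting, so that a quoted placeholder fails loudly instead of silently undoing the quoting.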
Now that we have all the operations and safety properties we need, we can glue everything together.
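The pieces described so far could be assembled roughly as follows. This is a sketch built from the snippets above, with a few assumptions of its own: LiteralString is imported only for the type checker so the module also loads on interpreters older than 3.11, the format string is typed as LiteralString rather than str so that it cannot come from user input, and quote simply delegates to shlex.quote.

```python
from __future__ import annotations

import os
import shlex
from typing import TYPE_CHECKING, NewType, Union

if TYPE_CHECKING:
    # Available in typing from Python 3.11; only the type checker needs it.
    from typing import LiteralString

ShellQuotedString = NewType("ShellQuotedString", str)
ShellString = Union["LiteralString", ShellQuotedString]


def quote(value: str) -> ShellQuotedString:
    # Assumption: quoting is delegated to the standard shlex.quote.
    return ShellQuotedString(shlex.quote(value))


def shell_format(
    format_string: LiteralString,
    *args: ShellString,
    **kwargs: ShellString,
) -> ShellQuotedString:
    # Safe inputs in, safe string out.
    return ShellQuotedString(format_string.format(*args, **kwargs))


def system(command: ShellString) -> int:
    return os.system(command)


def resize_command(size: str) -> ShellQuotedString:
    # The user-controlled size must go through quote() to type-check.
    return shell_format("convert INPUT -resize {} OUTPUT", quote(size))


def api(request) -> int:
    return system(resize_command(request.args["size"]))
```

With a checker that implements LiteralString, replacing quote(size) with size in resize_command should be reported as an error, while the version above is accepted.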
Our system API now has its safety checked statically. It should catch any misuse with unsanitized values. Note that we are using a ShellQuotedString here instead of a generic SafeString type that could be reused for many other cases (SQL quoting, html.escape, etc.). Type safety is relative to how the value is used. The return value of html.escape is safe to render without introducing XSS. Yet the same value could introduce SQL injection if used as-is in a query.
Adding safety to types can go beyond escaping or quoting patterns. Types can expose most implicit preconditions to static analysis. For example, a static file API that opens a file from user input could define a SafePath type. A function could then convert a str to a SafePath after it checks that the path is under a specific directory.
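A minimal sketch of that idea follows. The names to_safe_path and read_static_file are hypothetical, and the containment check (resolve the path, then verify it is relative to the base directory) is one reasonable approach under the assumption that the base directory itself is trusted.

```python
from pathlib import Path
from typing import NewType

SafePath = NewType("SafePath", Path)


def to_safe_path(base_dir: Path, user_input: str) -> SafePath:
    """Return a SafePath only if user_input stays inside base_dir."""
    base = base_dir.resolve()
    # resolve() collapses ".." segments and follows symlinks.
    candidate = (base / user_input).resolve()
    try:
        candidate.relative_to(base)
    except ValueError:
        raise ValueError(f"{user_input!r} escapes {str(base)!r}") from None
    return SafePath(candidate)


def read_static_file(path: SafePath) -> bytes:
    # Accepting only SafePath makes the precondition visible to the type checker.
    return path.read_bytes()
```

A caller holding only a str or a plain Path cannot call read_static_file without a type error, so the directory check cannot be skipped by accident.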
We have seen that we can easily use types to embed semantics in our code. Python type annotations can do much more than just prevent a TypeError. They can make preconditions explicit and prevent critical security vulnerabilities.