Example of a YAML Generator and Validator in Python
Joseph D. Marhee
Posted on December 18, 2018
Whether or not you work with YAML regularly, the one thing most people know about it is that it cares about whitespace. Even careful practitioners sometimes automate a bad process, and with YAML that makes for a bad time, so validating it is a must, particularly when you generate it (to say nothing of writing it by hand).
Let's take a common YAML use case: Kubernetes manifests. In my case, I wanted to create different configurations, populate information on the fly (tokens of a known length, for example), and then dump the result to a YAML file used elsewhere. I did this in Python using pyyaml.
To use encryption at rest in your cluster for resources like secrets, Kubernetes requires an EncryptionConfig file. It is a fairly short piece of YAML to generate: it just needs the provider, a key, and the resource to encrypt at rest in etcd. Before generating the YAML, I'm going to represent it as a JSON-like Python dictionary:
configIn = {
    "kind": "EncryptionConfig",
    "apiVersion": "v1",
    "resources": [
        {
            "resources": [
                "secrets"
            ],
            "providers": [
                {
                    "aescbc": {
                        "keys": [
                            {
                                "name": "key1",
                                "secret": "%s" % (generateSecret(32))
                            }
                        ]
                    }
                }
            ]
        }
    ]
}
We're then going to use the result of generateSecret (a lambda that takes a string length and returns a base64-encoded version of a random string of that length) to populate the secret value in that object:
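import base64
import random
import string
import os
import sys
import yaml

# Base64-encode a random string of `length` lowercase letters and digits (Python 3).
# random.sample picks without replacement, so length can be at most 36.
# Note: random is not a cryptographically secure source; for real keys the
# standard library's secrets module is the safer choice.
generateSecret = lambda length: base64.b64encode(
    ''.join(random.sample(string.ascii_lowercase + string.digits, length)).encode()
).decode()  # 32 length

def populateConfig():
    configIn = {
        "kind": "EncryptionConfig",
        "apiVersion": "v1",
        "resources": [
            {
                "resources": [
                    "secrets"
                ],
                "providers": [
                    {
                        "aescbc": {
                            "keys": [
                                {
                                    "name": "key1",
                                    "secret": "%s" % (generateSecret(32))
                                }
                            ]
                        }
                    }
                ]
            }
        ]
    }
    # Serialize the dictionary to a YAML string
    configOut = yaml.dump(configIn)
    return configOut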
and then have yaml.dump return that object to us as YAML:
apiVersion: v1
kind: EncryptionConfig
resources:
- providers:
  - aescbc:
      keys:
      - {name: key1, secret: BASE64_STRING}
  resources: [secrets]
This is valid YAML, but to make it idiomatic with the Kubernetes style (and because the experimental feature that consumes this file won't accept this form as of 1.11), we'll change the dump options on the configOut line to look like this:
configOut = yaml.dump(configIn,default_flow_style=False)
to return:
apiVersion: v1
kind: EncryptionConfig
resources:
- providers:
  - aescbc:
      keys:
      - name: key1
        secret: BASE64_STRING
  resources:
  - secrets
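To see the difference between the two styles side by side, here is a minimal sketch. The sample dictionary is a made-up fragment shaped like the config above, not the real file, and I pass default_flow_style explicitly in both calls because the library's default may differ between pyyaml releases:

```python
import yaml

# Hypothetical sample data, shaped like a fragment of the config above.
sample = {
    "keys": [{"name": "key1", "secret": "BASE64_STRING"}],
    "resources": ["secrets"],
}

# default_flow_style=None lets pyyaml write scalar-only collections inline,
# for example "resources: [secrets]".
mixed = yaml.dump(sample, default_flow_style=None)

# default_flow_style=False forces block style for every collection.
block = yaml.dump(sample, default_flow_style=False)

print(mixed)
print(block)

# Both strings load back to the same data; only the layout differs.
assert yaml.safe_load(mixed) == yaml.safe_load(block) == sample
```

Either string is valid YAML, so the choice only matters to whatever consumes the file.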
Okay, great, we've got our config, and it looks reasonably correct, but since it's automatically created, we probably want to double-check it.
There are a few ways to do this. Because my input was relatively simple, the schema wasn't being modified in any meaningful way (I was just populating data), and I'd prefer to do this with the libraries already imported, we can use the yaml package's built-in safe_load function to see whether an incoming config (like the one returned by the above function) parses as valid YAML:
def validateYaml(config):
    try:
        # safe_load raises yaml.YAMLError if the text is not well-formed YAML
        yaml.safe_load(config)
        return config
    except yaml.YAMLError:
        sys.exit('Failed to validate config.')
This function bails out if the config does not validate (which becomes important in a moment) and returns the config unchanged if it does. With that in place, we can move on to our program's entrypoint to stitch all this together, writing the config to a file only if it is valid:
if __name__ == '__main__':
    config = validateYaml(populateConfig())
    # Only reached when validation passed; the with block closes the file for us
    with open("secrets.conf", "w") as EncryptionConfig:
        EncryptionConfig.write(config)
    print("OK")
If validateYaml fails, it prevents us from writing a bad config, or at least one that is certain not to work. Other validation issues that safe_load does not detect by default may still present themselves with more complicated YAML input.
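As a rough idea of what a stricter check could look like, here is a sketch that loads the text and then inspects the structure. The function name checkEncryptionConfig and the specific checks are my own assumptions for illustration (they are not part of pyyaml or Kubernetes), and it only looks at the first resource and first provider, matching the layout used in this post:

```python
import base64
import yaml

def checkEncryptionConfig(text):
    """Return a list of problems found in the YAML text (empty if none)."""
    try:
        doc = yaml.safe_load(text)
    except yaml.YAMLError as err:
        return ["not parseable as YAML: %s" % err]
    if not isinstance(doc, dict):
        return ["top level is not a mapping"]

    problems = []
    if doc.get("kind") != "EncryptionConfig":
        problems.append("kind is not EncryptionConfig")

    # Walk the path this post's config uses; any missing step is reported.
    try:
        keys = doc["resources"][0]["providers"][0]["aescbc"]["keys"]
    except (KeyError, IndexError, TypeError):
        return problems + ["aescbc keys not found at the expected path"]
    if not isinstance(keys, list):
        return problems + ["aescbc keys is not a list"]

    for index, key in enumerate(keys):
        try:
            # validate=True rejects characters outside the base64 alphabet
            base64.b64decode(key["secret"], validate=True)
        except (KeyError, TypeError, ValueError):
            problems.append("key %d has a missing or non-base64 secret" % index)
    return problems
```

Calling checkEncryptionConfig(populateConfig()) should return an empty list for the config generated above. Anything else in the list describes what looked wrong, which is more useful in a log than a bare exit.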