Data Validation Libraries - Analysis & Comparison using Python

anirudhann

Anirudhan

Posted on April 4, 2021

Here I take some of the leading data validation libraries as of 2021 and analyze and compare them using Python, which should help you choose one for your applications.

All the data validation libraries I used are open source tools, listed below:

  • Cerberus
  • Colander
  • Jsonschema
  • Marshmallow
  • Pydantic
  • Schema
  • Voluptuous

Below are some basic requirements/checks for choosing a suitable tool for your applications:

1. Mandatory field check

Whenever we send a request to an API, certain fields are mandatory and others are optional, so I treated this check as the primary one. Unsurprisingly, all the leading libraries have this feature.

| Data validation library | Mandatory field check |
|---|---|
| Cerberus | yes |
| Colander | yes |
| Jsonschema | yes |
| Marshmallow | yes |
| Pydantic | yes |
| Schema | yes |
| Voluptuous | yes |
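
To make the check concrete, here is a minimal, library-agnostic sketch in plain Python. The field names and the `missing_required` helper are hypothetical and only illustrate the idea; each library above expresses required and optional fields through its own API.

```python
# Hypothetical field lists for an imaginary "create user" request.
REQUIRED_FIELDS = {"name", "email"}
OPTIONAL_FIELDS = {"nickname"}


def missing_required(payload: dict) -> list:
    """Return the required field names that are absent from the payload."""
    return sorted(REQUIRED_FIELDS - set(payload))


request = {"name": "Asha", "nickname": "ash"}
print(missing_required(request))  # ['email']
```

Optional fields such as `nickname` never appear in the result, whether or not they are sent.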

2. Data type check

Most libraries ship with a set of standard data types (such as str, int, and dict). It is always advisable to use the standard data types the validation library provides, but sometimes we need to extend them or create our own custom data type checks.

| Data validation library | Data type check |
|---|---|
| Cerberus | yes |
| Colander | yes |
| Jsonschema | yes |
| Marshmallow | yes |
| Pydantic | yes |
| Schema | yes |
| Voluptuous | yes |
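
The difference between a standard type check and a custom one can be sketched in plain Python as below. This is an illustration only: `FIELD_CHECKS`, `is_positive_int`, and `type_errors` are hypothetical names, not part of any of the libraries compared here.

```python
def is_positive_int(value) -> bool:
    """Custom check: an int (but not a bool) greater than zero."""
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


# Hypothetical mapping: each field is checked by a built-in type or a callable.
FIELD_CHECKS = {"name": str, "tags": list, "quantity": is_positive_int}


def type_errors(payload: dict) -> dict:
    """Return {field: message} for every present field that fails its check."""
    errors = {}
    for field, expected in FIELD_CHECKS.items():
        if field not in payload:
            continue  # presence is a separate (mandatory field) check
        value = payload[field]
        if isinstance(expected, type):
            ok = isinstance(value, expected)  # standard type check
        else:
            ok = expected(value)  # custom check
        if not ok:
            errors[field] = f"failed {expected.__name__} check, got {type(value).__name__}"
    return errors


print(type_errors({"name": 42, "tags": [], "quantity": 3}))
# {'name': 'failed str check, got int'}
```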

3. Min and Max option

As a basic option, field values should be validated against the minimum and maximum allowed length (for strings) or value (for integers). Most libraries have this option.

| Data validation library | Min and Max option |
|---|---|
| Cerberus | yes |
| Colander | yes |
| Jsonschema | yes |
| Marshmallow | yes |
| Pydantic | yes |
| Schema | no |
| Voluptuous | yes |
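
As a rough sketch of what a min/max option does, the hypothetical helper below compares the length of a string, or the value of a number, against inclusive limits. The libraries that support this expose it through their own parameters rather than a function like this one.

```python
def within_limits(value, minimum, maximum) -> bool:
    """Compare string length, or numeric value, against inclusive limits."""
    measured = len(value) if isinstance(value, str) else value
    return minimum <= measured <= maximum


print(within_limits("Anirudhan", 3, 20))  # True: 9 characters
print(within_limits(200, 0, 150))         # False: above the maximum
```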

4. Regex option

If we want to allow certain special characters or accept only specific patterns, regex is the way to go, and all of the validation libraries below have this feature.

| Data validation library | Regex option |
|---|---|
| Cerberus | yes |
| Colander | yes |
| Jsonschema | yes |
| Marshmallow | yes |
| Pydantic | yes |
| Schema | yes |
| Voluptuous | yes |
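
For illustration, a pattern check with Python's standard `re` module might look like the sketch below. The username rule is a made-up example, and the libraries above accept patterns through their own options.

```python
import re

# Hypothetical rule: 3 to 16 characters; letters, digits, and underscores only.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{3,16}")


def is_valid_username(value: str) -> bool:
    """True only when the whole string matches the pattern."""
    return USERNAME_PATTERN.fullmatch(value) is not None


print(is_valid_username("anirudhan_21"))  # True
print(is_valid_username("bad name!"))     # False
```

`fullmatch` is used here because `match` only anchors at the start of the string, so it would accept a valid prefix followed by disallowed characters.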

5. Dynamic field validation based on another field

This check validates a field based on the value of another field given in the same request. It becomes significant for some applications.

| Data validation library | Dynamic field validation based on another field |
|---|---|
| Cerberus | no |
| Colander | yes |
| Jsonschema | yes |
| Marshmallow | yes |
| Pydantic | yes |
| Schema | no |
| Voluptuous | no |
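
A small plain-Python sketch may help show what "validation based on another field" means in practice. The payment fields and rules below are hypothetical and are not taken from any of the libraries.

```python
def validate_payment(payload: dict) -> list:
    """Validate card_number depending on the value of payment_method."""
    errors = []
    method = payload.get("payment_method")
    if method == "card" and not payload.get("card_number"):
        errors.append("card_number is required when payment_method is 'card'")
    if method == "cash" and "card_number" in payload:
        errors.append("card_number must be omitted when payment_method is 'cash'")
    return errors


print(validate_payment({"payment_method": "card"}))
# ["card_number is required when payment_method is 'card'"]
print(validate_payment({"payment_method": "cash"}))
# []
```

The same `card_number` value can be valid or invalid depending on `payment_method`, which is why a per-field rule alone cannot express this check.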

6. Custom validation

This is the option to extend the standard validations or create custom validations for our own applications.

| Data validation library | Custom validation |
|---|---|
| Cerberus | yes |
| Colander | yes |
| Jsonschema | yes |
| Marshmallow | yes |
| Pydantic | yes |
| Schema | yes |
| Voluptuous | yes |

7. Error response for all fields

If invalid values are passed for several fields in a request, the library should capture and report errors for all of the invalid fields, not just the first one.

| Data validation library | Error response for all fields |
|---|---|
| Cerberus | yes |
| Colander | yes |
| Jsonschema | yes |
| Marshmallow | yes |
| Pydantic | yes |
| Schema | no |
| Voluptuous | no |
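
The difference between reporting every invalid field and stopping at the first one can be sketched in plain Python as follows. `RULES`, `validate_all`, and `validate_fail_fast` are hypothetical names used only for this illustration.

```python
# Hypothetical rules: field -> (check function, error message).
RULES = {
    "name": (lambda v: isinstance(v, str) and v != "", "must be a non-empty string"),
    "age": (lambda v: isinstance(v, int) and 0 <= v <= 150, "must be an integer from 0 to 150"),
}


def validate_all(payload: dict, rules: dict) -> dict:
    """Run every rule and return {field: message} for each failure."""
    errors = {}
    for field, (check, message) in rules.items():
        if not check(payload.get(field)):
            errors[field] = message
    return errors


def validate_fail_fast(payload: dict, rules: dict) -> None:
    """Raise on the first failing rule, so later failures go unreported."""
    for field, (check, message) in rules.items():
        if not check(payload.get(field)):
            raise ValueError(f"{field}: {message}")


bad = {"name": "", "age": 200}
print(validate_all(bad, RULES))
# {'name': 'must be a non-empty string', 'age': 'must be an integer from 0 to 150'}
```

With the fail-fast variant, a caller would fix `name`, resubmit, and only then learn that `age` is also invalid.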

8. Dynamic Alert message option based on error

Libraries provide standard error messages for the different types of errors that apply to each field, but sometimes we need to configure a custom error message for a few fields in our applications.

| Data validation library | Dynamic alert message option |
|---|---|
| Cerberus | yes |
| Colander | yes |
| Jsonschema | yes |
| Marshmallow | yes |
| Pydantic | yes |
| Schema | yes |
| Voluptuous | yes |

9. Schema/Model Re-usability

On some occasions, we may need to extend or reuse a schema/model created for one API in another. This is essentially class inheritance.

| Data validation library | Schema/Model re-usability |
|---|---|
| Cerberus | yes |
| Colander | yes |
| Jsonschema | yes |
| Marshmallow | yes |
| Pydantic | yes |
| Schema | yes |
| Voluptuous | yes |
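
The inheritance idea can be illustrated with standard-library dataclasses. This is only a sketch under stated assumptions: dataclasses do not validate on their own, so the `__post_init__` type check below is a hand-written stand-in that handles plain classes such as `str` only, and the request classes are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class BaseRequest:
    request_id: str

    def __post_init__(self):
        # Hand-written check, inherited by every subclass.
        for f in fields(self):
            if not isinstance(getattr(self, f.name), f.type):
                raise TypeError(f"{f.name} must be {f.type.__name__}")


@dataclass
class CreateUserRequest(BaseRequest):
    # Reuses request_id and the type check from BaseRequest.
    name: str
    email: str


print([f.name for f in fields(CreateUserRequest)])
# ['request_id', 'name', 'email']
```

The child model adds its own fields while reusing both the field and the validation defined once on the base model.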

10. Python support

We also need to consider compatibility with the latest Python version and forum support for the chosen packages (considered as of March 2021).

| Data validation library | Python support |
|---|---|
| Cerberus | yes |
| Colander | yes |
| Jsonschema | yes |
| Marshmallow | yes |
| Pydantic | yes |
| Schema | limited support |
| Voluptuous | yes |

Libraries to Pick
Only four of the seven data validation libraries satisfy all the criteria mentioned above:

  • Pydantic
  • Marshmallow
  • Jsonschema
  • Colander

Performance comparison

We chose the top three libraries from the above list and ran performance tests.

Number of records given in the request vs Time taken (in seconds) to process the request

| Number of records | Pydantic (s) | Marshmallow (s) | Jsonschema (s) |
|---|---|---|---|
| 100 | 0.0039 | 0.033 | 0.037 |
| 1000 (1k) | 0.036 | 0.41 | 0.37 |
| 10000 (10k) | 0.36 | 3.36 | 3.59 |
| 100000 (1 lakh) | 3.60 | 33.52 | 35.84 |
| 1000000 (10 lakh) | 50.52 | 644.06 | 797.40 |
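
The article does not show the benchmark harness itself, so the following is only a sketch of how such timings could be collected. `validate_record` is a hypothetical stand-in for a call into whichever library is under test, and the record shape is an assumption.

```python
import time


def validate_record(record: dict) -> bool:
    """Hypothetical stand-in for a library's validation call."""
    return isinstance(record.get("id"), int) and isinstance(record.get("name"), str)


def time_validation(n_records: int) -> float:
    """Return the seconds taken to validate n_records generated records."""
    records = [{"id": i, "name": f"user{i}"} for i in range(n_records)]
    start = time.perf_counter()  # records are built before the clock starts
    for record in records:
        validate_record(record)
    return time.perf_counter() - start


for n in (100, 1_000, 10_000):
    print(f"{n:>6} records: {time_validation(n):.4f} s")
```

Absolute numbers from a harness like this depend on the machine and the record shape, so only the relative ordering between libraries is meaningful.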

In these runs Pydantic is roughly 10X faster than other leading data validation libraries such as Marshmallow and Jsonschema (about 8X to 16X, depending on the number of records).
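
The speedup can be checked directly from the timings in the table above. The short script below divides each library's time by Pydantic's time for the same number of records; `TIMINGS` and `speedups` are names introduced only for this calculation.

```python
# Seconds from the table: records -> (Pydantic, Marshmallow, Jsonschema).
TIMINGS = {
    100: (0.0039, 0.033, 0.037),
    1_000: (0.036, 0.41, 0.37),
    10_000: (0.36, 3.36, 3.59),
    100_000: (3.60, 33.52, 35.84),
    1_000_000: (50.52, 644.06, 797.40),
}


def speedups(timings: dict) -> dict:
    """Return records -> (Marshmallow/Pydantic, Jsonschema/Pydantic) ratios."""
    return {
        n: (marshmallow / pydantic, jsonschema / pydantic)
        for n, (pydantic, marshmallow, jsonschema) in timings.items()
    }


for n, (vs_marshmallow, vs_jsonschema) in speedups(TIMINGS).items():
    print(f"{n:>9}: {vs_marshmallow:.1f}x vs Marshmallow, {vs_jsonschema:.1f}x vs Jsonschema")
```

The ratios stay close to 10 up to 100000 records and widen at 1000000 records, where Pydantic's lead over both libraries is largest.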

Hence Pydantic is the clear winner: it satisfies all our basic requirements with lightning performance (processing 10 lakh, i.e. 1 million, records in under a minute).

Note: Please refer to the link below to my GitHub account for sample Python code for each of the libraries, showing how to use the functionalities and checks that I have briefly explained above.
PoC on data validation libraries python
