Don't parse CSVs by hand!
Randall
Posted on May 11, 2022
I've seen custom JavaScript code to parse CSV data a number of times. It looks something like this:
const csvData = fs.readFileSync(csvFilePath, 'utf8');
const lines = csvData.split('\n');
const rows = lines.map(line => line.split(','));
So simple, and it may be fine if you're parsing a single static CSV file whose content you know. But if you're parsing unknown CSV files, don't do this, there are many ways it can break.
What Can Go Wrong
Escaped cells
Columns in a CSV file may contain commas. Such columns are typically escaped with quotes. The raw data for a row with such a column might look something like this:
Name,Best Line
Hamlet,"To be, or not to be"
Though the second line has two commas, this produces a table with two columns, not three:
Name | Best Line |
---|---|
Hamlet | To be, or not to be |
This breaks the simple CSV parser implementation which just splits on ,
.
Line Endings
The simple CSV parser implementation assumes LF line endings (\n
), but text files, including CSV files, created on Windows will often have CRLF line endings (\r\n
). Further, there may be line ending characters within escaped cells. Needless to say, these cases break the simple CSV parser implementation.
What to do instead
Bring in a library that's built for this. I like to use csv-parse. It's easy to use:
import { parse as csvParse } from 'csv-parse/sync';
const csvData = fs.readFileSync(csvFilePath, 'utf8');
const rows = csvParse(csvData);
That's it.
Conclusion
I know no one likes bringing in libraries to solve what seems to be a simple task, but parsing arbitrary CSVs correctly is actually a lot harder than it looks at first glance. Save yourself the pain, use someone else's battle-hardened parser.
Posted on May 11, 2022
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.