A Guide to Parsing CSV Files in Go
The tutorial for this project can be found on Medium and dev.to.
Posted on April 2, 2024
Among all the programming languages available, Go, or Golang, has grown noticeably more popular in recent years. In 2023, Go re-entered the top 10 most popular languages, according to InfoWorld. The language has gained this traction because of its simplicity, its efficiency, and its ability to compile directly to machine code, which brings significant speed benefits.
As someone new to Go, I wanted to dive deeper into the language and explore its potential. The best way to learn something new is by doing it, so I started a project to hone my Go skills while solving a common task: CSV parsing and data manipulation. In this blog post I will show how to parse a CSV file containing rows of articles, filter those articles based on a property, and write the result to a new CSV file. The accompanying source code for this project can be found on GitHub.
To get started with the project, you will need the following:
The first step in creating our Go project is to set up a new directory called `demo-go-csv-parser` and navigate into it:

```bash
mkdir demo-go-csv-parser
cd demo-go-csv-parser
```
Next, initialize a Go module called `demo-go-csv-parser`:

```bash
go mod init demo-go-csv-parser
```
This creates a `go.mod` file inside your directory. The module file records the module's path and manages its dependencies, similar to `package.json` in the Node.js ecosystem.
The dependency we're going to use is called gocsv. This package provides functions for unmarshaling CSV data into Go structs and marshaling structs back into CSV. To add it to your project, run the following command:

```bash
go get github.com/gocarina/gocsv
```
It's time to dive into the coding aspect of the project to get a taste of how the Go programming language works. To maintain clarity, we're going to break down the coding process into the following sections:
In the root of the `demo-go-csv-parser` project, create a file called `main.go`.
Populate the file with the following:
```go
package main

import (
	"os"

	"github.com/gocarina/gocsv"
)
```
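The first line declares the package name for the file. Every Go file must start with a package declaration, and an executable program lives in package `main`, whose `main` function is the program's entry point. The second part of the file is the import block with two dependencies: the standard library's `os` package, which we use to open and create files, and `github.com/gocarina/gocsv`, which handles the CSV parsing and writing.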
For this project, we're going to use a sample CSV file that you can find at this GitHub link. Feel free to download this sample file and add it to your project. Most CSV files have a header row that names each column. In Go, we can map each CSV row to a struct, a custom data type whose fields correspond to the CSV columns.
The first two columns in our CSV file are named `Title` and `URL`. Mapping these two columns to a Go struct called `Article` would look like this:
```go
type Article struct {
	Title string
	URL   string
}
```
The gocsv package supports struct tags that specify the CSV column name for each field. This is handy when a column name contains spaces or differs from the field name we would like to use in the Go struct.
Adding every column of our CSV file, each with its `csv` tag, gives us the final `Article` struct:
```go
type Article struct {
	Title           string `csv:"Title"`
	URL             string `csv:"URL"`
	DocumentTags    string `csv:"Document tags"`
	SavedDate       string `csv:"Saved date"`
	ReadingProgress string `csv:"Reading progress"`
	Location        string `csv:"Location"`
	Seen            string `csv:"Seen"`
}
```
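Struct tags are plain metadata that a library reads at runtime through reflection. The sketch below illustrates the idea by building a lookup from CSV column name to Go field name with the standard `reflect` package. It is only an illustration of how tag-based mapping can work, not gocsv's actual implementation; the helper name `columnToField` is hypothetical, and it assumes untagged fields fall back to their field name.

```go
package main

import (
	"fmt"
	"reflect"
)

// Article mirrors the tutorial's struct, including its csv tags.
type Article struct {
	Title           string `csv:"Title"`
	URL             string `csv:"URL"`
	DocumentTags    string `csv:"Document tags"`
	SavedDate       string `csv:"Saved date"`
	ReadingProgress string `csv:"Reading progress"`
	Location        string `csv:"Location"`
	Seen            string `csv:"Seen"`
}

// columnToField (hypothetical helper) maps each CSV column name to the Go
// field that should receive it. Fields without a csv tag fall back to their
// own field name (an assumption made for this sketch).
func columnToField(v any) map[string]string {
	t := reflect.TypeOf(v)
	mapping := make(map[string]string, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		field := t.Field(i)
		column := field.Tag.Get("csv")
		if column == "" {
			column = field.Name
		}
		mapping[column] = field.Name
	}
	return mapping
}

func main() {
	// Map iteration order is random, so the lines may print in any order.
	for column, field := range columnToField(Article{}) {
		fmt.Printf("%q -> %s\n", column, field)
	}
}
```

This is why a header such as `Document tags`, which could never be a Go identifier, can still land in the `DocumentTags` field.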
Now we get to the crux of the project: reading a CSV file called `example.csv` located in the project's root directory. To do this, we're going to write a separate `ReadCsv()` function:
```go
func ReadCsv() []*Article {
	// Try to open the example.csv file in read-write mode.
	csvFile, csvFileError := os.OpenFile("example.csv", os.O_RDWR, os.ModePerm)
	// If an error occurs during os.OpenFile, panic and halt execution.
	if csvFileError != nil {
		panic(csvFileError)
	}
	// Ensure the file is closed once the function returns.
	defer csvFile.Close()

	var articles []*Article
	// Parse the CSV data into the articles slice. If an error occurs, panic.
	if unmarshalError := gocsv.UnmarshalFile(csvFile, &articles); unmarshalError != nil {
		panic(unmarshalError)
	}
	return articles
}
```
The function `ReadCsv` can be broken down into the following parts:

1. The function takes no arguments and returns a slice of pointers to `Article`.
2. We call `os.OpenFile()` to open `example.csv` in read-write mode (`os.O_RDWR`). The third argument, `os.ModePerm`, is not a flag but a set of permission bits; it is only applied when a file is created (with `os.O_CREATE`), so it has no effect here. Because we only read the file, `os.Open("example.csv")`, which opens it read-only, would also work. `os.OpenFile()` returns a file handle (`csvFile`) and an error value (`csvFileError`), which is non-nil if opening failed.
3. We check whether `csvFileError` is `nil`, Go's equivalent of null. If an error was stored in the variable, we call `panic`, which stops normal execution and reports the error.
4. `defer csvFile.Close()` makes sure the opened `csvFile` is closed whenever the function returns. This is best practice for managing file resources.
5. We declare the `articles` slice and pass the file handle and a pointer to the slice to `gocsv.UnmarshalFile()`, which reads the CSV rows and populates the slice with `Article` instances.
6. If no error occurred, the `articles` slice is returned.
### Filter articles
After parsing the CSV file into the `articles` slice, the next step is to filter this slice. We only want to keep the articles whose location is set to `inbox`. We're going to create a function called `GetInboxArticles` to do this:
```go
func GetInboxArticles(articles []*Article) []*Article {
	// Declare a nil slice to store inbox articles; append allocates as needed.
	var inboxArticles []*Article
	// Iterate through each article in the provided slice.
	for _, article := range articles {
		// Check if the article's Location is equal to "inbox".
		if article.Location == "inbox" {
			// If so, add it to the inboxArticles slice.
			inboxArticles = append(inboxArticles, article)
		}
	}
	return inboxArticles
}
```
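The loop in `GetInboxArticles` follows a common keep-if pattern. As an aside, since Go 1.18 type parameters let you write that pattern once and reuse it for any slice. The generic `Filter` helper below is a hypothetical sketch, not part of the tutorial or of gocsv; the walkthrough that follows still describes `GetInboxArticles` itself.

```go
package main

import "fmt"

// Article is trimmed to the fields this sketch needs.
type Article struct {
	Title    string
	Location string
}

// Filter (hypothetical helper, requires Go 1.18+) returns the elements of
// items for which keep returns true, preserving their order. It returns a
// nil slice when nothing matches, just like GetInboxArticles.
func Filter[T any](items []T, keep func(T) bool) []T {
	var kept []T
	for _, item := range items {
		if keep(item) {
			kept = append(kept, item)
		}
	}
	return kept
}

func main() {
	articles := []*Article{
		{Title: "First", Location: "inbox"},
		{Title: "Second", Location: "archive"},
	}
	inbox := Filter(articles, func(a *Article) bool { return a.Location == "inbox" })
	fmt.Println(len(inbox), inbox[0].Title) // prints: 1 First
}
```

With this helper, the body of `GetInboxArticles` would reduce to a single `Filter` call with the same predicate.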
Let's closely examine this function:

1. The function accepts a slice of pointers to the `Article` struct and returns a slice of the same type.
2. We declare a slice, `inboxArticles`, that will store the articles that meet the inbox criteria.
3. We loop over the `articles` slice with `for ... range`. If an article's `Location` field equals `"inbox"`, we append it to the `inboxArticles` slice. The comparison is case-sensitive, so a value such as `Inbox` would not match.
4. Finally, we return `inboxArticles`.

Now that we've extracted the inbox articles, we want to persist this data in a new CSV file. Writing to a CSV file works much like reading from one. We're going to create a function `WriteCsv` that looks like this:
```go
func WriteCsv(articles []*Article) {
	// Open result.csv for writing; create it if it doesn't exist, or overwrite it if it already exists.
	// 0o644 (owner read/write, everyone else read-only) is used for the new file instead of os.ModePerm (0777).
	resultFile, resultFileError := os.OpenFile("result.csv", os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o644)
	// Check for errors when opening or creating the file. If there's an error, panic.
	if resultFileError != nil {
		panic(resultFileError)
	}
	defer resultFile.Close()

	// Marshal the articles into CSV format and write them to result.csv.
	if marshalFileError := gocsv.MarshalFile(&articles, resultFile); marshalFileError != nil {
		panic(marshalFileError)
	}
}
```
Let's go through this piece of code step by step:

1. We declare a function `WriteCsv` that accepts a slice of `Article` pointers as its argument.
2. We use `os.OpenFile()` to create or open the file `result.csv`. The flags `os.O_WRONLY|os.O_CREATE|os.O_TRUNC` ensure that the file is opened write-only, is created if it doesn't exist, and is truncated (overwritten) if it already exists.
3. After trying to open `result.csv`, we check whether the variable `resultFileError` contains an error. If it does, we `panic`.
4. Using `defer`, we make sure `resultFile` is closed whenever the function exits.
5. Finally, we write `articles` to `resultFile` with `gocsv.MarshalFile()`. This function expects two arguments: the data to marshal (here a pointer to the slice) and the file to write to. It writes a header row built from the `csv` tags, followed by one row per article. If an error occurs during marshaling, the function panics.
### Putting it all together
We've written three helper functions: one to read a CSV file, one to filter articles, and one to write a CSV file. Let's combine them in a `main` function, the program's entry point:
```go
func main() {
	articles := ReadCsv()
	filteredArticles := GetInboxArticles(articles)
	WriteCsv(filteredArticles)
}
```
With all the Go code in place, it's time to run it! This can be done with the following command:
```bash
go run main.go
```
If everything worked, your project now contains a new file named `result.csv` holding only the inbox articles. Congratulations, you have just run your first Go project!
For an everyday task like processing a CSV file, Go's simplicity, efficiency, and easy-to-learn syntax shine. This makes it easy for newcomers to pick up the language and its rich ecosystem of packages and utilities. Keeping an eye on new tools and languages like Go can expand your toolset and give you a different vantage point for thinking about and solving problems. Of course, the best tool for the job depends on your project's requirements. Perhaps you will consider Go for your next software project. Happy coding! 🧑‍💻