A Guide to Parsing CSV Files in Go

duncanlew

Duncan Lew

Posted on April 2, 2024

Among all the programming languages available, Go, or Golang, has grown in popularity in recent years. In 2023, Go re-entered the top 10 of the most popular languages, according to InfoWorld. The language has gained so much traction because of its simplicity, efficiency and ability to compile directly to machine code, which brings many speed benefits.

As someone new to Go, I would like to dive deeper into this language to find out its potential. The best way to learn something new is by doing it. So that is why I started on a project to not only hone my Go skills, but also solve a common task: CSV parsing and data manipulation. Through this blog post I will illustrate how to parse a CSV file containing rows of articles. These articles need to be filtered based on a property and then written to a new CSV file. The accompanying source code to this project can be found on GitHub.

Prerequisites

To get started with the project, you will need the following:


1. Initialize the project

The first step in creating our Go project involves setting up a new directory called demo-go-csv-parser. Then navigate into this directory.

mkdir demo-go-csv-parser
cd demo-go-csv-parser

The next step is to initialize a Go module called demo-go-csv-parser:

go mod init demo-go-csv-parser

A go.mod file will be created inside your directory. This module file is used to organize and manage dependencies, similar to the package.json of the Node.js ecosystem.

2. Install the CSV dependency

The dependency that we're going to use is called gocsv. This package provides a variety of built-in functions for parsing CSVs using Go structs. To include the dependency in your project, run the following command:

go get github.com/gocarina/gocsv

3. Write the code

It's time to dive into the coding aspect of the project to get a taste of how the Go programming language works. To maintain clarity, we're going to break down the coding process into the following sections:

  • Create main file
  • CSV file setup
  • Read file
  • Filter articles
  • Write file

By decomposing the whole project into bite-size chunks, we can tackle each part with more attention.

Create main file

In the root folder of the demo-go-csv-parser project, create a file called main.go. Populate the file with the following:

package main

import (
    "github.com/gocarina/gocsv"
    "os"
)

The first line indicates the package name for the file. Every Go file needs to start with a package declaration. The second part of the file consists of the import block with two dependencies:

  • gocsv: the external package that we've previously installed
  • os: this built-in Go dependency will be used for I/O functionalities to read and write CSV files.

CSV file

For this project, we're going to use a sample CSV file that you can find at this GitHub link. Feel free to download and include this sample file in your project. Most CSV files are structured and have a header row that names each column. In Go, we can map each CSV row into a custom data structure called a struct. Each field of this struct corresponds to one CSV column.

In our CSV file, the first two columns are named Title and URL. Mapping these two into a Go struct called Article would look like this:

type Article struct {
    Title string
    URL   string
}
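
To get a feel for what mapping rows onto a struct involves, here is a minimal sketch that uses only the standard library's encoding/csv package. The parseArticles helper and the sample data are made up for illustration and are not part of the project; gocsv takes care of this header matching for us.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// Article mirrors the two-field struct shown above.
type Article struct {
	Title string
	URL   string
}

// parseArticles is a hypothetical helper: it reads CSV text, uses the header
// row to locate the Title and URL columns, and builds one Article per row.
func parseArticles(data string) ([]Article, error) {
	rows, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, err
	}
	if len(rows) == 0 {
		return nil, fmt.Errorf("no header row")
	}

	// Map each column name in the header to its position.
	index := make(map[string]int)
	for i, name := range rows[0] {
		index[name] = i
	}
	titleCol, okTitle := index["Title"]
	urlCol, okURL := index["URL"]
	if !okTitle || !okURL {
		return nil, fmt.Errorf("header must contain Title and URL")
	}

	var articles []Article
	for _, row := range rows[1:] {
		articles = append(articles, Article{Title: row[titleCol], URL: row[urlCol]})
	}
	return articles, nil
}

func main() {
	// Made-up sample data, not the contents of the project's example.csv.
	data := "Title,URL\nGo Blog,https://go.dev/blog\n"
	articles, err := parseArticles(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(articles), articles[0].Title) // prints: 1 Go Blog
}
```

Because the columns are looked up by name, the order of the columns in the file does not matter in this sketch.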

The gocsv dependency supports the usage of tags to indicate what the column name is in the CSV file. This is a handy feature in cases where you would have spaces in the column name or if the column name deviates from the actual field name we would like to use in the Go struct.

Considering all the columns of our CSV file, we can add all the columns with the csv tags to the final Article struct which should look like this:

type Article struct {
    Title           string `csv:"Title"`
    URL             string `csv:"URL"`
    DocumentTags    string `csv:"Document tags"`
    SavedDate       string `csv:"Saved date"`
    ReadingProgress string `csv:"Reading progress"`
    Location        string `csv:"Location"`
    Seen            string `csv:"Seen"`
}
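
If you are curious how a tag such as csv:"Document tags" can be read at runtime, the following sketch uses the standard reflect package. I have not looked into how gocsv processes tags internally, so treat this purely as an illustration of Go struct tags in general; columnNames is a hypothetical helper, and the struct is shortened to three fields.

```go
package main

import (
	"fmt"
	"reflect"
)

// Article repeats three fields of the struct above to keep the sketch short.
type Article struct {
	Title        string `csv:"Title"`
	DocumentTags string `csv:"Document tags"`
	SavedDate    string `csv:"Saved date"`
}

// columnNames is a hypothetical helper that returns the csv tag of every
// field of a struct value, falling back to the field name when no csv tag
// is set. It expects a struct value, not a pointer.
func columnNames(v any) []string {
	t := reflect.TypeOf(v)
	names := make([]string, 0, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		field := t.Field(i)
		name, ok := field.Tag.Lookup("csv")
		if !ok {
			name = field.Name
		}
		names = append(names, name)
	}
	return names
}

func main() {
	fmt.Println(columnNames(Article{})) // prints: [Title Document tags Saved date]
}
```

This shows why tags are handy for column names with spaces: the Go field stays a valid identifier while the tag carries the exact header text.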

Read file

Now we get to the crux of the project. We need to be able to read a CSV file called example.csv located in the root directory. To achieve this, we're going to write a separate ReadCsv() function:

func ReadCsv() []*Article {
    // Try to open the example.csv file in read-write mode.
    csvFile, csvFileError := os.OpenFile("example.csv", os.O_RDWR, os.ModePerm)
    // If an error occurs during os.OpenFile, panic and halt execution.
    if csvFileError != nil {
        panic(csvFileError)
    }
    // Ensure the file is closed once the function returns
    defer csvFile.Close()

    var articles []*Article
    // Parse the CSV data into the articles slice. If an error occurs, panic.
    if unmarshalError := gocsv.UnmarshalFile(csvFile, &articles); unmarshalError != nil {
        panic(unmarshalError)
    }

    return articles
}

The function ReadCsv can be broken down into the following parts:

  • The function returns a slice of pointers to elements of type Article.
  • We use os.OpenFile() to open example.csv in read-write mode (os.O_RDWR). The third argument, os.ModePerm, is the permission mode that is applied only when a file is created; since os.O_CREATE is not passed here, it has no effect. os.OpenFile() returns a file handle (csvFile) and an error (csvFileError).
  • Immediately in the next step, we check for errors. nil is Go's equivalent of null. If csvFileError is not nil, we call panic, which stops normal execution and reports the error.
  • The defer csvFile.Close() makes sure that the opened csvFile is always closed regardless of when the function return happens. This is best practice for file resource management.
  • With the file open and error handling in place, we're going to proceed to parse the CSV content. The gocsv.UnmarshalFile() function is provided with the file handle and a pointer to the articles slice. It reads the CSV rows and populates the slice with Article instances.
  • If the parsing of the csvFile completes without errors, the populated articles slice is returned.

Filter articles

After successfully parsing the CSV file and storing its contents into an articles slice, the next step is to filter this slice. We want to only retain the articles whose location is set to inbox. We're going to create a function called GetInboxArticles to achieve this:

func GetInboxArticles(articles []*Article) []*Article {
    // Initialize an empty slice to store inbox articles
    var inboxArticles []*Article

    // Iterate through each article in the provided slice.
    for _, article := range articles {
        // Check if the article's Location is equal to inbox
        if article.Location == "inbox" {
            // If the article's location is inbox, add it to the inboxArticles slice
            inboxArticles = append(inboxArticles, article)
        }
    }

    return inboxArticles
}

Let's closely examine this function:

  • This function accepts a slice of pointers to the Article struct and returns a slice of the same type.
  • We create an empty slice called inboxArticles that will store the articles that meet the inbox criteria.
  • We create a for loop that's going to iterate through each element of the articles slice. If the location property of the article is equal to inbox, we append this element to the inboxArticles slice.
  • After the loop has finished, we return the slice inboxArticles.
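
The same loop-and-append pattern can be generalized if you later want to filter on other properties. The sketch below is one possible variation using Go generics (Go 1.18 or newer); the filter helper, the shortened Article struct and the sample values such as "archive" are assumptions made for this example, not part of the project.

```go
package main

import "fmt"

// Article keeps only the fields this sketch needs.
type Article struct {
	Title    string
	Location string
}

// filter is a hypothetical generic helper: it returns the items for which
// keep reports true, preserving their original order.
func filter[T any](items []T, keep func(T) bool) []T {
	var kept []T
	for _, item := range items {
		if keep(item) {
			kept = append(kept, item)
		}
	}
	return kept
}

func main() {
	// Made-up sample data.
	articles := []*Article{
		{Title: "A", Location: "inbox"},
		{Title: "B", Location: "archive"},
		{Title: "C", Location: "inbox"},
	}
	inbox := filter(articles, func(a *Article) bool { return a.Location == "inbox" })
	for _, a := range inbox {
		fmt.Println(a.Title) // prints A, then C
	}
}
```

For a single fixed condition, the dedicated GetInboxArticles function is simpler to read; a generic helper only starts to pay off once there are several filters.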

Write file

Now that we've extracted the inbox articles, we want to persist this data into a new CSV file. Writing contents to a CSV file will be similar to reading the contents as in the previous steps. We're going to create a function WriteCsv that looks like this:

func WriteCsv(articles []*Article) {
    // Open result.csv for writing; create it if it doesn't exist, or overwrite it if it already exists.
    resultFile, resultFileError := os.OpenFile("result.csv", os.O_WRONLY|os.O_CREATE|os.O_TRUNC, os.ModePerm)

    // Check for errors when opening or creating the file. If there's an error, panic.
    if resultFileError != nil {
        panic(resultFileError)
    }
    defer resultFile.Close()

    // Marshal the articles into the CSV format and write them to the result.csv file
    if marshalFileError := gocsv.MarshalFile(&articles, resultFile); marshalFileError != nil {
        panic(marshalFileError)
    }
}

Let's go through this piece of code step by step:

  • We create a function WriteCsv that accepts a slice of Article pointers as its input argument.
  • We use os.OpenFile() to create or open the file result.csv. The flags passed into the function ensure that the file is write-only, will be created if it doesn't exist, and overwritten if it already exists.
  • After trying to open the file result.csv, we check if the variable resultFileError contains an error. If it does, we stop execution by calling panic.
  • For good I/O hygiene, we defer resultFile.Close() so that the file is closed whenever the function exits.
  • Finally, we write the contents of the articles to the resultFile with gocsv.MarshalFile(). The MarshalFile function expects two arguments: a pointer to the slice, and the CSV file to which it should write the contents. If there is an error during the marshaling process, the function will panic.

Putting it all together

We've written three helper functions: reading a CSV file, filtering articles and writing to a CSV file. We're going to combine these three into a main function like this:

func main() {
    articles := ReadCsv()
    filteredArticles := GetInboxArticles(articles)
    WriteCsv(filteredArticles)
}

Run 🚀

With all the Go code in place, it's time to run it! This can be done with the following command:

go run main.go

If done correctly, your project will have a new file named result.csv. Congratulations, you have just run your first Go project! 🎉

Takeaway

For our everyday task of processing a CSV file, we can see that Go's simplicity, efficiency and easy-to-learn syntax shine brightly. This makes it easy for new learners to pick up this powerful language and its rich ecosystem of packages and utilities. Keeping an eye on new tools and languages like Go can expand your skill set and offer you a different vantage point from which to think about and solve problems. Of course, the best tool for the job will depend on your project's requirements. Perhaps you will consider integrating Go into your next software project. Happy coding! 🧑‍💻
