C# Excel read/write on the cheap

karenpayneoregon

Karen Payne

Posted on May 27, 2024

C# Excel read/write on the cheap

Introduction

When working with Excel frequently while coding in C#, the smart direction is to pay for a quality library which supports all Excel formats even if not needed as there may be a time a format is needed.

While Excel operations are infrequent the choices are code yourself without a library which means if Microsoft makes breaking changes the developer must figure out how to fix code to handle the breaking changes.

In this article learn how to read sheets, write to sheets, update cells. For the code samples ExcelMapper and SpreadSheetLight NuGet packages are used where ExcelMapper does the core work.

Data operations are performed by Microsoft Entity Framework Core 8.

Data validation

After reading data from Excel with the intention to save the Excel data to a database, in some cases it would be prudent to validate the incoming data. For example, several columns in a worksheet have unacceptable characters as shown below.

unacceptable characters sample

To handle unacceptable characters, create a model to represent the WorkSheet.



public partial class Products : INotifyPropertyChanged
{
    [NotMapped]
    public int RowIndex { get; set; }
    public int Id { get; set; }
    public int ProductID { get; set; }
    public string ProductName { get; set; }
    public string CategoryName { get; set; }
    public int? SupplierID { get; set; }
    public string Supplier { get; set; }
    public int? CategoryID { get; set; }
    public string QuantityPerUnit { get; set; }
    public decimal UnitPrice { get; set; }
    public int? UnitsOnOrder { get; set; }
    public int? ReorderLevel { get; set; }
    public event PropertyChangedEventHandler? PropertyChanged;
    protected virtual void OnPropertyChanged([CallerMemberName] string propertyName = null!)
    {
        PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(propertyName));
    }
}


Enter fullscreen mode Exit fullscreen mode

Decide how to validate, for this we will use FluentValidation NuGet package.

For those new to FluentValidation see FluentValidation tips.

Since the validation needed for unacceptable characters is not handled natively by FluentValidation a custom extension is needed.

The following language extension method will be used in a FluentValidation extension method to follow.



public static class StringExtensions
{
    /// <summary>
    /// Used for validating a class string property is valid via FluentValidation
    /// </summary>
    /// <param name="text">Text to validate</param>
    /// <returns>True if valid and false if invalid</returns>
    /// <remarks>
    /// What it considers if there are foreign characters in the string, allows spaces and numbers
    /// </remarks>
    public static bool IsOnlyAsciiLetters(this string text)
    {

        foreach (var item in text)
        {
            if (char.IsNumber(item))
            {
                continue;
            }
            switch (item)
            {
                case >= 'A' and <= 'Z':
                case >= 'a' and <= 'z':
                case ' ':
                case '.':
                case ',':
                case '/':
                case '\'':
                case '&':
                    continue;
                default:
                    return false;
            }
        }
        return true;
    }
}


Enter fullscreen mode Exit fullscreen mode

FluentValidation extension method using the extension method above.



public static class RuleBuilderExtensions
{

    public static IRuleBuilderOptions<T, string> NoNoneAsciiCharacters<T>(this IRuleBuilder<T, string> ruleBuilder) 
        => ruleBuilder
            .NotEmpty()
            .Must(m => m.IsOnlyAsciiLetters())
            .WithMessage("'{PropertyName}' is not valid");
}


Enter fullscreen mode Exit fullscreen mode

Validator



public class ProductsValidator : AbstractValidator<Products>
{
    public ProductsValidator()
    {
        RuleFor(p => p.ProductName).NoNoneAsciiCharacters();
        RuleFor(p => p.CategoryName).NoNoneAsciiCharacters();
        RuleFor(p => p.QuantityPerUnit).NotEmpty();
        RuleFor(p => p.Supplier).NoNoneAsciiCharacters();
        RuleFor(p => p.ProductID).GreaterThan(0);
    }
}


Enter fullscreen mode Exit fullscreen mode

Read and validate

Using the following method. Read data from Excel Iterate each row, create an instance of the validator, pass in the row data, see if the data is valid.

There are two list, one for valid data and one for invalid data. Invalidate data is saved to a database table which can be used to inspect, delete or fix.

The entire process is done in a Windows Forms project yet since there is limited code in the forms a developer can do the same in other project types.

Note
The entire section for validating is not covered in respect to saving bad data as the reader may have their own ideas.



internal class ImportOperations
{
    public static async Task<(string badRecord, List<Products> badRecords, int saved, int rejected)> Validate(string fileName = "Products.xlsx")
    {
        ExcelMapper excel = new();
        var products = (await excel.FetchAsync<Products>(fileName, nameof(Products))).ToList();

        List<Products> goodList = [];
        List<Products> badList = [];
        StringBuilder builder = new();

        int rejected = 0;

        for (int index = 0; index < products.Count; index++)
        {
            var validator = new ProductsValidator();
            var result = await validator.ValidateAsync(products[index]);
            if (result.IsValid == false)
            {
                rejected++;

                foreach (var error in result.Errors)
                {
                    builder.AppendLine($"{index + 1,-10} {error.PropertyName,-30}{error.AttemptedValue}");
                }

                products[index].RowIndex = index + 1;
                badList.Add(products[index]);
            }
            else
            {
                goodList.Add(products[index]);
            }

        }

        var saved = 0;
        if (goodList.Count > 0)
        {
            await using var context = new Context();
            await context.Database.EnsureDeletedAsync();
            await context.Database.EnsureCreatedAsync();
            context.Products.AddRange(goodList);
            saved = await context.SaveChangesAsync();

        }

        return (builder.ToString(), badList, saved, rejected);

    }
}


Enter fullscreen mode Exit fullscreen mode

Calling the above where the variable returned is deconstruction, for more on this see The art of Deconstructing.



var (badRecords, badList, saved, rejected) = await ImportOperations.Validate();


Enter fullscreen mode Exit fullscreen mode

Screenshots

main and results forms after importing data

Important

All data read from Excel files in the code sample are tubular for 99% of the examples presented and the data is perfect, meaning that for instance if a column contains dates, all rows of that column have valid dates.

In the real world there will be imperfect data which means that a developer should always consider at least one cell has bad or invalid data. With that known there is no one method to checks for proper data. For each Excel file a developer must understand assertion must be performed such as, for example for a sheet containing customer data can first and last name be empty or that for a sheet of products, what to do if data is missing.

Example of checking if the fourth column contains valid dates. This is a hard coded sample which is provided in source code.



using SpreadsheetLight;
namespace ExcelMapperApp1.Classes;
internal class LightOperations
{
    /// <summary>
    /// For article to show an example to test if the person's birthdate can be read as a date
    /// </summary>
    /// <returns>
    /// If there are issues, the list of rows with issues is returned
    /// </returns>
    public static (List<int> rows, bool hasIssues) Iterate()
    {

        List<int> list = [];

        const string excelFile = "Nested1.xlsx";
        const int columnIndex = 4;

        using SLDocument document = new(excelFile);

        var stats = document.GetWorksheetStatistics();

        // skip header row
        for (int rowIndex = 2; rowIndex < stats.EndRowIndex + 1; rowIndex++)
        {
            var date = document.GetCellValueAsDateTime(rowIndex, columnIndex);
            if (date == new DateTime(1900,1,1))
            {
                list.Add(rowIndex);
            }
        }

        return (list, list.Any());
    }
}


Enter fullscreen mode Exit fullscreen mode

Other considerations, does the file exists? Can the file be opened? Does the worksheet exists?

Always check if the file exists. Concerning if a file can be opened, if a user has the file open outside of the program and a developer’s code attempting to read the file, an exception is throw so wrap the code in a try/catch. Regarding sheet names, if they happen to changed, code will fail. Code has been provided to show how to check for sheet existence.

GitHub repository

Read sheet data to a SQL-Server table

First check if the worksheet exists, if not return to caller.

Next, reset the SQL-Server database table to zero rows and reset identity.

Read worksheet data using ExcelMapper using the following model.



public partial class Customers
{
    public int Id { get; set; }

    public string Company { get; set; }

    public string ContactType { get; set; }

    public string ContactName { get; set; }

    public string Country { get; set; }

    public DateOnly JoinDate { get; set; }
    public override string ToString() => Company;

}


Enter fullscreen mode Exit fullscreen mode

To match this sheet.

Customer worksheet

Next, take the list returned and add to EF Core database. If an exception is thrown it is written to the console screen while for a real application the exception should be written to a log file using a package like SeriLog. See Serilog logging and EF Core logging.



public static async Task CustomersToDatabase()
{

    PrintCyan();
    const string excelFile = "Customers.xlsx";

    if (SheetExists(excelFile, nameof(Customers)) == false)
    {
        AnsiConsole.MarkupLine($"[red]Sheet {nameof(Customers)} not found in {excelFile}[/]");
        return;
    }

    try
    {
        DapperOperations operations = new();
        operations.Reset();

        ExcelMapper excel = new();
        await using var context = new Context();

        var customers = (await excel.FetchAsync<Customers>(excelFile, nameof(Customers))).ToList();

        context.Customers.AddRange(customers);
        var affected = await context.SaveChangesAsync();

        AnsiConsole.MarkupLine(affected > 0 ? $"[cyan]Saved[/] [b]{affected}[/] [cyan]records[/]" : "[red]Failed[/]");
    }
    catch (Exception ex)
    {
        ex.ColorWithCyanFuchsia();
    }
}


Enter fullscreen mode Exit fullscreen mode

Read/write to secondary Excel file

In this example the idea is say business asked for a report and only a few columns are required.

Data is read using the following model.



public class Products
{
    public int ProductID { get; set; }

    public string ProductName { get; set; }

    public string CategoryName { get; set; }
    public int? SupplierID { get; set; }

    public int? CategoryID { get; set; }

    public string Supplier { get; set; }
    public string QuantityPerUnit { get; set; }

    public decimal? UnitPrice { get; set; }

    public short? UnitsInStock { get; set; }

    public short? UnitsOnOrder { get; set; }

    public short? ReorderLevel { get; set; }

    public override string ToString() => ProductName;

}


Enter fullscreen mode Exit fullscreen mode

Then written to another Excel file using the following model.



public class ProductItem
{
    public int ProductID { get; set; }

    public string ProductName { get; set; }

    public string CategoryName { get; set; }

    public decimal? UnitPrice { get; set; }
}


Enter fullscreen mode Exit fullscreen mode

Full and partial worksheets for products

Code first checks if the Excel file exists (as mentioned above), delete the file for a fresh start.

Next, read the worksheet into a list then create a second list with less properties than the first list.

Save the smaller list to a new file.



    /// <summary>
    /// Read products from Products.xlsx as list of <see cref="Products"/> then write to a new
    /// file as <see cref="ProductItem"/> ProductsCopy.xlsx
    /// </summary>
    /// <returns></returns>
    public static async Task ReadProductsCreateCopyWithLessProperties()
    {

        PrintCyan();

        const string excelReadFile = "Products.xlsx";
        const string excelWriteFile = "ProductsCopy.xlsx";

        if (File.Exists(excelWriteFile))
        {
            try
            {
                File.Delete(excelWriteFile);
            }
            catch (Exception ex)
            {
                ex.ColorWithCyanFuchsia();
                return;
            }
        }

        ExcelMapper excel = new();

        var products = (await excel.FetchAsync<Products>(excelReadFile,
            nameof(Products))).ToList();

        var productItems = products.Select(p => new ProductItem
        {
            ProductID = p.ProductID,
            ProductName = p.ProductName,
            CategoryName = p.CategoryName,
            UnitPrice = p.UnitPrice
        }).ToList();

        await new ExcelMapper().SaveAsync("productsCopy.xlsx", productItems, "Products");
    }


Enter fullscreen mode Exit fullscreen mode

Read sheet - update properties and save

This example demonstrates reading a worksheet, making edits and removal of a row. Other than the fact edits are made, this code sample follows the same path as the example above.



/// <summary>
/// Read products from Products.xlsx as list of <see cref="Products"/> then update
/// several products and save to a new file ProductsOut.xlsx
/// </summary>
public static async Task ReadProductsAndUpdate()
{

    PrintCyan();

    const string excelReadFile = "Products.xlsx";
    const string excelWriteFile = "ProductsOut.xlsx";

    if (File.Exists(excelWriteFile))
    {
        try
        {
            File.Delete(excelWriteFile);
        }
        catch (Exception ex)
        {
            ex.ColorWithCyanFuchsia();
            return;
        }
    }

    ExcelMapper excel = new();

    var products = excel.Fetch<Products>(excelReadFile, nameof(Products)).OrderBy(x => x.ProductName).ToList();

    var p1 = products.FirstOrDefault(x => x.ProductName == "CÃ\u00b4te de Blaye");
    if (p1 is not null)
    {
        p1.ProductName = "Cafe de Blave";
    }

    var p2 = products.FirstOrDefault(x => x.Supplier == "Aux joyeux ecclÃ\u00a9siastiques");
    if (p2 is not null)
    {
        p2.Supplier = "Aux Joy";
    }   

    var p3 = products.FirstOrDefault(x => x.ProductID == 48);
    if (p3 is not null)
    {
        products.Remove(p3);
    }

    await excel.SaveAsync(excelWriteFile, products, "Products");

}


Enter fullscreen mode Exit fullscreen mode

Read nested properties

This code sample has person details and their address on the same row and the task is to separate both.

Person and address on the same line

Models



public class Person
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public DateOnly BirthDate { get; set; }
    public Address Address { get; set; }
    public override string ToString() => $"{FirstName} {LastName} {Address}";
}

public class Address
{
    public string Street { get; set; }
    public string City { get; set; }
    public string State { get; set; }
    public string ZipCode { get; set; }
    public override string ToString() => State;
}


Enter fullscreen mode Exit fullscreen mode

ExcelMapper seamlessly figures out what to do with the models above.



public static async Task NestedReadPeople()
{
    PrintCyan();

    const string excelFile = "Nested.xlsx";

    ExcelMapper excel = new();

    var contactList =  (await excel.FetchAsync<Person>(excelFile, "Contacts")).ToList();

    AnsiConsole.MarkupLine(ObjectDumper.Dump(contactList)
        .Replace("{Person}", "[cyan]{Person}[/]")
        .Replace("Address:", "[cyan]Address:[/]"));

}


Enter fullscreen mode Exit fullscreen mode

Below is the output using Nuget package ObjectDumper.NET which is great for examining results.

Nested data output

Header not the first row

A developer may need to read a WorkSheet where the header row is not the first row. ExcelMapper can handle this by telling which row is the header row.

This example has the header row at row 10, two nested records.

worksheet with header row as 10

Models to read data.



public class Person
{
    public int Id { get; set; }
    [Column("First Name")]
    public string FirstName { get; set; }
    [Column("Last Name")]
    public string LastName { get; set; }
    [Column("Birth Date")]
    public DateOnly BirthDate { get; set; }
    public Address Address { get; set; }
    public override string ToString() => $"{FirstName} {LastName} {BirthDate}";

}

public class Address
{
    public int Id { get; set; }
    public string Street { get; set; }
    public string City { get; set; }
    public string Zip { get; set; }
    public override string ToString() => $"{Street} {City} {Zip}";

}


Enter fullscreen mode Exit fullscreen mode

In this case to read from row 10, HeaderRowNumber indicates the header row and MinRowNumber (optional) how many rows to read.

ObjectDumper.Dump writes results to Visual Studio Output window.



public async Task HeaderNotAtFirstRow()
{
    ExcelMapper excel = new()
    {
        HeaderRowNumber = 9,
        MinRowNumber = 2
    };

    var people = (await excel.FetchAsync<Person>("ExcelFiles\\Header1.xlsx", "People")).ToList();
    Debug.WriteLine(ObjectDumper.Dump(people));
}


Enter fullscreen mode Exit fullscreen mode

Results



{Person}
  Id: 0
  FirstName: "Karen"
  LastName: "Payne"
  BirthDate: {DateOnly}
    Year: 1956
    Month: 9
    Day: 24
    DayOfWeek: DayOfWeek.Monday
    DayOfYear: 268
    DayNumber: 714315
  Address: {Address}
    Id: 0
    Street: "111 Cherry Lane"
    City: "Salem"
    Zip: "98765"
{Person}
  Id: 0
  FirstName: "Mike"
  LastName: "Smith"
  BirthDate: {DateOnly}
    Year: 1988
    Month: 7
    Day: 3
    DayOfWeek: DayOfWeek.Sunday
    DayOfYear: 185
    DayNumber: 725920
  Address: {Address}
    Id: 0
    Street: "222 Apple Way"
    City: "Portland"
    Zip: "22222"


Enter fullscreen mode Exit fullscreen mode

Reading worksheet modify and save to database

Suppose the task is to read data from a Worksheet followed by filtering out data then save the filtered data to a database? The process is to use Excel Mapper to read a Worksheet, filter or perform validation using a lambda statement or FluidValidation NuGet package for instance then in this case save data to a SQL-Server database table using Microsoft EF Core.

tip to validate data

The code which follows reads customer data and filters out any row where the country is Germany then uses EF Core to save the data.

Note
operations.Reset(); removes rows from the table and resets the primary key.



private static async Task ReadCustomersFromExcelToDatabase()
{
    SpectreConsoleHelpers.PrintCyan();

    try
    {
        DapperOperations operations = new();
        operations.Reset();

        const string excelFile = "Customers.xlsx";

        ExcelMapper excel = new();
        await using var context = new Context();

        var customers = (
                await excel.FetchAsync<Customers>(excelFile, nameof(Customers)))
            .ToList();

        var germanyItems = 
            customers.Where(c => c.Country == "Germany").ToArray();

        foreach (var c in germanyItems)
            customers.Remove(c);

        context.Customers.AddRange(customers);
        var affected = await context.SaveChangesAsync();

        AnsiConsole.MarkupLine(affected > 0 ?
            $"[cyan]Saved[/] [b]{affected}[/] [cyan]records[/]" :
            "[red]Failed[/]");

    }
    catch (Exception ex)
    {
        ex.ColorWithCyanFuchsia();
    }
}


Enter fullscreen mode Exit fullscreen mode

Reading data to model with enum

ExcelMapper handles enumerations.

Example worksheet were WineType is an enum.

Worksheet with enum

Sample code.



using Ganss.Excel;
using SampleApp6.Classes;
using static ObjectDumper;

namespace SampleApp6;

internal partial class Program
{
    static void Main(string[] args)
    {
        AnsiConsole.MarkupLine("[yellow]Working with[/] [cyan]enum[/]");
        var excel = new ExcelMapper("Products.xlsx");
        var products = excel.Fetch<Wines>().ToList();

        AnsiConsole.MarkupLine(Dump(products).Replace("WineType:", "[cyan]WineType:[/]"));

        SpectreConsoleHelpers.ExitPrompt();
    }
}
public class Wines
{
    public int WineId { get; set; }
    public string Name { get; set; }
    public WineType WineType { get; set; }
}
public enum WineType
{
    Red = 1,
    White = 2,
    Rose = 3
}


Enter fullscreen mode Exit fullscreen mode

Results:

Results

Before running provided code

See the following.

Summary

Code has been provided to read tubular data working with several NuGet packages were ExcelMapper performs most of the work along with Microsoft Entity Framework Core for database operations.

See ExcelMapper GitHub page for more e.g.

  • Map to Excel files using header rows (column names) or column indexes (no header row)
  • Optionally skip blank lines when reading
  • Preserve formatting when saving back files
  • Map formulas or formula results depending on property type
  • Use records
  • Provide custom object factories
💖 💪 🙅 🚩
karenpayneoregon
Karen Payne

Posted on May 27, 2024

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related