Lessons learned from building a static code analyzer for C#

Introduction

Static code analyzers are tools that analyze source code without executing it. They can find code smells, vulnerabilities, potential errors, and code that deviates from a defined standard, among other things. They work by parsing the source code and evaluating its syntax (the structure of the code) and semantics (the meaning of the code).

Roslyn, the C# compiler, provides APIs for developing Roslyn Analyzers (static code analyzers built on Roslyn). These analyzers have access to the syntax and semantics of the code and run at development and build time, giving developers feedback in near real time.

In this post, I'll show a Roslyn Analyzer I built for using Discriminated Unions in C#, which requires the union type to be checked before access, and talk about some lessons I learned while developing it.

DiscriminatedUnions.Net package

The package allows the use of Discriminated Unions by enforcing two rules:

The type of a Union must be checked before access;
All types of a Union must be checked (or have an else/default/discard case).

To use the package, install it in the project:

<PackageReference Include="DiscriminatedUnions.Net" Version="1.0.0.19" />

Extend UnionValue when declaring the types that will be used in the Discriminated Union:

public class Bird : UnionValue
{
}

public class Dog : UnionValue
{
}

Declare the Union member passing the possible value types:

Union<Dog, Bird> animal = new Dog();

Then access it after checking the type with if, switch/case, or pattern matching:

if (animal.Value is Dog)
{
    Console.WriteLine($"Dog: {animal.Value}");
}
else
...

Accessing the object without checking its type yields error DUN002 - 'animal.Value' not checked before access.

Not checking for all possible types yields error DUN001 - 'animal' not being evaluated for all possible types.
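As a rough illustration of a check that covers every case, the sketch below uses a switch expression with a discard arm. The UnionValue and Union<T1, T2> definitions in it are simplified stand-ins written for this example so that it compiles without the package; they are assumptions and not the package's actual implementation.

```csharp
using System;

// Hypothetical stand-ins for the package types, so this sketch compiles on its own.
// The real DiscriminatedUnions.Net types may be implemented differently.
public abstract class UnionValue { }
public class Dog : UnionValue { }
public class Bird : UnionValue { }

public sealed class Union<T1, T2>
    where T1 : UnionValue
    where T2 : UnionValue
{
    public UnionValue Value { get; }

    private Union(UnionValue value) => Value = value;

    // Implicit conversions allow "Union<Dog, Bird> animal = new Dog();".
    public static implicit operator Union<T1, T2>(T1 value) => new Union<T1, T2>(value);
    public static implicit operator Union<T1, T2>(T2 value) => new Union<T1, T2>(value);
}

public static class Program
{
    // Every possible type is handled, and the discard arm catches anything else.
    public static string Describe(Union<Dog, Bird> animal) =>
        animal.Value switch
        {
            Dog _ => "Dog",
            Bird _ => "Bird",
            _ => "Unknown"
        };

    public static void Main()
    {
        Union<Dog, Bird> animal = new Dog();
        Console.WriteLine(Describe(animal)); // prints "Dog"
    }
}
```

With the real package, a check shaped like Describe should satisfy both rules, since each type is tested and a discard case is present.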

⚠️ This package was made for study purposes only. Feel free to test it, but don't use it in production code, as it won't be maintained and may not cover all edge cases.

DiscriminatedUnions.NET Source Code

https://github.com/dgenezini/DiscriminatedUnions.NET

Lessons learned

Development-time-only NuGet packages

Roslyn analyzers can be distributed as NuGet packages, but it is important to mark the package as a development dependency by setting the DevelopmentDependency property to true in the csproj file:

<DevelopmentDependency>true</DevelopmentDependency>

This adds two elements to the package reference in the consumer projects:

<PackageReference Include="DiscriminatedUnions.Net.Analyzers" Version="1.0.0.1">
    <PrivateAssets>all</PrivateAssets>
    <IncludeAssets>runtime; build; native; contentfiles; analyzers; buildtransitive</IncludeAssets>
</PackageReference>

PrivateAssets with the value all, indicating that this package won't flow to projects that depend on the consumer project. For example, if Project1 depends on the DiscriminatedUnions.Net.Analyzers package and Project2 depends on Project1, the analyzer won't be available to Project2;
IncludeAssets without compile in its value, indicating that the consumer of the package won't have access to its compiled assemblies.

Roslyn Analyzers can be used to enforce rules on how to use a package

A great use case for Roslyn Analyzers is enforcing rules on how a NuGet package is used.

The DiscriminatedUnions.Net package is one example. It contains the Union and UnionValue types, plus analyzers that enforce the rules for using them.

Because classes inside analyzer packages are not accessible to consumers (and shouldn't be), the most correct approach, in my opinion, is to put the public classes in one package and have that package reference a second package containing the analyzers, without the PrivateAssets setting. This way, the analyzers are available to consumers of the parent package:

<PackageReference Include="DiscriminatedUnions.Net.Analyzers" Version="0.1.0.0-beta">
    <IncludeAssets>runtime; build; native; contentfiles; analyzers; buildtransitive</IncludeAssets>
</PackageReference>

Analyzers have to evolve together with the language

Because analyzers work on the code's syntax, they have to be updated whenever the language gains a new way of doing something. For example:

An analyzer that checks conditions and was written before pattern matching was released would need to be updated to handle the new syntax;
An analyzer that uses types in its rules would need to be updated to account for nullable reference types.
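To make this concrete, here are several ways of asking "is this a Dog?" that the language has accumulated over time. The type and method names are made up for this illustration. Each form has a different shape in the syntax tree, so an analyzer written against only the first one would miss the others.

```csharp
using System;

public class Animal { }
public class Dog : Animal { }

public static class TypeChecks
{
    // C# 1: "is" type test (a cast usually follows).
    public static bool ClassicIs(Animal a) => a is Dog;

    // C# 1: "as" followed by a null check.
    public static bool AsAndNullCheck(Animal a) => (a as Dog) != null;

    // C# 7: declaration pattern, test and variable in one expression.
    public static bool DeclarationPattern(Animal a) => a is Dog dog && dog != null;

    // C# 8: switch expression with a discard arm.
    public static bool SwitchExpression(Animal a) => a switch
    {
        Dog _ => true,
        _ => false
    };

    // C# 9: negated pattern.
    public static bool NegatedPattern(Animal a)
    {
        if (a is not Dog)
        {
            return false;
        }

        return true;
    }

    public static void Main()
    {
        Animal animal = new Dog();

        // All five checks agree, even though their syntax differs.
        Console.WriteLine(ClassicIs(animal));
        Console.WriteLine(AsAndNullCheck(animal));
        Console.WriteLine(DeclarationPattern(animal));
        Console.WriteLine(SwitchExpression(animal));
        Console.WriteLine(NegatedPattern(animal));
    }
}
```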

TDD is your friend

The Roslyn Analyzers project template comes with a VSIX project that can be run to test the analyzers, but I couldn't get it to work and decided it wasn't worth the time.

Turns out the best and easiest way to run and experiment with the analyzer is by running automated tests. Create the test, run the test and change the analyzer until the test passes.

Example of an automated test that expects the error code DUN001 at location 0 with the argument unionBird:

        [TestMethod]
        public async Task Notify_If_NotCheckingAllCases_Many()
        {
            var test = @"
using DiscriminatedUnionsNet;

class Program
{
    static string GetValue()
    {
        Union<Duck, Goose, Eagle, Crow> unionBird = new Duck();

        if ({|#0:unionBird|}.Value is Goose)
        {
            return ""Goose"";
        }

        return null;
    }
}" + defineUnion;

            var expected = VerifyCS
                .Diagnostic("DUN001")
                .WithLocation(0)
                .WithArguments("unionBird");

            await VerifyCS.VerifyAnalyzerAsync(test, expected);
        }

The location and argument are marked with the {|#location:argument|} syntax.

ℹ️ The project template comes with some basic tests as examples that are easy to change.

Casts... Casts everywhere

The analyzer APIs' methods and properties return general interface types that need to be cast to a more specific type before you can access its members.

This is an excerpt from the DiscriminatedUnions.Net package:

var namedType = (INamedTypeSymbol)typeInfo.Type; //typeInfo.Type is ITypeSymbol

if ((!namedType.Name.Equals("UnionValue")) ||
    (!namedType.ContainingNamespace.Name.Equals("DiscriminatedUnionsNet")))
{
    return;
}

...

if (parent is ISwitchOperation switchOperation) //parent is IOperation
{
    if (!(switchOperation.Value is IPropertyReferenceOperation propertyReferenceOperation) || //switchOperation.Value is IOperation
        !(propertyReferenceOperation.Instance is ILocalReferenceOperation parentReferenceOperation))
    {
        parent = parent.Parent;

        continue;
    }
    ...
}
else if (parent is ISwitchExpressionOperation switchExpressionOperation) //parent is IOperation
{
    ...
}
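One way to keep these cast chains manageable is to push them into small helpers that use pattern matching, so that a non-matching shape yields null or false instead of an InvalidCastException. The sketch below is not taken from the package: the OperationHelpers class and its method names are hypothetical, it assumes a reference to the Microsoft.CodeAnalysis.CSharp NuGet package, and the property patterns need C# 8 or later (an analyzer project targeting netstandard2.0 would have to set LangVersion for that).

```csharp
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Operations;

// Hypothetical helpers for illustration; not part of DiscriminatedUnions.Net.
public static class OperationHelpers
{
    // Returns the local variable whose property is the subject of a switch
    // statement (for example "animal" in "switch (animal.Value)"),
    // or null when the operation does not have that shape.
    public static ILocalSymbol GetSwitchedLocal(IOperation operation)
    {
        if (operation is ISwitchOperation
            {
                Value: IPropertyReferenceOperation
                {
                    Instance: ILocalReferenceOperation localReference
                }
            })
        {
            return localReference.Local;
        }

        return null;
    }

    // Pattern-matching alternative to a hard (INamedTypeSymbol) cast:
    // returns false for null or for symbols that are not named types.
    public static bool IsUnionValue(ITypeSymbol type)
    {
        return type is INamedTypeSymbol namedType
            && namedType.Name == "UnionValue"
            && namedType.ContainingNamespace?.Name == "DiscriminatedUnionsNet";
    }
}
```

The nested property pattern does the same work as the two chained "is" checks in the excerpt, just in a single expression.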

Understanding the syntax tree and finding the correct interfaces

The easiest way to understand the syntax tree is by using Visual Studio's Syntax Visualizer (Installation instructions).

Just click anywhere in the code and it will show the syntax tree up to that point.

Another tip is to type ISymbol, IOperation, or any other base interface to see all the specific interfaces in IntelliSense.
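If the Syntax Visualizer isn't available, a similar view can be approximated in code. The sketch below is only a rough stand-in for the tool: it parses a snippet and lists each syntax node's kind, indented by depth. The SyntaxDump class is a made-up name, and the example assumes a console project that references the Microsoft.CodeAnalysis.CSharp NuGet package.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Hypothetical helper for illustration only.
public static class SyntaxDump
{
    // Returns one line per syntax node, indented two spaces per level,
    // showing the node's SyntaxKind.
    public static IEnumerable<string> Describe(string source)
    {
        SyntaxNode root = CSharpSyntaxTree.ParseText(source).GetRoot();

        foreach (SyntaxNode node in root.DescendantNodesAndSelf())
        {
            int depth = node.Ancestors().Count();
            yield return new string(' ', depth * 2) + node.Kind();
        }
    }

    public static void Main()
    {
        const string source = "class C { bool M(object o) { return o is string; } }";

        foreach (string line in Describe(source))
        {
            Console.WriteLine(line);
        }
    }
}
```

The kinds printed here map onto the syntax classes an analyzer casts to, which helps when deciding which node kinds to register for.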

Not much material to help beginners

Because Roslyn Analyzers have very specific use cases, there isn't much material or documentation online. Here are some links that helped me learn (especially Josh Varty's and Meziantou's blogs):

Josh Varty's Blog - Learn Roslyn Now series

Meziantou's Blog - Writing a Roslyn analyzer series

Roslyn docs - How to start

Roslyn GitHub docs - How to write a C# Analyzer and Code Fix

Roslyn GitHub docs - Analyzer Actions Semantics