Traversing the C# Syntax Tree with F#

This article covers the basics of using the Syntax API of the .NET Compiler Platform (Roslyn) to analyze the C# syntax tree from F#.

.NET provides a Syntax API that can read any C# or Visual Basic source file and provide the corresponding Syntax Tree for that code.

Why

Why would someone need to traverse the C# syntax tree?

There can be a number of reasons. Maybe you want to gather statistics about how many classes, namespaces and methods you have. Maybe you want to generate code based on what is already written. Maybe you want to create new tools, like a linter or a tool like Swagger. All these things can be done by analyzing the syntax tree.

Recently I found myself using the Syntax API for finding Attributes above certain methods and classes, and based on the name and arguments of the Attributes, I generated various other files that were used elsewhere.

using System.Collections;
using System.Linq;
using System.Text;

namespace FunWithSyntaxTrees
{
    class Program
    {
        static void Main(string[] args)
        {
            // ...
        }
    }
}

The snippet above shows a small program. We will use this snippet as our input for analyzing the syntax tree.

How

Assuming you have an F# environment set up, you can begin by installing the NuGet package Microsoft.CodeAnalysis.CSharp and opening its namespaces in your project.

open Microsoft.CodeAnalysis
open Microsoft.CodeAnalysis.CSharp
open Microsoft.CodeAnalysis.CSharp.Syntax

module Main =

  [<EntryPoint>]
  let main argv =
    0

After installing the package and adding the open directives, we will hardcode the C# source code from above into the file, above the main entry point function.

// ... open directives

let code = """
using System.Collections;
using System.Linq;
using System.Text;

namespace FunWithSyntaxTrees
{
    class Program
    {
        static void Main(string[] args)
        {
            // ...
        }
    }
}
"""

module Main =

  [<EntryPoint>]
  let main argv =
    0

When you write a real program that uses the Syntax API, you will most likely read the C# source from files, like this: let code = File.ReadAllText "/path/to/file". Hardcoding the string, as we did here, is fine for this tutorial.
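Here is a minimal sketch of reading the source from disk instead. The helper name loadSource and the idea of taking the path from the first command-line argument are my own assumptions for illustration, not part of the program in this article.

```fsharp
open System.IO

// Hypothetical helper: returns the contents of the file named by the first
// command-line argument when that file exists, otherwise the fallback text.
let loadSource (fallback: string) (argv: string[]) : string =
  match Array.tryHead argv with
  | Some path when File.Exists path -> File.ReadAllText path
  | _ -> fallback
```

Inside main you could then write let source = loadSource code argv and keep the hardcoded string as a default while experimenting.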

We will begin by passing the string of C# source code to the Syntax API to be parsed. In return we get the Syntax Tree that we can begin analyzing.

[<EntryPoint>]
let main argv =
  let syntaxTree: SyntaxTree = CSharpSyntaxTree.ParseText code

  0

Note: I will write out the types of all the variables, but this is unnecessary most of the time since F#'s type inference is very capable of inferring the types itself. It is similar to the var keyword in C#, which lets the compiler work out the underlying type. In F# this inference is even more powerful and applies to arguments, functions and everything in between.
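To make the note about inference concrete, here is a small sketch with and without annotations. The function and value names are made up for this illustration.

```fsharp
// Fully annotated, in the style used throughout this article
let addAnnotated (a: int) (b: int) : int = a + b

// No annotations: F# infers int -> int -> int for this function
let addInferred a b = a + b

// Inferred as string list
let names = [ "System.Linq"; "System.Text" ]

// Inferred as int list; the type of n flows in through the pipe
let lengths = names |> List.map (fun n -> n.Length)
```

Hovering over these names in an editor with F# tooling should show the inferred types, which is a handy way to check what the compiler worked out.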

Now that the Syntax API has returned our Syntax Tree, we can begin traversing it and exploring the data it offers.

First let us get all the using directives in the file. We start by getting the root node of the file, then we iterate over all the child nodes inside the root node and find the ones that are of the UsingDirectiveSyntax type.

[<EntryPoint>]
let main argv =
  let syntaxTree: SyntaxTree = CSharpSyntaxTree.ParseText code

  let rootNode: CompilationUnitSyntax = syntaxTree.GetCompilationUnitRoot()
  let rootNodeChildren: SyntaxNode seq = rootNode.ChildNodes()

  0

The rootNodeChildren variable holds all the direct child SyntaxNode objects of the root node. The root node is the top node of the SyntaxTree, which contains everything else, and SyntaxNode is the most general type of node.
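To see what those children actually are, one option is to list the runtime type name of each child node. This is a sketch that assumes the Microsoft.CodeAnalysis.CSharp package is referenced as described earlier. The shortened sample input and the names sample and childTypeNames are hypothetical.

```fsharp
open Microsoft.CodeAnalysis
open Microsoft.CodeAnalysis.CSharp

// Hypothetical, shortened input used only for this illustration
let sample = """
using System.Linq;

namespace FunWithSyntaxTrees
{
    class Program { }
}
"""

let tree: SyntaxTree = CSharpSyntaxTree.ParseText sample
let root = tree.GetCompilationUnitRoot()

// The runtime type name of each direct child of the root node
let childTypeNames: string list =
  root.ChildNodes()
  |> Seq.map (fun node -> node.GetType().Name)
  |> List.ofSeq
```

For this input I would expect one UsingDirectiveSyntax followed by one NamespaceDeclarationSyntax, which is why a type test on each child is enough to pick out the using directives.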

We now need to iterate over these children to find the correct SyntaxNode for using directives since that is what we are looking for. We will declare a small helper function to help find them.

let usingDirectiveNode (node: SyntaxNode): UsingDirectiveSyntax option =
  match node with
  | :? UsingDirectiveSyntax as usingDirective -> Some usingDirective
  | _ -> None

[<EntryPoint>]
let main argv =
  let syntaxTree: SyntaxTree = CSharpSyntaxTree.ParseText code
  let rootNode: CompilationUnitSyntax = syntaxTree.GetCompilationUnitRoot()
  let rootNodeChildren: SyntaxNode seq = rootNode.ChildNodes()

  let usingDirectives: UsingDirectiveSyntax seq =
    Seq.choose usingDirectiveNode rootNodeChildren

  0

The new helper function usingDirectiveNode takes a general SyntaxNode and checks whether it is of the UsingDirectiveSyntax variety. If it is, the function returns Some containing the using directive node. Otherwise it returns None.

Note: An F# Option type is a way to represent a value that may be missing. F# code normally avoids null (it mostly shows up when interoperating with other .NET code), so missing values are represented with Algebraic Data Types such as the Option type.
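As a small sketch of how an Option value is produced and consumed, consider the following. The helper names tryFindUsing and describe are hypothetical and only serve this illustration.

```fsharp
// Hypothetical helper: Some name when it is in the list, None otherwise
let tryFindUsing (name: string) (usings: string list) : string option =
  usings |> List.tryFind (fun u -> u = name)

// Pattern matching forces us to handle both the present and the missing case
let describe (result: string option) : string =
  match result with
  | Some u -> sprintf "found %s" u
  | None -> "not found"

let present = describe (tryFindUsing "System.Linq" [ "System.Linq"; "System.Text" ])
let missing = describe (tryFindUsing "System.IO" [ "System.Linq"; "System.Text" ])
```

The compiler warns when a match leaves out one of the cases, which is the main advantage over checking for null by hand.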

We use the new helper function by passing it to Seq.choose, which applies it to every node. Seq.choose drops every None result and keeps every Some result. It also unwraps the Some values, so we can keep using them without any further Option handling.

So Seq.choose is a shorthand for doing Seq.map and then Seq.filter on Option values, followed by unwrapping them. Its type signature is ('T -> 'U option) -> seq<'T> -> seq<'U>.
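The equivalence can be sketched with plain strings instead of syntax nodes. The parser tryParseInt and the sample inputs are made up for this illustration.

```fsharp
// Hypothetical parser: Some n for numeric strings, None otherwise
let tryParseInt (s: string) : int option =
  match System.Int32.TryParse s with
  | true, n -> Some n
  | _ -> None

let inputs = [ "1"; "x"; "3" ]

// One step: apply the function, drop the None results, unwrap the Some results
let viaChoose = inputs |> Seq.choose tryParseInt |> List.ofSeq

// The same thing spelled out as map, filter, and unwrap
let viaMapFilter =
  inputs
  |> Seq.map tryParseInt
  |> Seq.filter Option.isSome
  |> Seq.map Option.get
  |> List.ofSeq
```

Both values end up holding the same list of parsed integers, with the unparseable "x" dropped.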

Moving along, now that we have a sequence of using directives in a variable, we can get the specific properties of a using directive. For now we will just print them out as proof.

let usingDirectiveNode (node: SyntaxNode): UsingDirectiveSyntax option =
  match node with
  | :? UsingDirectiveSyntax as usingDirective -> Some usingDirective
  | _ -> None

[<EntryPoint>]
let main argv =
  let syntaxTree: SyntaxTree = CSharpSyntaxTree.ParseText code
  let rootNode: CompilationUnitSyntax = syntaxTree.GetCompilationUnitRoot()
  let rootNodeChildren: SyntaxNode seq = rootNode.ChildNodes()
  let usingDirectives: UsingDirectiveSyntax seq =
    Seq.choose usingDirectiveNode rootNodeChildren

  // Seq.iter runs the printing side effect for each directive directly
  usingDirectives
    |> Seq.iter (fun u -> printfn "%s" (u.ToString()))

  0

The output of running our program would be:

using System.Collections;
using System.Linq;
using System.Text;

Pretty cool, right? We analyzed our C# code, found our using directives and printed them out.

We can use the same strategy to find anything in our code, including methods, method arguments, types, classes, interfaces, enums, comments, attributes and more.
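As a sketch of that strategy applied to classes and methods, the snippet below collects their names. It assumes the Microsoft.CodeAnalysis.CSharp package is referenced as described earlier. The shortened sample input and the names sample, classNames and methodNames are hypothetical. Note that it uses DescendantNodes instead of ChildNodes, because classes and methods are nested deeper than the direct children of the root.

```fsharp
open Microsoft.CodeAnalysis
open Microsoft.CodeAnalysis.CSharp
open Microsoft.CodeAnalysis.CSharp.Syntax

// Hypothetical, shortened input used only for this illustration
let sample = """
namespace FunWithSyntaxTrees
{
    class Program
    {
        static void Main(string[] args) { }
    }
}
"""

let tree: SyntaxTree = CSharpSyntaxTree.ParseText sample
let root = tree.GetCompilationUnitRoot()

// DescendantNodes walks the whole tree, not only the direct children
let classNames: string list =
  root.DescendantNodes()
  |> Seq.choose (fun node ->
    match node with
    | :? ClassDeclarationSyntax as c -> Some c.Identifier.Text
    | _ -> None)
  |> List.ofSeq

let methodNames: string list =
  root.DescendantNodes()
  |> Seq.choose (fun node ->
    match node with
    | :? MethodDeclarationSyntax as m -> Some m.Identifier.Text
    | _ -> None)
  |> List.ofSeq
```

The shape is the same as the usingDirectiveNode helper: a type test that returns an Option, fed to Seq.choose. Only the node type and the property being read change.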

If you found this useful, feel free to follow me on Twitter at @rametta.
