Real World F# Interop

For various reasons, my past few years in development have involved a lot of language bindings, both reading up on others' techniques and my own experimentation. I've written before about writing advanced bindings and about designing around bindings. However, those articles always used trivial examples. I've never talked about the bindings I've actually written, why I used the techniques I did, and why I didn't use other approaches.

Today, that changes.

I maintain several libraries. For various reasons, several of these have been greatly simplified recently. The general idea is that each serves some domain, some with large numbers of custom methods and types, and others as simply large collections of extensions. Critically, some of these depend on others and share method names. Because of this, using a library that depends on another should extend the existing bindings rather than replace them. This can be a surprising challenge.

Numbersome

This is just a small convenience library meant to provide basic functionality more easily, and patch up some areas I needed to use often that weren't covered by larger math libraries. It also shows off something I couldn't until recently. Prior to this last .NET release (6.0) generic math was a problem to write, and was typically easier to write in F# and make C# bindings for instead. That's changed. Consider:

public static T Crd<T>(this T value) where T : IFloatingPoint<T> {
    T two = T.One + T.One;
    return two * T.Sin(value / two);
}

A simple Chord function. You don't see it much anymore, although it is useful at times, and often isn't used enough to remember one of the identities, so it's a suitable candidate. Luckily, this is easy, as we can write a very direct mapping.

let inline crd< ^t when ^t :> IFloatingPoint< ^t>> (value:^t) =
    value.Crd()

This is possible without more complex techniques precisely because there's only one method that this ever needs to bind to, and that method is sufficiently generic. The compiler doesn't need to be tricked; it just needs to be told what ^t is. And in doing so, this could be considered exactly identical to the C# signature. There's a method, a generic type constrained to a CRTP interface, a single parameter of that generic type, and a return of that type. If you have a single generic method that needs to be bound to, this is the least effort. Easy enough.
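For comparison, here is a rough sketch of the F#-first route mentioned above, using inline functions and statically resolved type parameters instead of the generic math interfaces. It is not Numbersome's code; it only illustrates the identity crd(θ) = 2·sin(θ/2), and it should work for any type the built-in sin and arithmetic operators accept, such as float and float32.

```fsharp
open System

// Illustrative only: a generic chord function written the "old" F# way.
// crd(theta) = 2 * sin(theta / 2)
let inline crd (theta: ^t) : ^t =
    // GenericOne plays the role of T.One in the C# version.
    let one: ^t = LanguagePrimitives.GenericOne
    let two: ^t = one + one
    let half: ^t = theta / two
    two * sin half

// The same definition is specialized at each call site.
let chordOfPi: float = crd Math.PI      // approximately 2.0
let chordOfZero: float32 = crd 0.0f     // 0.0f
```

The cost of this approach is that the function only exists as inline F#, which is why a separate C# binding used to be needed.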

It won't be very obvious yet, but eventually it would become clear how operator overloading will become an issue. "Would" might have been overlooked, but I do mean past tense. There are numerous ways to address this, and in the past some of them were terrible. It's not that way anymore. Numbersome includes many definitions that work exactly like the standard F# operators and look like this:

let inline ( + ) (left:^a) (right:^b):^c =
    ((^a or ^b or ^c) :
     (static member op_Addition : ^a -> ^b -> ^c)(left, right))

This winds up being only slightly more complex than the aforementioned binding. What we're saying here can be broken up into several parts. (^a or ^b or ^c) says the member constraint we're looking for can be found on any of the types: left-hand, right-hand, or return. This forms a critical part of covering both preexisting types and extending seamlessly to new types. static member op_Addition tells the compiler we're binding to a static member, which all operators are, called op_Addition, which is the special name used in the compiled IL and which CLS-compliant languages understand as the + operator. ^a -> ^b -> ^c is, of course, the signature of the operator. By using a different type parameter for each position, the operands and the result can each be a distinct type, while still being free to unify to a single type like the standard F# operators. This yields more flexibility than the standard F# definition, which, yay.
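To make the "preexisting types and new types" point concrete, here is a small self-contained sketch. It is simplified from the definition above: it is a named function rather than a shadowed operator, it searches only the two operand types, and it uses the tupled signature form. The Meters type is hypothetical and exists only for this illustration.

```fsharp
// Hypothetical type, used only to demonstrate the constraint.
type Meters =
    { Value: float }
    // Same-type addition.
    static member (+) (left: Meters, right: Meters) : Meters =
        { Value = left.Value + right.Value }
    // Mixed-type addition: the operands do not have to match.
    static member (+) (left: Meters, right: float) : Meters =
        { Value = left.Value + right }

// (+) compiles to op_Addition; the constraint may be satisfied by a
// member found on either operand type.
let inline add (left: ^a) (right: ^b) : ^c =
    ((^a or ^b) : (static member (+) : ^a * ^b -> ^c) (left, right))

let sameTypes: Meters = add { Value = 1.5 } { Value = 2.0 }
let mixedTypes: Meters = add { Value = 1.5 } 2.0
```

Because each position has its own type parameter, the second call picks the Meters and float overload without any extra binding code.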

Collectathon

Collectathon used to be a data structures library. Because of increasing demands on my personal life, I've decided to squash this down to the bare minimum required to get what I needed out of a good DS API. I've reworked things into a broad set of extension methods for the existing data structures. It's not quite the design I wanted, and imposes some trickier issues with method binding.

Consider Contains() which exists only for Span<T> and ReadOnlySpan<T>. Arguments can definitely be made about Memory<T>, how it should be conceptualized, what it should support, but arrays? Really? No one thought to include a method to check arrays for a value? Sure, the loop isn't hard to write, except that more efficient checks which use partial loop unrolling or vectorization are actually non-trivial, and clutter code. Implementing this was simple enough:

public static Boolean Contains<T>(this T[]? collection, T element)
    where T : IEquatable<T> =>
    MemoryExtensions.Contains(collection.AsSpan(), element);

But now we have a slight problem. This, and others, are in a different static class than the ones for Span<T> and ReadOnlySpan<T>. I wrote before about binding to multiple overloads when they're all in a single static class, so I won't cover that again. But what about multiple static classes? It's doable, but it's a closed extension.

let inline Contains< ^t, ^u, ^a, ^b when (^t or ^u or ^a or ^b) :
    (static member Contains : ^a -> ^b -> Boolean)>
    collection element =
    ((^t or ^u or ^a or ^b) :
    (static member Contains : ^a -> ^b -> Boolean)
    (collection, element))

The only thing that's different about the overloading part is the additional ^u. The function signature looks like this:

let inline contains (collection:^a) (element:^b) =
    Contains<CollectathonExtensions, MemoryExtensions, ^a, ^b>
    collection element

This winds up working better than F#-side binder types in this scenario. As long as new extensions are added to either of the declared static classes, they will automatically be available on the F# side as well. A method of the same name and compatible signature in another static class in a namespace that's in scope? That works in C# but not F#. At least not with this technique. I am looking for an open extension mechanism here, but for now, only closed sets.
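Here is a self-contained sketch of the closed-set idea, assuming two stand-in static classes written in F# rather than the real CollectathonExtensions and MemoryExtensions. DemoArrayOps and DemoListOps are hypothetical names, and the constraint uses the tupled signature form.

```fsharp
// Two hypothetical static classes, each holding one overload of Contains.
[<AbstractClass; Sealed>]
type DemoArrayOps =
    static member Contains (collection: int[], element: int) : bool =
        Array.contains element collection

[<AbstractClass; Sealed>]
type DemoListOps =
    static member Contains (collection: int list, element: int) : bool =
        List.contains element collection

// The overloader: ^t and ^u name the static classes to search; ^a and ^b
// are the argument types, which keep resolution open until the call site.
let inline Contains< ^t, ^u, ^a, ^b when (^t or ^u or ^a or ^b) : (static member Contains : ^a * ^b -> bool)> (collection: ^a) (element: ^b) : bool =
    ((^t or ^u or ^a or ^b) : (static member Contains : ^a * ^b -> bool) (collection, element))

// The public function closes the set by naming both static classes.
let inline contains collection element : bool =
    Contains<DemoArrayOps, DemoListOps, _, _> collection element

let inArray = contains [| 1; 2; 3 |] 2
let inList = contains [ 1; 2; 3 ] 5
```

Adding a new overload to either demo class would make it reachable through contains with no further changes, while an overload in a third class would not be seen. That is the closed-set limitation in miniature.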

Interestingly, this closed set technique does have quite a limitation when you take this further. Consider extending types in both System.Collections.Generic and System.Collections.Immutable. Say we add a Requeue method for both respective Queue types. This isn't an easy problem to solve, and one or the other will collide and cause problems. Of course, standard extension method calls are viable.

Stringier

A lot of this library is, from a binding standpoint, conceptually identical to the methods in Collectathon, only it additionally binds to its own overloads as well. This just means an extra static class parameter in each of the bindings, so this doesn't need to be covered.

However, those of you who know my work know this isn't all that needs to be addressed. And let me tell you, the work I've been doing on optimizations here caused some problems that weren't easy to address.

type pattern = Pattern

Please do this. It's self-explanatory, but it does make things more pleasant on the F# side, where types are often lowercase.

Pattern has many predefined patterns, especially for Unicode Categories or Contributory Properties. Luckily, these are also very straightforward.

let letter = Pattern.Letter

Similarly, there are some factory functions defined there as well. These should not be mapped directly in my case, because of the overloading of textual types.

let inline blockComment (start:^a) (stop:^b) =
    BlockComment<Pattern, ^a, ^b> start stop

The overloading part looks exactly as you'd expect.

Now, while there are techniques for overloading in F#, they're not as advanced as you might expect, and different parameter counts must have different functions for them. This creates the following two functions:

let inline fuzzy (pattern:^a):pattern =
    Fuzzy<Pattern, ^a> pattern

let inline fuzzy2 (maxEdits:int) (pattern:^a):pattern =
    Fuzzy2<Pattern, ^a> pattern maxEdits

I want to bring up an important detail here that, even if I've brought it up before, I want to hammer home. What we would call the "caller" in OOP is a (typically) implicit first parameter. However, in FP this parameter should instead be the last parameter, because when using function piping, the piped value is placed last. As for naming, 2 as a suffix is a well-understood convention, so downstream will generally understand it. Preferably, however, use a clarifying suffix.
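A tiny illustration of why the "caller" goes last, using a hypothetical startsWith wrapper over String.StartsWith rather than anything from Stringier:

```fsharp
open System

// Hypothetical wrapper: text is the OOP "caller", so it moves to the end.
let startsWith (prefix: string) (text: string) : bool =
    text.StartsWith(prefix, StringComparison.Ordinal)

// Piping supplies the final parameter.
let piped = "pattern" |> startsWith "pat"

// Partial application also falls out naturally from this ordering.
let startsWithPat = startsWith "pat"
let results = [ "pattern"; "matter" ] |> List.map startsWithPat
```

Had text come first, both the pipe and the partial application would need a lambda to reorder the arguments.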

Pattern construction is where things start to go south.

let inline many (pattern:^a):pattern =
    Many<StringierPatternsExtensions, ^a> pattern

The modifiers are straightforward, since (now) all of them are extension methods. This actually has an incredibly nice effect in that, unless the pattern is a single literal, every single part will implicitly convert. So what about the single literals? Pattern has constructors, but .NET doesn't like constraining on parameterized constructors. Well, F# doesn't care too much.

let inline ptn (pattern:^a):pattern =
    (^t : (new : ^a -> Pattern) pattern)

Yup. You can use this pretty easily inside of functions that need more complex logic. Essentially, this is the constructor, as long as it's only one parameter.

I used to call this p, taking after the FParsec pchar and pstring conventions, but I've decided more recently that's too vague. ptn seems to strike a good balance between succinct and clear.

Now we get into problems. "Or", or alternates, is something a lot of parser engines struggle to create syntax for. FParsec went with <|>, its own operator. Most just assume you're going to use the C# API. You can use || or ||| however, without creating issues, as long as you put it together with a few things we've covered earlier.

let inline ( || ) (left) (right) =
    Or<StringierPatternsExtensions, NumbersomeExtensions, _, _, _>
    left right

Yeah, I hid something from you earlier 😜 We still have the closed extension problem from earlier, but we can add behavior to an operator. Just know that if anyone else does this you start to have collisions. I'm including both of these because it depends on the symbol you overload.

public static Boolean Or(this Boolean left, Boolean right) =>
    left || right;

public static T Or<T>(this T left, T right)
    where T : IBitwiseOperators<T, T, T> =>
    left | right;

This allows the F# operator to still behave as expected, while also allowing a familiar "or" operator for pattern construction. I suppose you'd call me antagonistic towards idiomatic F#, and I suppose you'd be correct.

Now is where it all falls apart. Yeah. For many parts of this we're into undocumented and unintended abuse of language features to make this stuff work in extensible and low effort ways. But there be dragons.

let parse (source:^a) (location:byref<int>) (pattern:pattern):^b option

This is the signature I want. Unlike typical bindings, this is almost idiomatic F#. If I were to wrap up source and location into a record or something, it could be entirely idiomatic F#. Since the signature is so different, you run into issues, but I really do want to make this work. What I do know is that a technique I've never covered here is useful. Sometimes you might not want to bind to the public API, but rather recompose internal APIs to the same effect. Enter: InternalsVisibleToAttribute. When this is applied to an assembly, it allows the assemblies it names to access that assembly's internal members.

let inline parse (source:^a) (location:byref<int>)
                 (pattern:pattern):^b option =
    let start:int = location
    match pattern.Head.Consume(
            source, &location, Unchecked.defaultof<StringComparison>) with
    | null -> Some(source[start..location])
    | _ -> None

This almost works. Almost. F# wants to resolve ^a to ReadOnlySpan<Char>. I think I can write an overloader for Consume and have this work, although the additional slicing has me unsure. If I manually declare ^a and ^b as String, it does work.

Alternatively, one of the F# side binder types may do the trick here.

Closing

I have, however, run out of time. I work 50 hours a week, am in school, and do this development on the side. As of yesterday (Dec 24th), this is as far as I've gotten in my own code.

Last time I had as a footnote the possibility of Haskell style Variadic Functions. As far as I can tell this "feature" isn't possible anymore. That's probably a good thing 😂 But it would have been a fun tool in the toolbox.

I do have a few things to track down. While playing with things, I think I found a mechanism by which it's possible to have an F# function overloaded on both instance and static members. This would have huge implications for certain areas, like where some types have instance methods but others only have extension methods for the same features.
