Elixir and NIF: a case study

Update

In this post I quote our QA Engineer, but I hadn't asked her about putting her name here. Since this post reached a lot of people (something I never thought could happen), I talked to her and she agreed to have her name included.

So, thank you Natália Nuñez. This article would never have happened without you =).

Hello devs! I'm Rodrigo Caldeira, a software engineer at SumUp in São Paulo, and I'll share some thoughts about Elixir and NIFs through a real story. This is my first contribution here, so please send me feedback about it!

A little bit of context

I've been working at SumUp since Jan 2020 and, after some reshuffles, I'm now in the Bank business unit working directly with PIX features.

Along with other events and meetings, we have two events here that the EP&D (Engineering, Product and Design) area attends:

Lunch and Learn: in some places this is known as a brown bag. The goal is to share knowledge at lunchtime every Friday, and anyone in EP&D can present, about any subject.

HackDay: an entire Friday, every two weeks, reserved for working on any project not directly related to SumUp. We can work on literally anything here.

With that said, this post starts with these events.

During one HackDay, I was studying a way to do Slack integrations (without success =( ), when suddenly Natália Nuñez, our QA Engineer here in the bank, called me.

  • Nati: Hi, Caldeira! How are you? Could you help me with my HackDay project, please? I was trying to do something here, but now I'm stuck =/
  • Me: Sure! How can I help you? (at this point I had already given up on my project)
  • Nati: Awesome! So, I'm trying to use a piece of software to automate some tests here, but it doesn't have an Elixir plugin (we use Elixir here in the Bank). Looking at the docs I found out that it's possible to create a plugin with their C library. Do you know how to do that?
  • Me: Whoa! That's a tough one! I've never done that, but I know it's possible with NIFs. Let's create a simple project to study a little bit, and then we come back to that library. What do you think?
  • Nati: Great! Let's do it!

Spoiler: we didn't come back to that library

So, that's the whole scenario here. The QA Engineer and I started a case study about Elixir and NIFs on that HackDay. The outcome was a Lunch and Learn that I presented two weeks later, and that's what I'm bringing to you right now.

The case

Elixir systems run on the BEAM, the Erlang virtual machine, and NIFs (Native Implemented Functions) are a way to extend Erlang software by loading and executing native code. That code can be written in any language that compiles to a native library, like Rust or C. In this example I'll create a library in C and use it from a simple Elixir module.

So, let's start with our native library. It's a simple calculator that exposes four functions, each of which receives two integer parameters and returns one integer:

lib_calc.h

int somar(int a, int b);
int subtrair(int a, int b);
int multiplicar(int a, int b);
int dividir(int a, int b);

lib_calc.c

#include "lib_calc.h"

int somar(int a, int b) {
  return a + b;
}

int subtrair(int a, int b) {
  return a - b;
}

int multiplicar(int a, int b) {
  return a * b;
}

int dividir(int a, int b) {
  return a / b;
}

For non-Portuguese speakers, my guess is that the function names are straightforward, but here is the translation:

  • somar -> sum
  • subtrair -> subtract
  • multiplicar -> multiply
  • dividir -> divide

Great! Now that we have our library defined, let's compile it:

$ gcc -o lib_calc.so -c lib_calc.c

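No errors, no warnings. click noice!
Note that -c makes gcc stop before linking, so despite the .so name this file is a plain object file, not a shared library. It gets linked directly into the binaries we build next.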
BTW, I'm using Ubuntu on WSL2 for our example, but you should not face any problems with other distros.

Now it's time to test our library. To do that, I'll create another C program that will receive three parameters:

  • An integer number
  • The operator
  • An integer number

calc.c

#include <stdio.h>
#include <stdlib.h>
#include "lib_calc.h"

int main(int argc, char ** argv) {
  int a, b;

  a = strtol(argv[1], (char **)NULL, 10);
  b = strtol(argv[3], (char **)NULL, 10);

  switch (argv[2][0]) {
    case '+': printf("%d + %d = %d\n", a, b, somar(a, b)); break;
    case '-': printf("%d - %d = %d\n", a, b, subtrair(a, b)); break;
    case '*': printf("%d * %d = %d\n", a, b, multiplicar(a, b)); break;
    case '/': printf("%d / %d = %d\n", a, b, dividir(a, b)); break;
    default: printf("Invalid option\n");
  }

  return 0;
}

And compiling it

$ gcc -o calc calc.c lib_calc.so

Once again, no errors.

Now, let's run our calculator

$ ./calc 1 + 1
1 + 1 = 2

Awesome! It works!
Now, notice that our calculator program doesn't validate the parameters sent to it at all. So, if we run it with an unexpected input, this is the result

$ ./calc 1
[1]    388 segmentation fault  ./calc 1

That's OK for us, since we are not interested in the C calculator itself.
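If we did care about the calculator, the segmentation fault could be avoided by checking argc and validating each number before using it. The following is only a sketch under my own assumptions: parse_int and read_operands are hypothetical helpers that are not part of the original calc.c, and they rely only on the standard strtol behaviour (end pointer and ERANGE).

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper (not in the original calc.c): parses a base-10
 * integer from s into *out. Returns 1 on success, 0 when s is NULL, empty,
 * has trailing characters, or does not fit in an int. */
int parse_int(const char *s, int *out) {
  char *end = NULL;
  long value;

  assert(out != NULL);
  if (s == NULL || *s == '\0') {
    return 0;
  }

  errno = 0;
  value = strtol(s, &end, 10);
  if (errno == ERANGE || *end != '\0' || value < INT_MIN || value > INT_MAX) {
    return 0;
  }

  *out = (int)value;
  return 1;
}

/* Hypothetical helper: returns 1 when the command line has exactly the
 * three expected parameters and both numbers parse, 0 otherwise. */
int read_operands(int argc, char **argv, int *a, int *b) {
  if (argc != 4) {
    return 0; /* e.g. "./calc 1" ends up here instead of crashing */
  }
  return parse_int(argv[1], a) && parse_int(argv[3], b);
}
```

With something like this, main could print a usage message and return a non-zero status whenever read_operands returns 0, before touching argv[2] at all.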

So, with all that ready, how can we use our library inside an Elixir program?

The solution

To achieve that, we need to write another C program that will represent our NIF to Erlang and expose our calculator library to our Elixir module.

lib_calc_nif.c

#include <erl_nif.h>
#include "lib_calc.h"

static ERL_NIF_TERM somar_nif(ErlNifEnv* env, int argc, const ERL_NIF_TERM argv[])
{
  int a, b, result;
  enif_get_int(env, argv[0], &a);
  enif_get_int(env, argv[1], &b);
  result = somar(a, b);
  return enif_make_int(env, result);
}

static ERL_NIF_TERM subtrair_nif(ErlNifEnv* env, int argc, const ERL_NIF_TERM argv[])
{
  int a, b, result;
  enif_get_int(env, argv[0], &a);
  enif_get_int(env, argv[1], &b);
  result = subtrair(a, b);
  return enif_make_int(env, result);
}

static ERL_NIF_TERM multiplicar_nif(ErlNifEnv* env, int argc, const ERL_NIF_TERM argv[])
{
  int a, b, result;
  enif_get_int(env, argv[0], &a);
  enif_get_int(env, argv[1], &b);
  result = multiplicar(a, b);
  return enif_make_int(env, result);
}

static ERL_NIF_TERM dividir_nif(ErlNifEnv* env, int argc, const ERL_NIF_TERM argv[])
{
  int a, b, result;
  enif_get_int(env, argv[0], &a);
  enif_get_int(env, argv[1], &b);
  result = dividir(a, b);
  return enif_make_int(env, result);
}

static ErlNifFunc nif_funcs[] = {
  {"somar", 2, somar_nif},
  {"subtrair", 2, subtrair_nif},
  {"multiplicar", 2, multiplicar_nif},
  {"dividir", 2, dividir_nif},
};

ERL_NIF_INIT(Elixir.Calc, nif_funcs, NULL, NULL, NULL, NULL)

Holy moly! That's a lot of code for a simple library! Let's dig in.

The first line

#include <erl_nif.h>

is the foundation of our NIF. This header declares the functions and macros needed to create a NIF. On Ubuntu it ships in the erlang-dev package.

After that we have four functions, each one wrapping one function of our C library. They all follow the same pattern, so I will focus on only one of them.

static ERL_NIF_TERM somar_nif(ErlNifEnv* env, int argc, const ERL_NIF_TERM argv[])
{
  int a, b, result;
  enif_get_int(env, argv[0], &a);
  enif_get_int(env, argv[1], &b);
  result = somar(a, b);
  return enif_make_int(env, result);
}

These lines declare a new static function called somar_nif that returns an ERL_NIF_TERM (a type that represents any Erlang term), and expects three arguments:

  • ErlNifEnv* env is a pointer that represents an environment that can host Erlang terms. Let's consider it as the environment that is running our NIF
  • int argc contains the number of arguments that were passed to the function
  • const ERL_NIF_TERM argv[] are the arguments passed to the function

This closely resembles a regular main function in any C program

int main(int argc, char ** argv)

where you read argv to get the values passed to your program, based on the number of arguments contained in argc.

And that's exactly what is happening inside the function

// Our C variables
int a, b, result;

// Reads the first value and stores it in a
enif_get_int(env, argv[0], &a);

// Reads the second value and stores it in b
enif_get_int(env, argv[1], &b);

// Our lib_calc function being called!
result = somar(a, b);

// Transforms the result into an ERL_NIF_TERM and returns it
return enif_make_int(env, result);

Here argc is being totally ignored, as we already know that exactly 2 values are being passed as arguments.

After defining all our NIF functions, we have to inform the Erlang NIF API how to call them.

static ErlNifFunc nif_funcs[] = {
  {"somar", 2, somar_nif},
  {"subtrair", 2, subtrair_nif},
  {"multiplicar", 2, multiplicar_nif},
  {"dividir", 2, dividir_nif},
};

ERL_NIF_INIT(Elixir.Calc, nif_funcs, NULL, NULL, NULL, NULL)

The static ErlNifFunc nif_funcs[] is an array of ErlNifFunc structs. This struct has the following fields:

  • name: The NIF function's name, as it will be exposed to the module
  • arity: The NIF function's arity
  • fptr: The pointer to the function that will be called when the Erlang/Elixir module calls the NIF

There is a fourth field in the ErlNifFunc struct, flags, but for our example it can be omitted.
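To make the name, arity and function pointer trio more concrete, here is a sketch of a table with the same shape in plain C, so it compiles without erl_nif.h. CalcFunc, calc_fn and find_calc_func are hypothetical names I made up for this illustration, and the lookup is not a description of how the BEAM resolves NIFs internally. It only shows what kind of question such a table can answer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a NIF function pointer: a plain C signature
 * instead of (ErlNifEnv*, int, const ERL_NIF_TERM[]). */
typedef int (*calc_fn)(int a, int b);

/* Hypothetical stand-in for ErlNifFunc, without the flags field. */
typedef struct {
  const char *name;
  unsigned arity;
  calc_fn fptr;
} CalcFunc;

static int somar(int a, int b) { return a + b; }
static int subtrair(int a, int b) { return a - b; }

static const CalcFunc calc_funcs[] = {
  {"somar", 2, somar},
  {"subtrair", 2, subtrair},
};

/* Returns the function registered under name/arity, or NULL if none is. */
calc_fn find_calc_func(const char *name, unsigned arity) {
  size_t count = sizeof(calc_funcs) / sizeof(calc_funcs[0]);

  assert(name != NULL);
  for (size_t i = 0; i < count; i++) {
    if (calc_funcs[i].arity == arity && strcmp(calc_funcs[i].name, name) == 0) {
      return calc_funcs[i].fptr;
    }
  }
  return NULL;
}
```

The point of the sketch is that the name and the arity together identify the entry: asking for somar with arity 3 finds nothing, in the same way that somar/2 and somar/3 are different functions in Erlang and Elixir.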

The last piece of code in our NIF is the ERL_NIF_INIT macro call, passing the module name, the functions that will be exposed in our NIF, and pointers to functions that handle the load, reload, upgrade and unload events (ignored here in our example).

Notice that the module name is Elixir.Calc, and not just Calc. That's necessary because our goal is to use this NIF in an Elixir module, and from the Erlang perspective every Elixir module name starts with the Elixir. prefix.

Phew! A lot of work here! Let's compile it and see what happens.

$ gcc -shared -o lib_calc_nif.so -fPIC lib_calc_nif.c lib_calc.so

Great! Again, no errors or warnings.

Notice the -fPIC flag passed to gcc. It tells gcc to generate position-independent code, which uses relative instead of absolute address references, so the shared library can be loaded at any address.

And now, the moment of truth! Let's create an Elixir module!

calc.ex

defmodule Calc do
  @on_load :load_nifs

  def load_nifs do
    :erlang.load_nif('./lib_calc_nif', 0)
  end

  def somar(_a, _b) do
    raise "NIF somar not implemented"
  end

  def subtrair(_a, _b) do
    raise "NIF subtrair not implemented"
  end

  def multiplicar(_a, _b) do
    raise "NIF multiplicar not implemented"
  end

  def dividir(_a, _b) do
    raise "NIF dividir not implemented"
  end
end

This, dear devs, is our Elixir module that will call our NIF! Taking a look you will notice this

@on_load :load_nifs

def load_nifs do
  :erlang.load_nif('./lib_calc_nif', 0)
end

This defines a callback that will be executed when the module is loaded (@on_load :load_nifs), and the callback will load our NIF (:erlang.load_nif('./lib_calc_nif', 0)). Let's see it in action!

$ iex
Erlang/OTP 22 [erts-10.6.4] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1]

Interactive Elixir (1.12.0) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> c("Calc.ex")
[Calc]
iex(2)> Calc.somar(1, 2)
3
iex(3)>

Hooray! It works!

Now, notice all the functions defined in our module. They're just fallbacks that run only if the NIF was not loaded for whatever reason.

That's really great! But...

Not so fast! There are a lot of things to be considered here before we move on with our excitement.

First, remember how the arguments are declared in our C library?
lib_calc.h

int somar(int a, int b);

This function expects integer values. That expectation was carried along through our whole journey here.

In fact, our NIF tries to convert the first argument to an integer
lib_calc_nif.c

enif_get_int(env, argv[0], &a);

What happens if we try to pass a value from another type?
Like a string, for example:

iex(3)> Calc.somar("test", 1)
32753

What?

Let's try with a float

iex(7)> Calc.somar(1.0,1)
32753
iex(8)> Calc.somar(1,1.0)
1251267553
iex(9)>

oO???
What is happening here?!

Worse than that, if we keep calling the function with exactly the same value

iex(8)> Calc.somar(1,1.0)
1251267553
iex(9)> Calc.somar(1,1.0)
1251267553
iex(10)> Calc.somar(1,1.0)
1251531297
iex(11)> Calc.somar(1,1.0)
1251531297
iex(12)> Calc.somar(1,1.0)
1251288249
iex(13)>

The return value changes!

This is because we didn't check inside our NIF whether the conversion succeeded. enif_get_int returns false when the term is not an integer that fits in an int, and in that case our variable is never set. So basically here we are summing whatever junk was already in that memory.

Since we are only converting and summing (or subtracting, multiplying or dividing) plain integers, this does little harm to our module. But remember, we are dealing with C here. Not Elixir, not Erlang. C. Pointers, memory... Can you imagine the scenario?
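The junk values above come from using a and b without checking whether the conversion worked. Here is a minimal sketch of that check in plain C, so it compiles without erl_nif.h. Term, get_int and somar_checked are hypothetical names made up for this illustration; get_int only mimics the documented contract of enif_get_int (non-zero on success, zero on failure). In the real NIF, the failure branch would typically return enif_make_badarg(env), which shows up in Elixir as an ArgumentError.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of an Erlang term: just enough to tell an integer
 * apart from anything else. The real ERL_NIF_TERM is opaque. */
typedef enum { TERM_INT, TERM_OTHER } TermKind;

typedef struct {
  TermKind kind;
  int value;
} Term;

/* Stand-in for enif_get_int: returns 1 and writes *out when the term is an
 * integer, returns 0 and leaves *out untouched otherwise. */
int get_int(Term term, int *out) {
  assert(out != NULL);
  if (term.kind != TERM_INT) {
    return 0;
  }
  *out = term.value;
  return 1;
}

/* Checked wrapper: reports failure instead of adding an unset variable.
 * Returns 1 and stores the sum in *result, or 0 for a bad argument. */
int somar_checked(Term x, Term y, int *result) {
  int a, b;

  if (!get_int(x, &a) || !get_int(y, &b)) {
    return 0; /* a real NIF would return enif_make_badarg(env) here */
  }
  *result = a + b;
  return 1;
}
```

With a check like this, a call such as Calc.somar("test", 1) would fail loudly instead of returning a number that changes from one call to the next.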

Besides that, what if we try to divide by zero?

iex(13)> Calc.dividir(1,0)
[1]    501 floating point exception  iex

The BEAM crashed! That's a huge problem with using NIFs: if the NIF crashes during the execution, the entire BEAM crashes.

So this is a feature that must be used very, very, very carefully.

Final considerations

I really love studying this kind of topic. Understanding how things work under the hood is a powerful way to discover new possibilities, explore and extend my knowledge about a subject.

But that's it. A case study.

I don't intend to put any of these things in production, unless it's totally necessary:

  • If there is no time to develop an entire feature that is already implemented in a native library
  • If there is a bug in Erlang modules that prevents you from deploying your feature
  • If that native lib is so exclusive, so unique, and solves a huge problem, and there is no alternative to it

I think you got the point here.

The source code of this project can be found at https://github.com/rodrigocaldeira/nif_cgo, and there is a bonus there for the Go devs!

Thank you so much!
