GraphQL Observability with Sentry

At Fieldguide, Hasura exposes a GraphQL API on Postgres, extended with custom types implemented in a Node.js application's Apollo Server. Our front-end React application interacts with Hasura via Apollo Client, and our applications are managed on Heroku. GraphQL's inherent self-documentation has fueled an ecosystem of developer tooling, and its use with TypeScript results in highly efficient internal API development.

While iteration speed is certainly a key product development metric, understanding the behavior of features is equally important. This complementary information confirms development assumptions and surfaces inevitable bugs, providing a feedback loop that informs future iteration. Application behavior can be observed by generating proper telemetry data such as metrics, logs, and traces.

We adopted Sentry, an error tracking and performance monitoring platform, in the beginning weeks of our product's inception. We have iterated on the integration over the past year, improving our ability to diagnose performance (traces) and triage errors (a subset of logs). This Sentry integration overview is derived from our specific Node.js GraphQL server and React GraphQL client, but the takeaways can be applied to any system with GraphQL interactions.

GraphQL Server

Sentry provides informative guides for many platforms. In our server's case, we apply Apollo Server v2 as an Express middleware; therefore, Sentry's Express Guide with request, tracing, and error handlers is a great starting point.

As part of initialization, we configure tracesSampleRate such that a sampling of traces count towards our quota. Additionally, we bind a git commit hash (exposed via Heroku's Dyno Metadata feature) to the release version, enabling Sentry to monitor release health.

Sentry's Express-compatible tracing handler starts a transaction for every incoming request with a name derived from the HTTP method and path. This works well for REST APIs, but GraphQL entities are not identified by URLs, and by default all GraphQL requests will be identified by POST /graphql. To achieve proper specificity, we instantiate Apollo Server with a custom plugin that qualifies transaction names with the contextual GraphQL operation when Apollo receives a request.

Apollo Server plugin responding to the requestDidStart event
import * as Sentry from "@sentry/node";
import { ApolloServerPlugin } from "apollo-server-plugin-base";

export const sentryPlugin: ApolloServerPlugin = {
    requestDidStart({ request }) {
        if (request.operationName) {
            const scope = Sentry.getCurrentHub().getScope();
            const transaction = scope?.getTransaction(); // retrieve ongoing transaction

            if (transaction) {
                // qualify transaction name
                // i.e. "POST /graphql" -> "POST /graphql: MyOperation"
                scope?.setTransactionName(
                    `${transaction.name}: ${request.operationName}`
                );
            }
        }
    },
};
Enter fullscreen mode Exit fullscreen mode

Similarly, GraphQL errors differ from conventional REST APIs. Exceptions thrown while executing a GraphQL operation are represented as an errors response body field and will not inherently be captured by Sentry's Express-compatible error handler. We report these errors with an identified user and context by extending our Apollo Server plugin as described in this Sentry blog.

Extended Apollo Server plugin responding to the didEncounterErrors event
import * as Sentry from "@sentry/node";
import { ApolloError } from "apollo-server-express";
import { ApolloServerPlugin } from "apollo-server-plugin-base";

export const sentryPlugin: ApolloServerPlugin = {
    requestDidStart({ request }) {
        if (request.operationName) {
            // qualify transaction name
            // ...
        }

        return {
            didEncounterErrors(ctx) {
                if (!ctx.operation) {
                    return; // ignore unparsed operations
                }

                Sentry.withScope((scope) => {
                    if (ctx.context.currentUser) {
                        scope.setUser({
                            id: String(ctx.context.currentUser.id),
                            // ...
                        });
                    }

                    for (const error of ctx.errors) {
                        if (error.originalError instanceof ApolloError) {
                            continue; // ignore user-facing errors
                        }

                        Sentry.captureException(error, {
                            tags: {
                                graphqlOperation: ctx.operation?.operation,
                                graphqlOperationName: ctx.operationName,
                            },
                            contexts: {
                                graphql: {
                                    query: ctx.request.query,
                                    variables: JSON.stringify(
                                        ctx.request.variables,
                                        null,
                                        2
                                    ),
                                    errorPath: error.path,
                                },
                            },
                        });
                    }
                });
            },
        };
    },
};
Enter fullscreen mode Exit fullscreen mode

Finally, to gracefully handle scenarios when Heroku restarts our application (i.e. when deploying a new version), we drain pending Sentry events before closing the Express server.

Draining events for a graceful shutdown
import * as Sentry from "@sentry/node";

const server = app.listen(PORT);

process.on("SIGTERM", async function shutdown(signal: string) {
    console.log(`Shutting down via ${signal}`);

    try {
        await Sentry.close(2000);
    } catch (e) {
        console.error(e);
    }

    server.close(() => {
        console.log("HTTP server closed");
    });
});
Enter fullscreen mode Exit fullscreen mode

GraphQL Client

Our React application configuration follows Sentry's React Guide with their sampled browser tracing integration configured with React Router instrumentation. Additionally, we bind a git commit hash to the release version, analogous to our Express application.

Apollo Client v3 telemetry is partially instrumented by Apollo Link Sentry, an Apollo Link middleware that records GraphQL operations as useful breadcrumbs amongst other features. We intentionally disable their transaction and fingerprint setting as we found the global scope confusing in non-GraphQL operation contexts.

Apollo Link Sentry configuration
import { SentryLink } from "apollo-link-sentry";

const sentryLink = new SentryLink({
    setTransaction: false,
    setFingerprint: false,
    attachBreadcrumbs: {
        includeError: true,
    },
});
Enter fullscreen mode Exit fullscreen mode

Complementing this library, an onError link actually reports GraphQL and network errors to Sentry with an explicit transaction name and context. The error handler arguments are not actually JavaScript Error objects; therefore, Sentry.captureMessage is invoked to improve readability within Sentry Issues. GraphQL errors are captured with a more granular fingerprint, splitting Sentry events into groups by GraphQL operation name.

onError link implementation
import { onError } from "@apollo/client/link/error";
import * as Sentry from "@sentry/react";

const errorLink = onError(({ operation, graphQLErrors, networkError }) => {
    Sentry.withScope((scope) => {
        scope.setTransactionName(operation.operationName);
        scope.setContext("apolloGraphQLOperation", {
            operationName: operation.operationName,
            variables: operation.variables,
            extensions: operation.extensions,
        });

        graphQLErrors?.forEach((error) => {
            Sentry.captureMessage(error.message, {
                level: Sentry.Severity.Error,
                fingerprint: ["{{ default }}", "{{ transaction }}"],
                contexts: {
                    apolloGraphQLError: {
                        error,
                        message: error.message,
                        extensions: error.extensions,
                    },
                },
            });
        });

        if (networkError) {
            Sentry.captureMessage(networkError.message, {
                level: Sentry.Severity.Error,
                contexts: {
                    apolloNetworkError: {
                        error: networkError,
                        extensions: (networkError as any).extensions,
                    },
                },
            });
        }
    });
});
Enter fullscreen mode Exit fullscreen mode

Capturing transactions and errors associated with GraphQL operations has enabled us to better understand the behavior of our applications. However, this value is only unlocked by surfacing the actionable subset of telemetry data in a way that is most effective for the team and process. As features change and software abstractions evolve, instrumentation must be tuned with it. Continuous attention to observability will empower the team to proactively identify issues, creating a robust feedback loop that informs future development.

Are you passionate about observable product development? We're hiring across engineering, product, and design!

36