GraphQL Observability with Sentry
At Fieldguide, Hasura exposes a GraphQL API on Postgres, extended with custom types implemented in a Node.js application's Apollo Server. Our front-end React application interacts with Hasura via Apollo Client, and our applications are managed on Heroku. GraphQL's inherent self-documentation has fueled an ecosystem of developer tooling, and its use with TypeScript results in highly efficient internal API development.
While iteration speed is certainly a key product development metric, understanding the behavior of features is equally important. This complementary information confirms development assumptions and surfaces inevitable bugs, providing a feedback loop that informs future iteration. Application behavior can be observed by generating proper telemetry data such as metrics, logs, and traces.
We adopted Sentry, an error tracking and performance monitoring platform, in the first weeks of our product's development. We have iterated on the integration over the past year, improving our ability to diagnose performance (traces) and triage errors (a subset of logs). This Sentry integration overview is derived from our specific Node.js GraphQL server and React GraphQL client, but the takeaways can be applied to any system with GraphQL interactions.
Sentry provides informative guides for many platforms. In our server's case, we apply Apollo Server v2 as an Express middleware; therefore, Sentry's Express Guide with request, tracing, and error handlers is a great starting point.
As part of initialization, we configure tracesSampleRate such that only a sample of traces counts toward our quota. Additionally, we bind a git commit hash (exposed via Heroku's Dyno Metadata feature) to the release version, enabling Sentry to monitor release health.
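As a minimal sketch of the initialization described above, the options could be assembled from environment variables. The helper name buildSentryOptions, the SENTRY_DSN and SENTRY_TRACES_SAMPLE_RATE variable names, and the 0.1 default are assumptions for illustration; HEROKU_SLUG_COMMIT is the commit hash variable that Heroku's Dyno Metadata feature exposes.

```typescript
// Hypothetical helper: assembles the options that would be passed to Sentry.init().
// SENTRY_DSN, SENTRY_TRACES_SAMPLE_RATE, and the 0.1 default are illustrative assumptions.
type Env = Record<string, string | undefined>;

interface SentryOptions {
  dsn: string | undefined;
  tracesSampleRate: number;
  release: string | undefined;
}

function buildSentryOptions(env: Env): SentryOptions {
  const raw = env.SENTRY_TRACES_SAMPLE_RATE;
  const parsed = raw ? Number(raw) : NaN;
  // Fall back to sampling 10% of transactions when the value is missing or outside [0, 1].
  const tracesSampleRate = parsed >= 0 && parsed <= 1 ? parsed : 0.1;

  return {
    dsn: env.SENTRY_DSN,
    tracesSampleRate,
    // Heroku Dyno Metadata exposes the deployed commit hash as HEROKU_SLUG_COMMIT.
    release: env.HEROKU_SLUG_COMMIT,
  };
}
```

With the real SDK, the result would be handed over as Sentry.init(buildSentryOptions(process.env)). Keeping the option assembly in a pure function makes the sampling fallback easy to test without sending any events.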
Sentry's Express-compatible tracing handler starts a transaction for every incoming request with a name derived from the HTTP method and path. This works well for REST APIs, but GraphQL entities are not identified by URLs, and by default all GraphQL requests will be identified by POST /graphql. To achieve proper specificity, we instantiate Apollo Server with a custom plugin that qualifies transaction names with the contextual GraphQL operation when Apollo receives a request.
Apollo Server plugin responding to the requestDidStart event
import * as Sentry from "@sentry/node";
import { ApolloServerPlugin } from "apollo-server-plugin-base";

export const sentryPlugin: ApolloServerPlugin = {
  requestDidStart({ request }) {
    if (request.operationName) {
      const scope = Sentry.getCurrentHub().getScope();
      const transaction = scope?.getTransaction(); // retrieve ongoing transaction

      if (transaction) {
        // qualify transaction name
        // i.e. "POST /graphql" -> "POST /graphql: MyOperation"
        scope?.setTransactionName(
          `${transaction.name}: ${request.operationName}`
        );
      }
    }
  },
};
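The naming rule itself can be isolated as a pure function, which makes it straightforward to unit test without a running server. This is a sketch only: qualifyTransactionName is a hypothetical helper, and the guard against appending the same suffix twice is an addition that the plugin above does not include.

```typescript
// Hypothetical helper mirroring the plugin's naming rule:
// "POST /graphql" + "MyOperation" -> "POST /graphql: MyOperation"
function qualifyTransactionName(
  baseName: string,
  operationName?: string | null
): string {
  if (!operationName) {
    return baseName; // anonymous operations keep the default name
  }

  const suffix = `: ${operationName}`;
  // Assumption for this sketch: avoid appending the same operation name twice.
  return baseName.endsWith(suffix) ? baseName : `${baseName}${suffix}`;
}
```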
Similarly, GraphQL errors differ from conventional REST APIs. Exceptions thrown while executing a GraphQL operation are represented as an errors response body field and will not inherently be captured by Sentry's Express-compatible error handler. We report these errors with an identified user and context by extending our Apollo Server plugin as described in this Sentry blog.
Extended Apollo Server plugin responding to the didEncounterErrors event
import * as Sentry from "@sentry/node";
import { ApolloError } from "apollo-server-express";
import { ApolloServerPlugin } from "apollo-server-plugin-base";

export const sentryPlugin: ApolloServerPlugin = {
  requestDidStart({ request }) {
    if (request.operationName) {
      // qualify transaction name
      // ...
    }

    return {
      didEncounterErrors(ctx) {
        if (!ctx.operation) {
          return; // ignore unparsed operations
        }

        Sentry.withScope((scope) => {
          if (ctx.context.currentUser) {
            scope.setUser({
              id: String(ctx.context.currentUser.id),
              // ...
            });
          }

          for (const error of ctx.errors) {
            if (error.originalError instanceof ApolloError) {
              continue; // ignore user-facing errors
            }

            Sentry.captureException(error, {
              tags: {
                graphqlOperation: ctx.operation?.operation,
                graphqlOperationName: ctx.operationName,
              },
              contexts: {
                graphql: {
                  query: ctx.request.query,
                  variables: JSON.stringify(
                    ctx.request.variables,
                    null,
                    2
                  ),
                  errorPath: error.path,
                },
              },
            });
          }
        });
      },
    };
  },
};
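The filtering rule inside didEncounterErrors can be sketched in isolation. In this example, UserFacingError is a hypothetical stand-in for ApolloError, and GraphQLErrorLike models only the fields the filter needs; neither is a library type.

```typescript
// Hypothetical stand-in for ApolloError (errors deliberately surfaced to users).
class UserFacingError extends Error {}

// Minimal error shape for this sketch; only originalError matters to the filter.
interface GraphQLErrorLike {
  message: string;
  originalError?: Error | null;
}

// Keep only the errors that were not deliberately raised for the user,
// since those are the ones worth reporting.
function reportableErrors(
  errors: readonly GraphQLErrorLike[]
): GraphQLErrorLike[] {
  return errors.filter(
    (error) => !(error.originalError instanceof UserFacingError)
  );
}
```

Errors with no originalError are kept, which matches the plugin's behavior of skipping only errors that wrap a user-facing exception.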
Finally, to gracefully handle scenarios when Heroku restarts our application (e.g. when deploying a new version), we drain pending Sentry events before closing the Express server.
Draining events for a graceful shutdown
import * as Sentry from "@sentry/node";

const server = app.listen(PORT);

process.on("SIGTERM", async function shutdown(signal: string) {
  console.log(`Shutting down via ${signal}`);

  try {
    await Sentry.close(2000);
  } catch (e) {
    console.error(e);
  }

  server.close(() => {
    console.log("HTTP server closed");
  });
});
Our React application configuration follows Sentry's React Guide with their sampled browser tracing integration configured with React Router instrumentation. Additionally, we bind a git commit hash to the release version, analogous to our Express application.
Apollo Client v3 telemetry is partially instrumented by Apollo Link Sentry, an Apollo Link middleware that, among other features, records GraphQL operations as useful breadcrumbs. We intentionally disable its transaction and fingerprint settings because we found their effect on the global scope confusing in non-GraphQL contexts.
Apollo Link Sentry configuration
import { SentryLink } from "apollo-link-sentry";

const sentryLink = new SentryLink({
  setTransaction: false,
  setFingerprint: false,
  attachBreadcrumbs: {
    includeError: true,
  },
});
Complementing this library, an onError link reports GraphQL and network errors to Sentry with an explicit transaction name and context. The error handler arguments are not JavaScript Error objects; therefore, Sentry.captureMessage is invoked to improve readability within Sentry Issues. GraphQL errors are captured with a more granular fingerprint, splitting Sentry events into groups by GraphQL operation name.
onError link implementation
import { onError } from "@apollo/client/link/error";
import * as Sentry from "@sentry/react";

const errorLink = onError(({ operation, graphQLErrors, networkError }) => {
  Sentry.withScope((scope) => {
    scope.setTransactionName(operation.operationName);
    scope.setContext("apolloGraphQLOperation", {
      operationName: operation.operationName,
      variables: operation.variables,
      extensions: operation.extensions,
    });

    graphQLErrors?.forEach((error) => {
      Sentry.captureMessage(error.message, {
        level: Sentry.Severity.Error,
        fingerprint: ["{{ default }}", "{{ transaction }}"],
        contexts: {
          apolloGraphQLError: {
            error,
            message: error.message,
            extensions: error.extensions,
          },
        },
      });
    });

    if (networkError) {
      Sentry.captureMessage(networkError.message, {
        level: Sentry.Severity.Error,
        contexts: {
          apolloNetworkError: {
            error: networkError,
            extensions: (networkError as any).extensions,
          },
        },
      });
    }
  });
});
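The grouping behavior can also be illustrated with a variant that places the operation name directly in the fingerprint instead of relying on the {{ transaction }} variable. This is a sketch of an alternative, not the approach used above: buildCaptureDetails, ClientGraphQLError, and the "anonymous-operation" fallback are hypothetical names chosen for illustration.

```typescript
// Minimal error shape for this sketch; not the graphql-js GraphQLError type.
interface ClientGraphQLError {
  message: string;
  extensions?: Record<string, unknown>;
}

interface CaptureDetails {
  message: string;
  fingerprint: string[];
}

// Hypothetical helper: groups events by Sentry's default grouping plus the operation name.
function buildCaptureDetails(
  operationName: string | undefined,
  error: ClientGraphQLError
): CaptureDetails {
  return {
    message: error.message,
    // "{{ default }}" keeps Sentry's built-in grouping;
    // the second entry splits those groups per GraphQL operation.
    fingerprint: ["{{ default }}", operationName || "anonymous-operation"],
  };
}
```

Two errors with the same message but different operation names produce different fingerprints, so they would land in separate Sentry issues.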
Capturing transactions and errors associated with GraphQL operations has enabled us to better understand the behavior of our applications. However, this value is only unlocked by surfacing the actionable subset of telemetry data in a way that is most effective for the team and process. As features change and software abstractions evolve, instrumentation must be tuned along with them. Continuous attention to observability will empower the team to proactively identify issues, creating a robust feedback loop that informs future development.
Are you passionate about observable product development? We're hiring across engineering, product, and design!