Namespacing for GraphQL: Conflict-Free merging of any number of APIs
Namespacing is an essential concept in programming, allowing us to group things and prevent naming collisions. This post shows you how we apply the concept to APIs to make composition and integration of different services easier.
We'll show you how to integrate 8 services with just a couple of lines of code, fully automatically and without any conflicts: the SpaceX GraphQL API, four GraphQL services using Apollo Federation, a REST API using OpenAPI Specification, a PostgreSQL-based API and a Planetscale-Vitess-based (MySQL) API.
When you install an npm package, it lives within its own namespace. One such package is axios, a very popular client for making HTTP requests.
To install axios, you run the following command:
yarn add axios
This installs the axios dependency into your node_modules folder and adds it to your package.json file.
From now on, you can import and use the code provided by the axios package like so:
import axios from "axios";
const res = await axios.get("https://example.com");
Import the dependency, give it a name, in this case just axios, then use it. We could have also renamed axios to bxios. Renaming an import is essential to dependency management to avoid collisions.
One essential rule is that you should have no two imports with the same name, otherwise you have a naming collision, and it's unclear how the program should be executed.
Should we run axios or bxios?
Alright, enough intro. You're probably familiar with all this already. So what does it have to do with APIs?
A lot! At least I think so. This whole workflow is amazing!
You can write code, package it up as an npm package, publish it, and others can import and use it very easily. It's such a nice way to collaborate using code.
What does this look like for APIs? Well, it's not such a well-oiled machine. With APIs, we're still in the Stone Age when it comes to this workflow.
Some companies offer an SDK which you can download and integrate. Others just publish a REST or GraphQL API. Some of them have an OpenAPI Specification, others just offer their own custom API documentation.
Imagine you'd have to integrate 8 services to get data from them. Why could you not just run something similar to yarn add axios and get the job done? Why is it so complicated to combine services?
To get there, we have to solve a number of problems.
- We need to settle on a common language, a universal language to unify all our APIs
- We need to figure out a way to "namespace" our APIs to resolve conflicts
- We need a runtime to execute the "namespaced" Operations
Let's drill down into the problems one by one.
The first problem to solve is that we need a common language to base our implementation approach on. Without going off on a tangent, let me explain why GraphQL is a great fit for this purpose.
GraphQL comes with two very powerful features that are essential for our use case. On the one hand, it allows us to query exactly the data we need. This is very important when we're using a lot of data sources as we can easily drill down into the fields we are interested in.
On the other hand, GraphQL lets us easily build and follow links between types. For example, you could have two REST endpoints, one with Posts, another with Comments. With a GraphQL API in front of them, you can build a link between the two objects and allow your users to get Posts and Comments with a single Query.
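To make the linking idea concrete, here is a minimal sketch in TypeScript. The two in-memory functions stand in for the REST endpoints. All names (`fetchPosts`, `fetchCommentsForPost`, `resolvePostsWithComments`) and all data are hypothetical, and the calls are synchronous only to keep the sketch short; real endpoints would be fetched asynchronously.

```typescript
// Hypothetical shapes returned by two separate REST endpoints.
interface BlogPost {
  id: number;
  title: string;
}

interface BlogComment {
  id: number;
  postId: number;
  text: string;
}

// Stand-in for a "posts" endpoint (assumed, in-memory data).
function fetchPosts(): BlogPost[] {
  return [
    { id: 1, title: "Hello GraphQL" },
    { id: 2, title: "Namespaces" },
  ];
}

// Stand-in for a "comments by post" endpoint (assumed, in-memory data).
function fetchCommentsForPost(postId: number): BlogComment[] {
  const all: BlogComment[] = [
    { id: 10, postId: 1, text: "Nice post" },
    { id: 11, postId: 1, text: "Thanks" },
    { id: 12, postId: 2, text: "Interesting" },
  ];
  return all.filter((comment) => comment.postId === postId);
}

interface PostWithComments extends BlogPost {
  comments: BlogComment[];
}

// Conceptually what a GraphQL layer does for
//   { posts { title comments { text } } }
// resolve the parent list first, then follow the link for each parent.
function resolvePostsWithComments(): PostWithComments[] {
  return fetchPosts().map((post) => ({
    ...post,
    comments: fetchCommentsForPost(post.id),
  }));
}
```

The client sees a single nested result, even though the data comes from two independent sources.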
On top of that, GraphQL has a thriving community, lots of conferences and people actively engaging, building tools around the Query language and more.
That said, GraphQL also has a weakness when it comes to API integration. It doesn't have a concept of namespaces, making it a bit complex to use it for API integration, until now!
When it comes to service integration, there are so far two major approaches to solve the problem. For one, there is Schema Stitching and then there's also Federation.
With Schema Stitching, you can combine GraphQL services that are not aware of the stitching. Merging the APIs happens in a centralized place, a GraphQL API gateway, without the services being aware of this.
Federation, specified by Apollo, on the other hand proposes a different approach. Instead of centralizing the stitching logic and rules, federation distributes it across all GraphQL Microservices, also known as Subgraphs. Each Subgraph defines how it contributes to the overall schema, fully aware that other Subgraphs exist.
There's not really a "better" solution here. Both are good approaches to Microservices. They are just different. One favours centralized logic while the other proposes a decentralized approach. Both come with their own challenges.
That being said, the problem of service integration goes way beyond federation and schema stitching.
The number one pattern of Principled GraphQL is about integrity and states:
**Your company should have one unified graph, instead of multiple graphs created by each team. By having one graph, you maximize the value of GraphQL:**
- More data and services can be accessed from a single query
- Code, queries, skills, and experience are portable across teams
- One central catalog of all available data that all graph users can look to
- Implementation cost is minimized, because graph implementation work isn't duplicated
- Central management of the graph – for example, unified access control policies – becomes possible
When teams create their own individual graphs without coordinating their work, it is all but inevitable that their graphs will begin to overlap, adding the same data to the graph in incompatible ways. At best, this is costly to rework; at worst, it creates chaos. This principle should be followed as early in a company's graph adoption journey as possible.
Let's compare this principle to what we've learned about code above, you know, the example with axios and bxios.
Imagine there was one giant npm package per company with all the dependencies. If you wanted to add axios to your npm package, you'd have to manually copy all the code into your own library and make it "your own" package. This wouldn't be maintainable.
One single graph sounds great when you are in total isolation. In reality however, it means that you have to add all external APIs, all the "packages" that you don't control, to your one graph. This integration must be maintained by yourself.
It's true: with just one graph, we can easily share Queries across teams. But is that really a feature? If we split our code into packages and publish them separately, it's easy for others to pick exactly what they need.
Imagine a single graph with millions of fields. Is that really a scalable solution? How about just selecting the sub-parts of a giant GraphQL schema that are really relevant to you?
With just one schema, we can have a centralized catalog, true. But keep in mind that this catalog can only represent our own API. What about all the other APIs in the world?
Also, why can't we have a catalog of multiple APIs? Just like npm packages which you can search and browse.
Is the implementation cost really minimized? I'd argue that the opposite is true. Especially with Federation, the solution proposed by Apollo to implement a Graph, it becomes a lot more complex to maintain your Graph. If you want to deprecate type definitions across multiple Subgraphs, you have to carefully orchestrate the change across all of them.
Microservices are not really micro if there are dependencies between them. This pattern is better described as a distributed monolith.
Central management of the graph is interesting for what should be possible but isn't yet reality. We're yet to see a centralized access control policy system that adds role-based access controls for federated graphs. Oh, this is actually one of our features, but let's not talk about security today.
Building one single Graph sounds like a great idea when you're isolated on a tiny isle with no internet. You're probably not going to consume and integrate any third-party APIs.
Anybody else who is connected to the internet will probably want to integrate external APIs. Want to check sales using the Stripe API? Send emails via Mailchimp or Sendgrid? Do you really want to add these external services manually to your "One Graph"?
The One Graph principle fails the reality check. Instead, we need a simple way to compose multiple Graphs!
The world is a diverse place. There are many great companies offering really nice products via APIs. Let's make it easy to build integrations without having to manually add them to our "One Graph".
That leads us to our second problem, naming conflicts.
Imagine that both Stripe and Mailchimp define the type Customer, but each has a different understanding of the Customer, with different fields and types.
How could both Customer types co-exist within the same GraphQL Schema? As proposed above, we steal a concept from programming languages: namespaces!
How to accomplish this? Let's break down this problem a bit more. As GraphQL has no out-of-the-box namespacing feature, we have to be a bit creative.
First, we have to remove any naming collisions for the types. This can be done by suffixing each "Customer" type with the namespace. So, we'd have "Customer_stripe" and "Customer_mailchimp". First problem solved!
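As a rough illustration of the suffixing step, the sketch below works on a deliberately tiny schema model (a map from type name to fields) instead of a real GraphQL schema. The model, the helper names `suffixTypeRef` and `suffixTypes`, and the example fields are assumptions made for this example. One detail is worth spelling out: references to renamed types inside fields have to be rewritten too, while built-in scalars must stay untouched.

```typescript
// A deliberately tiny schema model: type name -> (field name -> type reference).
type SchemaTypes = Record<string, Record<string, string>>;

// Built-in scalars are shared by every GraphQL schema and must not be renamed.
const BUILT_IN_SCALARS = ["ID", "String", "Int", "Float", "Boolean"];

// Rewrite the named type inside a reference such as "[Invoice!]!".
function suffixTypeRef(typeRef: string, namespace: string): string {
  return typeRef.replace(/[_A-Za-z][_0-9A-Za-z]*/, (name) =>
    BUILT_IN_SCALARS.indexOf(name) !== -1 ? name : name + "_" + namespace,
  );
}

// Suffix every type name and every reference to it with the namespace.
function suffixTypes(types: SchemaTypes, namespace: string): SchemaTypes {
  const result: SchemaTypes = {};
  for (const typeName of Object.keys(types)) {
    const renamedFields: Record<string, string> = {};
    for (const fieldName of Object.keys(types[typeName])) {
      renamedFields[fieldName] = suffixTypeRef(types[typeName][fieldName], namespace);
    }
    result[typeName + "_" + namespace] = renamedFields;
  }
  return result;
}

// Hypothetical excerpt of a Stripe-like schema.
const namespacedStripe = suffixTypes(
  {
    Customer: { id: "ID!", email: "String", invoices: "[Invoice!]!" },
    Invoice: { id: "ID!", customer: "Customer!" },
  },
  "stripe",
);
```

Here `namespacedStripe` contains `Customer_stripe` and `Invoice_stripe`, and the `invoices` field now points at `[Invoice_stripe!]!`.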
Another issue we could run into is field naming collisions on the root operation types, that is, on the Query, Mutation and Subscription type. We can solve this problem by prefixing all fields, e.g. "stripe_customer(by: ID!)" and "mailchimp_customer(by: ID!)".
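The prefixing of root fields can be sketched in the same spirit. The `RootFields` model (field name mapped to return type, arguments omitted) and the helpers `prefixRootFields` and `mergeRootFields` are hypothetical names for this example, not part of any library.

```typescript
// Root fields of one API: field name -> return type (arguments omitted for brevity).
type RootFields = Record<string, string>;

// Prefix every root field with the namespace of the API it comes from.
function prefixRootFields(fields: RootFields, namespace: string): RootFields {
  const result: RootFields = {};
  for (const name of Object.keys(fields)) {
    result[namespace + "_" + name] = fields[name];
  }
  return result;
}

// Merge several field sets and fail loudly on any remaining collision.
function mergeRootFields(sets: RootFields[]): RootFields {
  const merged: RootFields = {};
  for (const fields of sets) {
    for (const name of Object.keys(fields)) {
      if (Object.prototype.hasOwnProperty.call(merged, name)) {
        throw new Error("Root field collision: " + name);
      }
      merged[name] = fields[name];
    }
  }
  return merged;
}

// Both APIs expose a "customer" field on their Query type.
const stripeQuery: RootFields = { customer: "Customer_stripe" };
const mailchimpQuery: RootFields = { customer: "Customer_mailchimp" };

const mergedQuery = mergeRootFields([
  prefixRootFields(stripeQuery, "stripe"),
  prefixRootFields(mailchimpQuery, "mailchimp"),
]);
```

Merging the two unprefixed field sets would throw, while the prefixed ones combine into `stripe_customer` and `mailchimp_customer` without conflict.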
Finally, we have to be careful about another feature of GraphQL, often ignored by other approaches to this problem, Directives!
What happens if a directive called @formatDateString is defined in two Schemas, but with a different meaning in each? Wouldn't that lead to unpredictable execution paths? Yes, probably. Let's also fix that.
We can rename the directive to @stripe_formatDateString and @mailchimp_formatDateString respectively. This way, we can easily distinguish between the two.
With that, all naming collisions should be solved. Are we done yet? Actually not. Unfortunately, with our solution we've created a lot of new problems!
By renaming all types and fields, we've actually caused a lot of trouble. Let's have a look at this Query:
query ($id: ID!) {
  mailchimp_customer(by: $id) {
    id
    name
    registered @mailchimp_formatDateString(format: "ddmmYYYY")
    ... on PaidCustomer_mailchimp {
      pricePlan
    }
  }
}
What are the problems here?
The field "mailchimp_customer" doesn't exist on the Mailchimp schema, we have to rename it to "customer".
The directive "mailchimp_formatDateString" also doesn't exist on the Mailchimp Schema. We have to rename it to "formatDateString" before sending it to the upstream. But be careful about this! Make sure this directive actually exists on the origin. We're automatically checking if this is the case as you might accidentally use the wrong directive on the wrong field.
Lastly, the type definition "PaidCustomer_mailchimp" also doesn't exist on the origin schema. We have to rename it to "PaidCustomer", otherwise the origin wouldn't understand it.
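The three rewrites can be sketched as a single function. This is a simplified illustration, not the actual implementation: it rewrites the query text with regular expressions, whereas a real runtime would walk the parsed operation so that string values and aliases are never touched. The `Origin` shape and the function name `toUpstreamOperation` are hypothetical, and the namespace is assumed to consist of letters and digits only.

```typescript
// Hypothetical description of what the upstream (origin) schema supports.
interface Origin {
  namespace: string;
  directives: string[]; // directive names defined by the origin, without "@"
}

// Strip the namespace from a namespaced operation so the origin can understand it.
function toUpstreamOperation(operation: string, origin: Origin): string {
  const ns = origin.namespace;

  // 1. Directives: "@mailchimp_formatDateString" -> "@formatDateString",
  //    but only if the origin actually defines that directive.
  let result = operation.replace(
    new RegExp("@" + ns + "_([_A-Za-z][_0-9A-Za-z]*)", "g"),
    (_match: string, name: string) => {
      if (origin.directives.indexOf(name) === -1) {
        throw new Error("Directive @" + name + " is not defined on origin " + ns);
      }
      return "@" + name;
    },
  );

  // 2. Type names: "PaidCustomer_mailchimp" -> "PaidCustomer".
  result = result.replace(
    new RegExp("\\b([_A-Za-z][_0-9A-Za-z]*?)_" + ns + "\\b", "g"),
    "$1",
  );

  // 3. Root fields: "mailchimp_customer" -> "customer".
  result = result.replace(new RegExp("\\b" + ns + "_", "g"), "");

  return result;
}

const namespacedOperation = `query ($id: ID!) {
  mailchimp_customer(by: $id) {
    registered @mailchimp_formatDateString(format: "ddmmYYYY")
    ... on PaidCustomer_mailchimp {
      pricePlan
    }
  }
}`;

const upstreamOperation = toUpstreamOperation(namespacedOperation, {
  namespace: "mailchimp",
  directives: ["formatDateString"],
});
```

Passing an origin whose `directives` list does not contain `formatDateString` makes the function throw, which mirrors the check described above for directives used against the wrong origin.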
Sounds like a lot of work? Well, it's already done and you can use this right away. Just type yarn global add @wundergraph/wunderctl into your terminal, and you're ready to try it out!
It's also going to be open source very soon. Make sure to sign up and get notified when we're ready!