Working with Stripe as a Source of Truth

If you're using Stripe, there's a good chance your app's functionality depends on data stored in Stripe. Some data requests are easy. A user's billing details page is often just one API call to Stripe's v1/payment_methods?customer={customerId} endpoint (and a little formatting) away.

But some data requests are hard. Some require a combination of data stored in Stripe along with data in your own app to know what to show the user.

For example, showing a re-order button on a site's order history page should have full details of both past purchases (from Stripe) and current stock levels from an inventory management system.

What are the best approaches for implementing these kind of complex data integrations with Stripe?

Practical example: WebFlicks Movie Search

A practical example will help us evaluate the pros and cons of different approaches to integrating Stripe data back into your app.

Let's say you've built a movie streaming service, WebFlicks, where your users pay for subscriptions to access different packages of shows and movies. $3/month for animated movies, $2/month for action movies etc.

Now, you want your Movie Search feature to intelligently mark movies as available for each user. So as a user searches for content, the results consider a user's purchase and subscription history by replacing a Buy Now button with a Watch Now button.

The requirements might look something like:

  1. A movie should be available if the user has an active subscription including that movie
  2. A movie should be available if the user has purchased it
  3. A movie's available flag should be immediately updated when:
    1. a user starts a new subscription
    2. a user's subscription is cancelled and expired
    3. a movie is added to and removed from a genre

Now, let's say your search is based on a table called movie_search:

movie_id subscription_groups title keywords
1 action, adventure Indiana Jones 'harrison ford action archaeologist'
2 action, sci-fi Star Wars 'harrison ford space opera'

You'll use Postgres' tsvector and tsquery for a simple Full Text Search interface. For example, the following sql clause will match any rows where the keywords field contains both harrison and ford:

to_tsvector(movie.keywords) @@ to_tsquery('harrison & ford')

A solution for running a user search, including a boolean flag (included_in_user_subscription) might look something like this:

def get_movie_results(customer_id, query):
    # somehow fetch the users's active subscriptions
    user_subscribed_groups = get_subscription_groups_for_customer(customer_id)
    query = """
      select
        movie_search.movie_id,
        movie_search.title,
        -- We'll use postgres' '&&' (overlap) operator to check if the user's subscriptions
        -- include any of the subscription_groups for the movie.
        -- The users's 'subscription_groups' are parameterized here as '$1'
        (movie.subscription_groups && $1) as included_in_user_subscription
      from movie_search
      where to_tsvector(movie.keywords) @@ to_tsquery($2)
    """
    return db.execute(query, [subscription_groups, query])

To serve these search results, we need to know the details of any subscriptions currently active for the user so that we can include those details as part of the search query. To do this, we need access to the full subscription state and subscription items to serve any request on WebFlick's search page. What is the best way to do this?

Common Approaches

There are several ways to handle this kind of problem, each with their own downsides and edge cases.

Method 1: Hammering The Stripe API

The most obvious approach is to simply query stripe via their API whenever you need to access the data. Maybe with a shorted lived cache.

One strategy for this may be to something like the following

  1. List the customer's subscriptions and subscription items via Stripe's /v1/subscriptions?customer=<customerId> endpoint. The response includes subscriptionItems as a list (which may require separate pagination calls) which each contain a reference to a product.
  2. Using the productIds from the subscriptionItems you could now use /v1/products?id=<productId1>,<productId2>,<productId3> (again, requiring pagination if you have a lot of productIds) to get that metadata[1], which you can then merge into the results from your movie search system to mark movies which are included in active subscriptions as available.

[1] Depending on your Stripe integration, a Stripe ProductId could map to a number of different objects in your system such as a specific Movie, a Genre, a Movie Studio or something like New Releases. There are multiple ways you may be storing the relationship between Stripe's concept of a product and your own, but a common pattern is to store this on the product.metadata field.

An implementation of this method might look something like the following. We'll use Stripe's Python API as an example, including its auto_paging_iter helper method that handles pagination. Note: this method may use as few as 2 api calls, but any pagination required will still count towards Stripe's API rate limits of 100 requests per second.

def get_customer_subscription_groups(customer_id: str):
    subscribed_product_ids = set()
    subscriptions = stripe.Subscriptions.list(customer=customer_id)
    for subscription in subscriptions.auto_paging_iter():
        for subscription_item in subscription.items.auto_paging_iter():
            product_id = subscription_item.price.product
            subscribed_product_ids.add(product_id)

    subscription_groups = set()
    products = stripe.Products.list(ids=list(subscribed_product_ids))
    # for each product, extract the 'subscription_group' and add it to our set
    for product in products.auto_paging_iter():
        metadata = product.metadata
        subscription_group = metadata.get('subscription_group')
        subscription_groups.add(subscription_group)
    return list(subscription_groups)

Pros:

  • Stripe is the ultimate source of truth for subscription/order data
  • It's simple, and the Stripe API has great docs
  • It's usually fast to onboard a new feature

Cons:

  • The Stripe API doesn't have a lot of filter options, so you'll probably need to paginate
  • The Stripe API rate limit of 100/s would allow maybe on 20/25 search page requests per second due to multiple calls
    • If you exceed these limits you could start interfering with orders and payments!
  • Caching the results from the Stripe API for more than a few minutes could lead to a bad user experience if someone sees a movie which they know they should have access to but don't.

Method 2: Maintain Your Own Copy Of The Data

Having order and subscription data in your own database gives you a lot more options! A (relatively) simple SQL query can merge your realtime subscription data directly with your inventory data efficiently and flexibly. This query makes use of stripe.subscription, stripe.subscription_items and stripe.product tables which contain the same data as you'd get from the Stripe API. The approach is similar to Including Subscriptions In The Search Query, outlined above, but with the added benefit of the user's subscription database being pulled directly from the same database as part of the query.

select
    movie_search.movie_id,
    exists(
        select 1
        from stripe.subscription s
        join stripe.subscription_item si
            on s.id = si.subscription_id
        join stripe.product p
            on si.product_id = p.id
        where s.customer_id = '<Stripe_customer_id>'
        and p.metadata -> 'group' = ANY(movie.subscription_groups)
    ) as included_in_subscription
from movie_search
where to_tsvector(move.info) @@ to_tsquery('harrison & ford')

This allows a fast and scalable interface for search without having to touch the Stripe API. If you can keep these tables up to date in realtime then you have a convenient and scalable (e.g. via database replication) solution without any caching and hard rate limits.

So, what's the best way to build and maintain this database?

Option 1. Update State With Stripe's Webhooks

Stripe's webhooks mechanism allows you to register an HTTP endpoint in your own API, which Stripe will use to alert you of any changes to the data in your Stripe account. For example, a new subscription for product prod_0001 would trigger an HTTP POST request to your API with contents like:

{
  "object": "event",
  "type": "customer.subscription.created",
  ...
  "data": {
    "object": {
      "object": "subscription",
      "id": "sub_0027",
      "customer": "cus_0012",
      ...
      "items": [
        ...
        "data": [
          {
            "object": "subscription_item",
            ...
            "price": {
              "product": "prod_0001"
            }
          }
        ]
      ]
    }
  }
}

We recently wrote a pretty thorough summary of the issues with webhooks so I'll just summarize here:

  • Requires high uptime to reliably receive events
  • Hard to recover from bugs in webhook handlers
  • No handling of events that occurred before the webhook was integrated. You'll need to setup a backfill operation.
  • Even with Stripes toolings, its hard to work with in a development environment (where you likely don't have a secure public facing API to receive webhooks)
  • Having to manage verification of the webhook's source

Option 2. Update State After All Stripe API Operations

All actions on the Stripe API return the new (and modified) Stripe objects in the response. You can use this response data to update your own database.

class Subscripton(Model):
    id = fields.text()
    # ... more fields

    def create_subscription(customer_id: str, price: str):
        subscription = stripe.Subscription.create(
            customer=customer_id",
            items=[
                {"price": price"},
            ],
        )
        # tranform the customer object to fit your model
        # and insert-and-update
        self.create_from_stripe_object(subscription)
        return customer

Here, stripe.Subscription.create will return an object similar to the WebHook JSON payload shown above, including nested objects for subscription, subscription_item and price (amongst others). You could use this response in a function, something like Subscription.create_from_stripe_object to traverse these nested objects and insert/update the corresponding records in your own database.

This approach allows for some nice abstractions, but thorough implementation would need to manage over 2,000 fields! Yes, you could just implement tables and fields as-and-when you need them, but this brings back a problem we had with the webhook approach - you'd need a whole separate system to backfill data whenever your requirements change!

One final drawback: even the most thoroughly planned integration between your app and Stripe is likely to hit some unexpected feature limitations. Don't have an in-app way to handle changing the shipping address for an order? Someone's probably going to have to bypass your app entirely to log into the Stripe console to fix a few manually. Even worse, you may expect cancelled subscriptions to come via your users logging in clicking unsubscribe but sometimes payments are refused and bank accounts get closed - and you'll need to handle that! You'll still need webhooks or else these changes will never make it back to your app database!

Option 3. Existing Services

Several services exist for syncing Stripe data to a query-able database, namely Stripe's own Sigma and FiveTran's Stripe Connector. These services are great and solve some difficult problems, but they're not realtime. They're aimed primarily at Data Warehousing and Analytics use cases and are updated snapshots.

A intermittent update frequency is a deal-breaker. Remember that we need to quickly reflect new purchases and subscriptions.

Option 4. Here Comes Sequin

So it turns out that maintaining a copy of the full Stripe dataset in realtime is complicated. Worse, it's messy.

You need a hybrid approach. The accuracy and realtime benefits of directly using the Stripe API in addition to the scalability and flexibility of having the data in your application database can only be achieved by historical back-filling of data from the Stripe API in combination with real-time event syncing.

This is why we built Sequin.

28