Analytics using Kinesis Data Firehose 📊

If you haven't started implementing analytics in your application yet, you're behind the game. Data is the most valuable asset of the modern era, and its importance will only keep growing

So what is analytics? Through analytics you can get feedback from your clients without asking them: you can collect data and see how your users navigate, what they visit, and how they perform. After collecting this data you can analyze it, target more useful features, and update your UI to make it more user friendly

Basically, after you implement analytics your application will become more mature, more convenient for users, and more successful for your business 👨🏻‍🚀

In this blog we will create an Amazon Kinesis Data Firehose delivery stream that delivers data to an S3 bucket (using the AWS CDK with TypeScript), then put records onto it from our React app using Amplify

Amazon Kinesis Data Firehose is the easiest way to reliably load streaming data into data lakes, data stores, and analytics services

Let us split the content into two parts: first the infrastructure, then the frontend

1- Infrastructure

We are going to use the CDK to provision our resources; it lets us write our infrastructure in languages like TypeScript. I will focus on how to create the S3 bucket and the Firehose delivery stream, and how to give our unauthenticated users permission to use it. I highly suggest digging into how to create the Cognito identity pool as well, but since our scope isn't authentication I won't cover it beyond the minimal sketch below. You can check the official CDK documentation via this link
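For completeness, here is a minimal sketch of what the Cognito side could look like in CDK. This is only an assumption about your setup; the construct names (identityPool, unauthenticatedRole, authenticatedRole) are illustrative, not something this post covers in depth:

import * as cognito from 'aws-cdk-lib/aws-cognito';
import * as iam from 'aws-cdk-lib/aws-iam';

// Identity pool that allows unauthenticated (guest) identities
const identityPool = new cognito.CfnIdentityPool(this, 'AnalyticsIdentityPool', {
  allowUnauthenticatedIdentities: true,
});

// Role assumed by unauthenticated identities of this pool
const unauthenticatedRole = new iam.Role(this, 'UnauthenticatedRole', {
  assumedBy: new iam.FederatedPrincipal(
    'cognito-identity.amazonaws.com',
    {
      StringEquals: { 'cognito-identity.amazonaws.com:aud': identityPool.ref },
      'ForAnyValue:StringLike': { 'cognito-identity.amazonaws.com:amr': 'unauthenticated' },
    },
    'sts:AssumeRoleWithWebIdentity',
  ),
});

// An authenticatedRole is defined the same way (with 'authenticated'), then both
// are wired to the pool
new cognito.CfnIdentityPoolRoleAttachment(this, 'RoleAttachment', {
  identityPoolId: identityPool.ref,
  roles: {
    unauthenticated: unauthenticatedRole.roleArn,
    // authenticated: authenticatedRole.roleArn,
  },
});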

Resource Diagram:

CDK stack:

import { Duration, Size } from 'aws-cdk-lib';
import * as s3 from 'aws-cdk-lib/aws-s3';
import * as iam from 'aws-cdk-lib/aws-iam';
// The Firehose L2 constructs live in separate alpha packages (at the time of writing)
import * as firehose from '@aws-cdk/aws-kinesisfirehose-alpha';
import * as destinations from '@aws-cdk/aws-kinesisfirehose-destinations-alpha';

// Configuration values used below
const bucketName = 'your-bucket-name';
const streamName = 'your-stream-name';
const dataOutputPrefix = 'myFirehose/DeliveredYear=!{timestamp:yyyy}/anyMonth/rand=!{firehose:random-string}';
const errorOutputPrefix = 'myFirehoseFailures/!{firehose:error-output-type}/!{timestamp:yyyy}/anyMonth/!{timestamp:dd}';
const bufferingInterval = 2; // minutes
const bufferingSize = 8; // MiB

// The bucket that will receive the delivered records
const bucket = new s3.Bucket(this, bucketName, {
  bucketName,
});

// Resource policy on the bucket
const result = bucket.addToResourcePolicy(new iam.PolicyStatement({
  actions: ['s3:GetObject'],
  resources: [bucket.arnForObjects('file.txt')],
  principals: [new iam.AccountRootPrincipal()],
}));

// S3 destination for the delivery stream
const s3Destination = new destinations.S3Bucket(bucket, {
  dataOutputPrefix,
  errorOutputPrefix,
  bufferingInterval: Duration.minutes(bufferingInterval),
  bufferingSize: Size.mebibytes(bufferingSize),
});

// The Kinesis Data Firehose delivery stream itself
const stream = new firehose.DeliveryStream(this, 'Delivery Stream', {
  deliveryStreamName: streamName,
  destinations: [s3Destination],
});
  • First we create our bucket, then we add the resource policy

Our destination has the following parameters:

  • dataOutputPrefix for files successfully delivered to S3
  • errorOutputPrefix for failed records before writing them to S3
  • By default, the buffer size (bufferingSize) is 5 MiB and the buffer interval is 5 minutes (bufferingInterval). But in our example I changed them to 2 minutes and 8 MiB

Incoming data is buffered before it is delivered to the specified destination. The delivery stream will wait until the amount of incoming data has exceeded some threshold (the "buffer size") or until the time since the last data delivery occurred exceeds some threshold (the "buffer interval"), whichever happens first.
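Since the frontend will need the delivery stream name later for the Amplify configuration, it can be handy to export it from the stack. A small optional addition (CfnOutput comes from aws-cdk-lib):

import { CfnOutput } from 'aws-cdk-lib';

// Print the delivery stream name when the stack is deployed
new CfnOutput(this, 'DeliveryStreamName', {
  value: stream.deliveryStreamName,
});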

Now, regarding Cognito and how to put records from the frontend: we will add the following policy to our authenticated and unauthenticated users. This is flexible, you can choose which users should be able to send records; in our case I will add it for both (attaching it to the roles is sketched right after the policy)

const putRecordsPolicy = new iam.PolicyStatement({
  effect: iam.Effect.ALLOW,
  actions: [
    'firehose:PutRecord',
    'firehose:PutRecordBatch',
  ],
  resources: ['your-firehose-arn'], // e.g. stream.deliveryStreamArn from the stack above
});
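How you attach this statement depends on how your identity pool roles are defined. As a minimal sketch, assuming authenticatedRole and unauthenticatedRole are the iam.Role objects attached to your Cognito identity pool (as in the earlier sketch, or however you already create them):

// Allow both Cognito roles to put records onto the delivery stream
authenticatedRole.addToPolicy(putRecordsPolicy);
unauthenticatedRole.addToPolicy(putRecordsPolicy);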

2- React App

Amplify is a set of purpose-built tools and services that makes it quick and easy to use our AWS resources. I will be using its Analytics category here; for the official documentation you can visit this link

  • First let us see how we can configure our Analytics

App.js

import Amplify, { Analytics, AWSKinesisFirehoseProvider } from 'aws-amplify';
import awsConfig from './awsConfig'; // adjust the path to wherever you keep your configuration

Amplify.configure(awsConfig);
Analytics.addPluggable(new AWSKinesisFirehoseProvider());
  • awsConfig is a JSON file that holds our configuration; inside it, add this object:
Analytics: {
  AWSKinesisFirehose: {
    region: REGION
  }
}
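For reference, the full configuration object might look roughly like this (assuming a Cognito Identity Pool is used for credentials; the extra Firehose buffering options are optional and the values shown are only examples):

{
  Auth: {
    identityPoolId: 'xx-xxxx-x:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx',
    region: 'us-east-1'
  },
  Analytics: {
    AWSKinesisFirehose: {
      region: 'us-east-1',
      bufferSize: 1000,
      flushSize: 100,
      flushInterval: 5000,
      resendLimit: 5
    }
  }
}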
  • Now let's add our fancy button
const onClick = async () => {
  const now = new Date();
  const data = {
    id: now.getTime(),
    action: 'Add Button',
    component: 'Button',
    user: 'the username of user',
    source: 'Web',
  };

  try {
    await Analytics.record({
      data: data,
      streamName: FIREHOSE,
    }, 'AWSKinesisFirehose');
  } catch (error) {
    console.log(error);
  }
}

<button onClick={onClick}>Add Button</button>
  • FIREHOSE is the name of the Kinesis Data Firehose delivery stream we created earlier with the CDK
  • Analytics.record is where we put our record onto the Firehose; about 2 minutes later (our buffering interval) we can find it in our S3 bucket. Our data here is simple, I added a few attributes just to show how it can scale, and you can send much richer data for analytics or even run ML on it later (see the small helper sketched below)
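If you start recording more than one kind of event, a tiny wrapper keeps the payloads consistent. A sketch, where recordEvent is a hypothetical helper (not part of Amplify) built on the same Analytics.record call:

const recordEvent = async (action, component, extra = {}) => {
  // Common fields shared by every event, merged with call-specific extras
  const data = {
    id: Date.now(),
    action,
    component,
    source: 'Web',
    ...extra,
  };

  try {
    await Analytics.record({ data, streamName: FIREHOSE }, 'AWSKinesisFirehose');
  } catch (error) {
    console.log(error);
  }
};

// Usage: recordEvent('Add Button', 'Button', { user: 'the username of user' });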

Finally, here is the result inside our S3 bucket (the delivered file needs to be downloaded to view its contents):

{"id":1637695583710,"action":"Add Button","component":"Button","user":"the username of user","source":"Web"}

In this example, I tried to limit the scope to our topic, so I used a simple button component, but this can go far beyond that with your creativity: you can collect very useful and beneficial data from your users, which will help you enhance and improve your applications 🙂
