AWS open source news/updates #74
Newsletter #74.
This week we have new open source projects that help you access your AWS EFS data without the need for a VPN, a new Red Team security tool that is in the very early stages, and a number of new AWS projects to help you easily deploy open source projects such as Nextcloud or manage your own SQL Server deployments. We also have community and AWS posts covering Apache Airflow, Rust, Prometheus, DevOps, Apache Spark, RStudio, Grafana Loki and many more. Make sure you check out the videos, as we have two great topics featured, and finally keep up to date with the quick updates and the events section.
No newsletter next week as I take a week off to recharge, so expect a bumper edition in #75.
The articles posted in this series are only possible thanks to contributors and project maintainers and so I would like to shout out and thank those folks who really do power open source and enable us all to build on top of what they have created.
So thank you to the following open source heroes: Nicholas Omer Chiasson, Thomas Edwards, Gavin Adams, Channy Yun, Gopi Krishnamurthy, Zachariah Elliott, Pranusha Manchala, Marco Ballerini, Kseniya Stadnik, Darryl Osborne, Nima Kaviani, Matt Asay, Rob Hilton, Sarah Watson, Paul Hargis, Jason MacKay, Raphey Holmes, Mark Roy, Chayan Panda, Michael Hsieh, Mukosi Mukwevho, Siva Ramani, Naveen Balaraman, Paul Kukiel, Antje Barth, Chris Fregly, François Bouteruche, Ankur Dahiya, Lezgin Bakircioglu, Damodar Shenvi Wagle, Sumit Mishra, Srinivas Manepalli and Tom Moore.
Make sure you find and follow these builders and keep up to date with their open source projects and contributions.
efsu
scour
ssm-automation-deploy-sql-developer
ecs-windows-ci-cd-blue-green
aws-serverless-nextcloud
Zendesk
Rust
Prometheus
This popped up this week: Basic ECS Configuration for AMP (other guides exist), which provides a quick start for enabling Prometheus collection on your ECS clusters (either on EC2 or Fargate). This short guide will get you up and running in minutes.
AWS CDK
Last week I introduced a new narrative around a fictional company, I Love My Local Farmer, which is an online marketplace that lets people buy and sell locally grown fruit and vegetables. In the latest instalment, Writing your CDK in Java, François Bouteruche puts himself in the shoes of this company to explain why it would choose infrastructure as code (IaC) and how it might approach it. It is good to see Java being used; we need more Java AWS CDK example applications.
Apache Airflow
In the post, Fastest way to deploy Airflow to AWS, Ankur Dahiya, Co-Founder and CEO of RunX, introduces you to an open source project called Opta that will help you quickly deploy Apache Airflow on Kubernetes. What is interesting about this project is that it focuses on leveraging external services for running the Metastore database for example, rather than managing/running that within the pod. I have not had a chance to try this yet, but it is on my todo list. Nice.
Data Wrangler
AWS Data Wrangler is an open-source Python library that makes it easy to work with your data in AWS from Python. The project page has a number of tutorials, and last week a new one, S3 Select, was added. AWS Data Wrangler supports Amazon S3 Select, enabling applications to use SQL statements to query and filter the contents of a single S3 object. It works on objects stored in CSV, JSON or Apache Parquet, including compressed and large files of several TBs.
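To give a feel for what an S3 Select call through AWS Data Wrangler might look like, here is a minimal sketch. It assumes a version of the library that exposes `wr.s3.select_query` with the `sql`, `path`, `input_serialization` and `input_serialization_params` arguments; the bucket name, object key and helper function names are hypothetical and not taken from the tutorial.

```python
from typing import Any, Dict

# Parser options per input format. These mirror the InputSerialization
# settings of S3 Select; treat the defaults chosen here as assumptions.
_DEFAULT_PARAMS: Dict[str, Dict[str, Any]] = {
    "CSV": {"FileHeaderInfo": "Use"},
    "JSON": {"Type": "Document"},
    "Parquet": {},
}


def select_query_kwargs(path: str, sql: str, fmt: str) -> Dict[str, Any]:
    """Build keyword arguments for awswrangler's s3.select_query (hypothetical helper)."""
    if fmt not in _DEFAULT_PARAMS:
        raise ValueError(f"unsupported format: {fmt}")
    return {
        "sql": sql,
        "path": path,
        "input_serialization": fmt,
        "input_serialization_params": dict(_DEFAULT_PARAMS[fmt]),
    }


def run_select(path: str, sql: str, fmt: str):
    """Run the query against S3. Needs awswrangler installed and AWS credentials."""
    import awswrangler as wr  # imported lazily so the helper above works offline

    return wr.s3.select_query(**select_query_kwargs(path, sql, fmt))


# Hypothetical bucket and key, for illustration only.
kwargs = select_query_kwargs(
    "s3://my-example-bucket/trips.csv",
    "SELECT * FROM s3object s LIMIT 5",
    "CSV",
)
```

Calling `run_select` with the same three arguments should hand back the matching rows as a pandas DataFrame, with the filtering done on the S3 side rather than in your own process.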
Grafana Loki
Grafana Loki was introduced in 2018 as a lightweight and cost-effective log aggregation system inspired by Prometheus. In the post, Managing Grafana and Loki in a regulated multitenant environment, Marco Ballerini and Kseniya Stadnik show you how you can deploy both Grafana Loki and Grafana so that multiple development teams can consume the same monitoring stack, while maintaining logical storage separation and regulating which set of data each user of the platform can query from the Grafana interface.
Continuous Delivery Foundation (CDF)
The Continuous Delivery Foundation (CDF) serves as the vendor-neutral home of many of the fastest-growing projects for continuous integration/continuous delivery (CI/CD). Nima Kaviani, Matt Asay, Rob Hilton and Sarah Watson were happy to announce, in the post AWS is doubling down on improving the open source continuous delivery experience for our customers, that AWS is joining the CDF as a Premier member. Find out more about what this is and what it means, including how we will be taking our Spinnaker contributions further and teaming up with Netflix to help build the next generation of Spinnaker.
Terraform
A similar post, but this time taking a look at how you can build a DevSecOps software factory implementation. Srinivas Manepalli focuses on application vulnerability scanning using a number of open source tools such as git-secrets, Sysdig Falco, Snyk and more. Great post, so make sure you read Building an end-to-end Kubernetes-based DevSecOps software factory on AWS.
.NET Core and cdk8s
In the post, Build and Deploy .Net Core WebAPI Container to Amazon EKS using CDK & cdk8s, Siva Ramani and Naveen Balaraman provide a walkthrough of how you can use cdk8s, an open-source software development framework for defining Kubernetes applications, to deploy the sample .NET Core application on Amazon EKS. All source code is provided, and you can use this to experiment with your own .NET workloads.
AWS CDK
Last week Paul Kukiel wrote Deploy an SPA with personalized subdomains using AWS CDK, showing you how you can automate the deployment of a simple single page application using AWS CDK. Full source code is provided (TypeScript) so you can take a look and use this as the base for your own projects.
Apache Spark
A couple of Apache Spark posts this week.
Starting off with Customize and Package Dependencies With Your Apache Spark Applications on Amazon EMR on Amazon EKS, Channy Yun shared last week that you can now use customisable image support for Amazon EMR on EKS that allows you to modify the Docker runtime image that runs your analytics application using Apache Spark on your EKS cluster. What this means is you can create a container that contains both your application and its dependencies, based on the performance-optimised EMR Spark runtime, using your own continuous integration (CI) pipeline.
Following that we had a joint team of GoDaddy and AWS architects (Paul Hargis, Jason MacKay, Raphey Holmes, and Mark Roy) write, Build accurate ML training datasets using point-in-time queries with Amazon SageMaker Feature Store and Apache Spark where they explain how you can use Amazon SageMaker Feature Store and the processing power of Apache Spark to create accurate training datasets using point-in-time queries against reusable feature groups in a scalable fashion.
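If the idea of a point-in-time query is new to you, the sketch below shows the core rule in plain Python: for each labelled event, use only the most recent feature value recorded at or before that event's timestamp, so that no future data leaks into the training set. This is not the authors' Spark implementation; the data layout, function names and sample values are all made up for illustration.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

# Feature history per entity: (event_time, value) pairs sorted by event_time.
FeatureHistory = Dict[str, List[Tuple[int, float]]]


def value_as_of(history: FeatureHistory, entity_id: str, as_of: int) -> Optional[float]:
    """Return the latest feature value with event_time <= as_of, or None."""
    records = history.get(entity_id, [])
    times = [t for t, _ in records]
    idx = bisect_right(times, as_of)  # number of records at or before as_of
    return records[idx - 1][1] if idx else None


def build_training_rows(
    history: FeatureHistory, labels: List[Tuple[str, int, int]]
) -> List[Tuple[str, int, Optional[float], int]]:
    """Attach the point-in-time feature value to each (entity, time, label) row."""
    return [(e, t, value_as_of(history, e, t), y) for e, t, y in labels]


# Made-up sample data: one customer with three feature updates.
history = {"cust-1": [(10, 1.0), (20, 2.0), (30, 3.0)]}
labels = [("cust-1", 25, 1), ("cust-1", 5, 0), ("cust-2", 25, 1)]
rows = build_training_rows(history, labels)
# rows == [("cust-1", 25, 2.0, 1), ("cust-1", 5, None, 0), ("cust-2", 25, None, 1)]
```

The label at time 25 picks up the value written at time 20, not the later one from time 30, which is exactly the leakage the post is concerned with avoiding at scale.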
RStudio and Shiny
It has been a while since I have worked with R, but when I did, it was using the open source RStudio tool, as well as Shiny, which is an open source project to simplify how you build interactive web applications in R. I was very happy to see this blog post, Field Notes: Accelerating Data Science with RStudio and Shiny Server on AWS Fargate written by Chayan Panda, Michael Hsieh and Mukosi Mukwevho, where they describe and show you how you can set up the infrastructure to run a secure, scalable and highly available RStudio and Shiny Server installation on AWS. A must read for anyone interested in R.
Lustre
In the post, Spend less while increasing performance with Amazon FSx for Lustre data compression, Darryl Osborne dives deep and walks you through the new Amazon FSx for Lustre data compression features, sharing some simple benchmarking tests that you can use as a starting point to review and test your own workloads. As Darryl writes, "Enabling data compression will help you spend less while increasing performance with your FSx for Lustre file systems."
AWS SAM
AWS SAM is an open source framework for building serverless applications. During the deployment process, it transforms and expands the AWS SAM syntax into AWS CloudFormation syntax, enabling you to build serverless applications faster. In the post, Using GitHub Actions to deploy serverless applications, Gopi Krishnamurthy shows how you can take this to the next level by using GitHub Actions to build and deploy the application in your AWS account.
AWS Greengrass
AWS IoT Greengrass is an open source edge runtime and cloud service that helps you build, deploy, and manage device software at the edge. In the post, Implementing Local Client Devices with AWS IoT Greengrass, Gavin Adams describes use cases for client devices using a local AWS IoT Greengrass Core for connectivity, messaging, and interaction with other components via the interprocess communication (IPC) feature. What's more, he uses my favourite open source project (Node-RED) as part of the walkthrough. Very cool indeed.
Elasticsearch
Many customers are moving workloads to AWS Graviton2, Arm-based instance types. In the post, Increase Amazon Elasticsearch Service performance by upgrading to Graviton2, Zachariah Elliott and Pranusha Manchala review the prerequisites and considerations for upgrading your existing Amazon ES instances to Graviton2 with minimal downtime.
A couple of videos this week.
First up we have the Data Science on AWS meetup, where colleagues Antje Barth and Chris Fregly show you AWS Orbit Workbench, an open source framework that provides a single, unified experience for your data, analytics and machine learning projects. If you have not already grabbed a copy, check out their amazing book, Data Science on AWS. It is a must read.
Following that we have a video on SRT, which is an open source video transport protocol and technology stack that optimises video streaming performance across unpredictable networks. The Streaming Video Alliance is a global technical association addressing critical challenges in streaming video, and last week on their channel on Vimeo, Thomas Edwards from Amazon demonstrated how to utilise the SRT contribution protocol to ingest content into AWS.
You can view the original video link on Vimeo here.
Apache Cassandra
Amazon Keyspaces (for Apache Cassandra), a fully managed Apache Cassandra–compatible database service, now helps you monitor and improve application read/write performance and throughput by using new Amazon CloudWatch metrics. Keyspaces integrates with CloudWatch to give you deep observability into your Cassandra workload performance. Now, Keyspaces publishes new CloudWatch metrics to help you optimize your application data models for better read/write performance by detecting unbalanced workload traffic across your partitions. In addition, the new metrics help you detect when you need to increase the number of client connections to support greater read/write throughput.
MariaDB
The MariaDB audit plug-in is now available for Amazon Relational Database Service (Amazon RDS) for MySQL instances using MySQL major version 8.0. The MariaDB audit plug-in is also available for instances using MySQL major versions 5.6 and 5.7, and provides event logging for database activity to help customers meet compliance and audit requirements, and troubleshoot application issues. Some of the key details for implementing the plugin are:
Enabling and disabling the audit plug-in – Users can enable the audit plug-in by creating an option group, adding the MARIADB_AUDIT_PLUGIN option to the group, and attaching the option group to an RDS instance. Audit logging can be disabled by removing the option group from the instance.
SERVER_AUDIT_EVENTS variable – This variable allows users to specify the events they want to include in the logs (CONNECTION: users connecting and disconnecting, QUERY: queries and their results, and TABLE: which tables are affected by the queries).
SERVER_AUDIT_EXCL_USERS and SERVER_AUDIT_INCL_USERS variables – These variables specify which users' activity should be excluded from or included in the audit. SERVER_AUDIT_INCL_USERS has higher priority and all users' activity is recorded by default.
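As a rough sketch of how those details could come together, the snippet below builds the request you might pass to the RDS `modify_option_group` API through boto3. It assumes the option group already exists and is attached to your instance; the group name, user names and helper function names are hypothetical, and only the three event types named above are accepted by this helper.

```python
from typing import Any, Dict, List, Optional

# The three event types described above; other values are rejected here.
ALLOWED_EVENTS = {"CONNECTION", "QUERY", "TABLE"}


def audit_option_request(
    option_group: str,
    events: List[str],
    incl_users: Optional[List[str]] = None,
    excl_users: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for the RDS modify_option_group call (hypothetical helper)."""
    unknown = set(events) - ALLOWED_EVENTS
    if unknown:
        raise ValueError(f"unknown audit events: {sorted(unknown)}")
    settings = [{"Name": "SERVER_AUDIT_EVENTS", "Value": ",".join(events)}]
    if incl_users:
        settings.append({"Name": "SERVER_AUDIT_INCL_USERS", "Value": ",".join(incl_users)})
    if excl_users:
        settings.append({"Name": "SERVER_AUDIT_EXCL_USERS", "Value": ",".join(excl_users)})
    return {
        "OptionGroupName": option_group,
        "OptionsToInclude": [
            {"OptionName": "MARIADB_AUDIT_PLUGIN", "OptionSettings": settings}
        ],
        "ApplyImmediately": True,
    }


def apply_audit_option(request: Dict[str, Any]):
    """Send the request to RDS. Needs boto3 installed and AWS credentials."""
    import boto3  # imported lazily so the builder above works offline

    return boto3.client("rds").modify_option_group(**request)


# Hypothetical option group and user, for illustration only.
request = audit_option_request(
    "my-mysql80-audit", ["CONNECTION", "QUERY"], excl_users=["rdsadmin_example"]
)
```

Keeping the request builder separate from the API call means you can review or log exactly which settings will be applied before anything changes on the instance.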
OpenSearch community meeting
29th June, 9:00am PDT
If you want to know more about what is going on in the OpenSearch project, then join the regular monthly community meetings. Read more about the agenda and how to join here.
Cloud Native Day
23rd September, Bern Switzerland
What is this, an in-person event returning? A stellar line up including our own Michael Hausenblas, at an event looking at CNCF projects and the future of IT. Find out more, view prices and register by clicking here.
I hope this summary has been useful. Remember to check out the Open Source homepage, and keep up to date with all our activity in open source by following us on [@AWSOpen](https://twitter.com/AWSOpen).