Upcoming trends in DevOps and SRE

DevOps and SRE are fast-growing domains with frequent innovation. In this blog, you can explore the latest trends in DevOps and SRE and stay ahead of the curve.

The past decade has seen widespread adoption of DevOps methodologies in software development. Unsurprisingly, as the needs of users change, DevOps techniques have evolved as well. In this blog, we will look at the trends that are most likely to have a significant impact in the coming years.

The trends mentioned below are most likely to have a lasting impact in the field of DevOps and SRE:

  1. AIOps and self-healing platforms
  2. Service Meshes
  3. Low-code DevOps
  4. GitOps
  5. DevSecOps

AIOps and self-healing platforms

With the growing complexity of infrastructure, an increasing volume of operations data (logs, traces, metrics) is being generated. This makes it challenging for operations teams to manually sort through the data and diagnose issues in production systems when things go wrong. AIOps tools seek to solve this problem by using AI techniques to detect problems from operations data.

Ideally, an AIOps tool should prevent problems from occurring in the future by analysing the present state of the infrastructure. Often, the origins of a future outage can be found in the everyday metrics of a production system. While people may miss this data hidden in the logs, an AIOps tool will be able to detect the issue so that it can be preemptively fixed.
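As a rough illustration of how a problem can be spotted in everyday metrics, here is a minimal Python sketch that flags samples deviating strongly from the recent past using a rolling z-score. The function name, window size, threshold and latency values are all assumptions made for this example; production AIOps tools use far richer models than this.

```python
from statistics import mean, stdev


def find_anomalies(samples, window=5, threshold=3.0):
    """Return indices of samples that deviate strongly from the preceding window."""
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: any change at all counts as a deviation.
            if samples[i] != mu:
                anomalies.append(i)
            continue
        # z-score: how many standard deviations the sample is from the recent mean.
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical request latencies in milliseconds, with one obvious spike.
latency_ms = [101, 99, 100, 102, 98, 100, 101, 250, 100, 99]
print(find_anomalies(latency_ms))  # [7]
```

The spike at index 7 is reported, while the normal samples that follow it are not, because the spike itself widens the window's standard deviation.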

There are several open source AIOps frameworks you can start using today:

  • Jumbune provides you with analytics to improve the performance of Hadoop clusters.
  • Log3C lets you parse system logs and identify issues.
  • AIOpsTools is a Python toolkit that has anomaly detection and time-series forecasting features.

Closely related to AIOps tools are self-healing platforms that understand how processes run regularly and identify deviations from usual system behaviour. Organizations are likely to adopt self-healing platforms to detect and repair operational issues before facing catastrophic breakdowns.
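The detect-and-repair idea can be sketched in a few lines of Python. This is a toy model under stated assumptions: `heal`, `is_healthy` and `restart` are hypothetical names, and the in-memory `state` dictionary stands in for real health checks and restart hooks.

```python
def heal(services, is_healthy, restart, max_attempts=3):
    """Restart unhealthy services; return the names that could not be recovered."""
    unrecovered = []
    for name in services:
        attempts = 0
        while not is_healthy(name) and attempts < max_attempts:
            restart(name)
            attempts += 1
        if not is_healthy(name):
            # Repair failed: this is where a human would be paged.
            unrecovered.append(name)
    return unrecovered


# Hypothetical in-memory stand-in for a real platform.
state = {"api": "up", "queue": "down", "db": "down"}


def is_healthy(name):
    return state[name] == "up"


def restart(name):
    # Pretend "db" has a fault that a restart cannot fix.
    if name != "db":
        state[name] = "up"


print(heal(state, is_healthy, restart))  # ['db']
```

Bounding the number of restart attempts matters: a platform that retries forever hides the failure instead of escalating it.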

AIOps solutions will allow DevOps teams to shift their emphasis from routine maintenance to high-value tasks like feature development and capacity planning. A mature AIOps solution must also be capable of providing recommendations and uncovering patterns that people would have missed otherwise.

A survey conducted by Digitate in the fall of 2020 found that DevOps practitioners turned to AIOps to reduce the number of routine maintenance tasks. Survey respondents were also interested in tools that proactively detect issues.

Service Meshes

Service meshes are quickly becoming an essential part of the cloud native stack. A large cloud application may require hundreds of microservices and serve a million users concurrently. A service mesh is a low-latency infrastructure layer that allows high-traffic communication between different components of a cloud application (database, frontend, etc.). This is done via Application Programming Interfaces (APIs).

Most distributed applications today have a load balancer that directs traffic; however, most load balancers are not equipped to deal with a large number of dynamic services whose locations and counts vary over time. To ensure that large volumes of data are sent to the correct endpoint, we need tools that are more intelligent than traditional load balancers. This is where service meshes come into the picture.

In a typical microservice application, the load balancer or firewall is programmed with static rules. However, as the number of microservices increases and the architecture changes dynamically, these rules are no longer enough. In a service mesh, by contrast, a new service auto-registers when it shows up, which makes the complexity much easier to manage.

We can consider a service mesh to be similar to the central nervous system of a living organism. As parts of the organism’s body change over time, the nervous system responds accordingly.
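To make the contrast with static rules concrete, here is a small Python sketch of a registry in which service instances register themselves and callers resolve an address at request time. `ServiceRegistry` and the addresses are hypothetical; real meshes such as Istio or Linkerd do this through a control plane and proxies rather than an in-process class, so treat it purely as a model of the idea.

```python
class ServiceRegistry:
    """Toy registry: instances register themselves, callers resolve by name."""

    def __init__(self):
        self._instances = {}  # service name -> list of addresses
        self._cursors = {}    # service name -> next round-robin position

    def register(self, service, address):
        instances = self._instances.setdefault(service, [])
        if address not in instances:
            instances.append(address)

    def deregister(self, service, address):
        instances = self._instances.get(service, [])
        if address in instances:
            instances.remove(address)

    def resolve(self, service):
        """Pick the next instance of a service in round-robin order."""
        instances = self._instances.get(service)
        if not instances:
            raise LookupError(f"no instances registered for {service!r}")
        index = self._cursors.get(service, 0) % len(instances)
        self._cursors[service] = index + 1
        return instances[index]


registry = ServiceRegistry()
registry.register("cart", "10.0.0.1:8080")
registry.register("cart", "10.0.0.2:8080")  # a new instance shows up
print([registry.resolve("cart") for _ in range(3)])
# ['10.0.0.1:8080', '10.0.0.2:8080', '10.0.0.1:8080']

registry.deregister("cart", "10.0.0.1:8080")  # an instance goes away
print(registry.resolve("cart"))  # 10.0.0.2:8080
```

No routing rule had to be rewritten when the second instance appeared or the first one disappeared, which is the property the static load balancer lacks.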

Some examples of open source service meshes are Istio, Linkerd and Kuma. Service meshes provide a comprehensive view of your service and aid with complex activities like A/B testing, canary roll-outs, access restrictions, and end-to-end authentication.

A service mesh like Linkerd can be implemented without requiring any extra code to be written in the target application. Service meshes offer telemetry and operations data of your services in a standardised form. This is possible because service meshes are a dedicated infrastructure layer through which all service-to-service communication occurs.

In a survey conducted by Newstack.io, one third of respondents were already using service meshes in Kubernetes production environments. Another 34% of respondents were using them in test environments and evaluating their usefulness.

Low-code DevOps

Low-code and no-code platforms promise a future where developers don't need to worry about the tedious aspects of building software and can instead focus on the more creative side. These platforms also allow people without programming skills to build software. Low-code platforms have caught the interest of smaller organisations looking to add features to their products without investing in costly development skills.

Low-code platforms are effective in reducing development time and enabling more collaborative development processes. DevOps, which is itself built on a culture of collaboration, is well aligned for this transition. Some consider Infrastructure as Code (especially infrastructure provisioning) to be a form of low-code DevOps.

There are certain tradeoffs to weigh when evaluating low-code platforms. Applications developed using low-code platforms tend to be less optimized and have more generic user interfaces. This means that these platforms are better suited to building internal or developer-facing apps.

Several major cloud providers (AWS, GCP and Azure) have created low-code tools for their customers. One example is PowerApps for Azure DevOps, which seeks to make low-code more accessible to developers in the Microsoft ecosystem. Likewise, Oracle has Application Express and Visual Builder Code for developers creating apps for Oracle’s Cloud database.

Mike Duensing, CTO at Skuid, a no-code cloud app platform developer, says: “The rise of low- and no-code is comparable to mobile devices being packed with computing power that once resided in mainframes. It is a level of abstraction that keeps going, making something tedious, hard to do, and time consuming much easier to do.”

The Software Survey Asia-Pacific 2020 conducted by OutSystems reports that more than half of Asia-Pacific decision makers are convinced their organisations can rely on low-code platforms for at least a quarter of all planned projects. According to the study, low-code tools were set to reach critical mass acceptance in 2021.

GitOps

Version control systems (Git, SVN, etc.) let us track and manage changes made to a codebase over time. This audit trail makes it easier to revert to previous states and identify system-breaking commits. GitOps is founded on a similar philosophy and attempts to bring a corresponding level of transparency to Infrastructure as Code repositories.

To implement GitOps in your organisation, you need to create a repository containing all the declarative descriptions of your infrastructure. This repository behaves like any other standard Git code repository. When changes are required in your production environment, you simply edit the associated GitOps repository instead of manually deploying the updated configuration files.

Once edits are made to the repository, an automated process deploys the modified configuration files and your production environment is updated to the desired state.
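The automated process described above is essentially a reconciliation loop: compare the desired state declared in the repository with the actual state of the environment, then apply the difference. Below is a minimal Python sketch of that comparison. The dictionaries, service names and image tags are invented for illustration, and `plan_changes` and `reconcile` are hypothetical helpers rather than the API of any GitOps tool.

```python
def plan_changes(desired, actual):
    """List the actions needed to move `actual` to the `desired` state."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


def reconcile(desired, actual):
    """Apply the planned actions to `actual` in place and return it."""
    for action, name in plan_changes(desired, actual):
        if action == "delete":
            del actual[name]
        else:
            actual[name] = dict(desired[name])
    return actual


# Desired state as it might be declared in the Git repository (made-up values).
desired = {
    "web": {"image": "web:1.4", "replicas": 3},
    "worker": {"image": "worker:2.0", "replicas": 1},
}
# Actual state of the running environment (made-up values).
actual = {
    "web": {"image": "web:1.3", "replicas": 3},
    "cron": {"image": "cron:0.9", "replicas": 1},
}

print(plan_changes(desired, actual))
# [('update', 'web'), ('create', 'worker'), ('delete', 'cron')]
reconcile(desired, actual)
print(plan_changes(desired, actual))  # []
```

A non-empty plan when nobody has changed the repository is one way to think about infrastructure drift: the environment has moved away from what Git says it should be.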

A simple way of describing GitOps is with the following formula:

GitOps = Merge Requests from Git + IaC + CI/CD

A mature GitOps operating model will consist of three pillars: Infrastructure-as-Code (IaC), Merge Requests (MRs) as the request for change and Continuous Integration/Continuous Delivery (CI/CD).

Pull requests (initiated by developers) can modify the state of the GitOps repository. Once accepted (usually by the Ops engineer) and merged, the pull requests will immediately reconfigure and update the production environment according to the recent commit to the repository.

Organisations have found GitOps to be useful to track infrastructure drift and do emergency rollbacks with minimal disruption to users. One key difference between traditional Infrastructure as Code and GitOps is the emphasis that GitOps places on fully automating the process of deploying and tracking subsequent updates to your infrastructure.

With GitOps, there is no need to switch tools for deployment. Your production environment is also more secure since developers do not directly access the infrastructure while making pull requests. In addition, you benefit from all the advantages of code reviews, pull requests and comments on your infrastructure changes.

In certain scenarios, GitOps may not be the right choice for your organisation. Some enterprise applications need to manage secrets (authentication and other sensitive details) separately from version control. Since GitOps relies on a Git repository that is accessible to the whole team, it does not by itself offer a convenient way to manage secrets.

DevSecOps

Ransomware attacks and other cyber threats have increased in the past decade. In many high-profile cases, attackers have used previously unknown vulnerabilities (known as zero-days) to breach systems.

Traditional models of security have mostly focused on testing the security of applications after deployment, with security researchers looking for undiscovered vulnerabilities. The DevSecOps movement seeks to reduce the number of these attacks by integrating better security practices while software is being built and deployed.

“DevSecOps — short for development, security, and operations — automates the integration of security at every phase of the software development lifecycle, from initial design through integration, testing, deployment, and software delivery.” — IBM

IBM’s definition of DevSecOps shown above is a great starting point to understand this emerging trend. Each member of the DevSecOps team should view security as one of their core duties, rather than something that is overseen by others. Just as DevOps broke the silos between developers and operations, DevSecOps seeks to do the same with security teams.

DevSecOps attempts to remove the bottleneck that traditional security models place on continuous delivery pipelines. With DevSecOps, agile techniques are used to produce secure code, with security procedures baked into the development process itself rather than being placed as a 'layer on top.'

There are various DevSecOps tools that are used in different stages of the software development lifecycle. DevSecOps best practices seek to integrate seamlessly into the Continuous Integration and Continuous Delivery (CI/CD) pipeline.

Below we will see how DevSecOps tools are used in each phase of the software development lifecycle to enhance security.

In the ‘Build’ stage, DevSecOps tools focus on automated security analysis. This includes automated scanning of the third-party libraries being used, since these libraries often have vulnerabilities that could compromise the application.
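As a simplified picture of third-party library scanning, the Python sketch below compares pinned dependencies against a list of known-vulnerable versions. The package names and the advisory data are made up for this example; a real scanner would pull advisories from a maintained vulnerability database and handle version ranges, not just exact pins.

```python
def parse_requirements(text):
    """Extract `name==version` pins from requirements-style text."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # unpinned or blank lines are ignored in this sketch
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_vulnerable(pins, advisories):
    """Return sorted (name, version) pairs that appear in the advisory data."""
    return sorted(
        (name, version)
        for name, version in pins.items()
        if version in advisories.get(name, set())
    )


# Hypothetical dependency file and advisory data, for illustration only.
requirements = """
examplelib==1.2.0
httpthing==0.9.1  # pinned by the platform team
unpinned-package
"""
advisories = {"examplelib": {"1.2.0", "1.2.1"}}

print(find_vulnerable(parse_requirements(requirements), advisories))
# [('examplelib', '1.2.0')]
```

Running a check like this as a build step, and failing the build on a non-empty result, is what moves the finding to the ‘Build’ stage instead of a post-deployment audit.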

In the next phase the ‘build’ is deployed to a test/staging environment. Tools utilised in this phase must check how the application code interacts with the staging environment and uncover possible security gaps. In this phase, Dynamic Application Security Testing (DAST) tools are used to check for vulnerabilities once the application is functional.

In the release phase of the DevSecOps cycle, the security audits focus on securing the runtime environment. Access control features, network firewall behaviour and secret (user authentication) data management are also validated. This is also the phase where penetration testing and vulnerability scanning are done.

The final stage is the deployment phase. This phase involves deploying a secure build of the application after thorough testing has been done. After deployment, scanning of the ports and operating systems of the application and infrastructure is done to ensure that there are no unsecured endpoints.

DevSecOps places a premium on developing and deploying secure software. It is built on the belief that speed and security should not be separated; rather, they should be merged into a single harmonious strategy that finds the right balance between the two.

Conclusion

The trends listed above are the ones we expect to make the most impact in the coming years. While they may seem DevOps-centric, a good SRE will be expected to add these skills to their repertoire. The future is exciting for DevOps and SRE, as we will be seeing a host of new technologies in the near future.

Squadcast is an incident management tool that’s purpose-built for SRE. Your team can get rid of unwanted alerts, receive relevant notifications, work in collaboration using the virtual incident war rooms, and use automated tools like runbooks to eliminate toil.
