Realscale is a philosophy for architecting cloud native applications to scale up to 1M visits/day and 1000 requests/sec, typically requiring 10-200 server instances and a variety of internal services.

Managing Cloud Native Infrastructure

In the early stages of an application, rebuilding a server may take only a handful of configuration steps. However, a Realscale architecture will eventually reach the point where the number of configuration steps exceeds what any one team member can reliably reproduce by hand. Once the team reaches this point, infrastructure management must be automated to ensure consistency.

When teams begin to automate their infrastructure, they commonly start with one or more hand-written scripts. As the application grows more complex, these scripts become fragile and harder to maintain. Teams then avoid infrastructure changes for fear of breaking the scripts, resorting to workarounds or poor application design instead of making the infrastructure changes they actually need.

Fortunately, these problems can be avoided by applying three key principles: 1) automation, 2) infrastructure as code, and 3) team ownership. Let’s look at each one to see how it helps as our applications continue to mature.

Server Configuration Automation: Testable, Repeatable Scripts and Recipes

One of the first problems teams encounter is the inability to reproduce the production server environment for local development, integration, QA, and pre-production. Configuration differences lead to “it works on my machine” responses from developers, or to builds that pass QA and user acceptance testing but fail in production. The problem is worse when cloud servers are treated as ephemeral, started and stopped as needed rather than maintained as long-lived servers, because every new server has to be configured from scratch. Automating server configuration is therefore critical to achieving a Realscale architecture.

What server configuration steps should you automate?

Any step that would otherwise require a developer or operations team member to execute manually on a server:

  • Installation of software and libraries (e.g. databases such as MySQL or PostgreSQL, web servers such as Nginx, and shared libraries such as libxml2 and ImageMagick)
  • Installation of configuration files (e.g. my.cnf, pg_hba.conf, nginx.conf)
  • Network configuration (e.g. custom routing and firewall settings)
  • Application configuration (e.g. API endpoint URLs, service locations, outgoing mail server details)
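What makes an automated step repeatable is idempotence: running it a second time changes nothing if the server is already in the desired state. The minimal Python sketch below shows that property for one item from the list above, installing a configuration file. The function `ensure_file` and its signature are hypothetical and not taken from any particular tool; configuration management tools provide richer versions of the same idea.

```python
import os
import tempfile
from pathlib import Path


def ensure_file(path: Path, content: str, mode: int = 0o644) -> bool:
    """Make `path` contain exactly `content` with permissions `mode`.

    Returns True if anything was changed and False if the file was already
    in the desired state, so running it twice in a row is harmless.
    """
    desired = content.encode("utf-8")

    if path.exists() and path.read_bytes() == desired:
        # Content is already correct; only repair permissions if they drifted.
        if (path.stat().st_mode & 0o777) != mode:
            path.chmod(mode)
            return True
        return False

    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file in the same directory, then atomically swap it
    # into place so services never read a half-written configuration file.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(desired)
        os.chmod(tmp_name, mode)
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)
        raise
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        conf = Path(workdir) / "nginx" / "conf.d" / "app.conf"
        body = "server {\n    listen 80;\n}\n"
        print(ensure_file(conf, body))  # True: file created
        print(ensure_file(conf, body))  # False: already in the desired state
```

Because the function reports whether it changed anything, a higher-level script can use the return value to decide whether a service such as Nginx needs to be reloaded, rather than restarting it on every run.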

Tools for server configuration management and automation

There are a variety of tools available to make server configuration and automation easier across many different Linux distributions:

Each of these tools has strengths and weaknesses, so we suggest evaluating each one to determine which best fits your team’s requirements and preferences. We also recommend using a tool such as Vagrant to test these scripts before deploying them into a cloud environment. Vagrant is also useful for setting up local development environments that match the cloud infrastructure as closely as possible.

Cloud Infrastructure as Code

Cloud infrastructure requires more than just servers. Network policies, identity and access management, shared filesystems, databases, messaging, and other components are required to realize a cloud native architecture. When these components are configured by hand, a single incorrect or missing setting can cause the application to fail.

Version your infrastructure, not just your server configuration

Cloud infrastructure must be treated like application source code. The infrastructure your application needs should be defined in scripts that are versioned in a source code repository, such as Git, and routinely reviewed by the team. When a new environment is required (e.g. production, pre-production, or UAT), those scripts are executed to construct an exact replica of the desired environment. Infrastructure as code prevents configuration mistakes and missing resources that can otherwise take hours or days to debug by hand.
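One way to picture this is a single versioned definition that is rendered for each environment, so production, pre-production, and UAT contain the same resources and differ only in sizing. The Python sketch below assumes hypothetical resource types (`network`, `instance_group`, `database`, `queue`), names, and sizes; it does not follow the schema of CloudFormation, Terraform, or any other real tool.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Sizing:
    """The only settings allowed to differ between environments (hypothetical)."""
    web_instances: int
    db_class: str


# Hypothetical sizing per environment; the resource layout itself is shared.
SIZING = {
    "production": Sizing(web_instances=20, db_class="large"),
    "pre-production": Sizing(web_instances=4, db_class="medium"),
    "uat": Sizing(web_instances=2, db_class="small"),
}


def render_environment(env: str) -> dict:
    """Return the full set of resources for one environment, keyed by name."""
    size = SIZING[env]  # raises KeyError for an undefined environment
    prefix = f"realscale-{env}"
    return {
        f"{prefix}-network": {"type": "network", "cidr": "10.0.0.0/16"},
        f"{prefix}-web": {
            "type": "instance_group",
            "count": size.web_instances,
            "allow_ports": [80, 443],
        },
        f"{prefix}-db": {"type": "database", "engine": "postgresql", "class": size.db_class},
        f"{prefix}-queue": {"type": "queue"},
    }


def resource_types(resources: dict) -> list:
    """Sorted resource types, used to check that environments share one layout."""
    return sorted(r["type"] for r in resources.values())


if __name__ == "__main__":
    # Every environment is built from the same definition, so layouts match.
    assert resource_types(render_environment("production")) == resource_types(
        render_environment("uat")
    )
    # The rendered output is plain JSON that can be reviewed like any other diff.
    print(json.dumps(render_environment("pre-production"), indent=2, sort_keys=True))
```

Keeping sizing as the only variable is a deliberate choice in this sketch: every other difference between environments is a place where pre-production stops predicting how production will behave.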

Tools for cloud infrastructure automation

Tools such as AWS CloudFormation, Google Cloud Deployment Manager, and Azure Resource Manager templates allow a complete cloud environment to be scripted within the respective cloud provider. Cross-vendor solutions such as Terraform and Salt Cloud provide a vendor-independent way to construct a complete set of cloud resources.
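Declarative tools such as Terraform work by comparing the resources declared in code with what actually exists, then computing a plan of changes (Terraform shows this with `terraform plan`). The toy Python function below illustrates only that comparison step; the resource names and attributes are hypothetical, and real tools also handle dependencies, ordering, and provider APIs, which this sketch ignores.

```python
from typing import Dict, List, Tuple

# Resource name -> declared attributes (hypothetical, provider-neutral shape).
Resources = Dict[str, dict]


def plan(desired: Resources, actual: Resources) -> List[Tuple[str, str]]:
    """Compute the actions needed to make `actual` match `desired`.

    Resources only in code are created, resources only in the cloud are
    deleted, and resources whose attributes differ are updated.
    """
    actions: List[Tuple[str, str]] = []
    for name in sorted(desired.keys() | actual.keys()):
        if name not in actual:
            actions.append(("create", name))
        elif name not in desired:
            actions.append(("delete", name))
        elif desired[name] != actual[name]:
            actions.append(("update", name))
    return actions


if __name__ == "__main__":
    desired = {
        "db": {"engine": "postgresql"},
        "web": {"count": 4},
        "queue": {},
    }
    # What exists today, including a VM someone created by hand.
    actual = {
        "db": {"engine": "postgresql"},
        "web": {"count": 2},
        "debug-vm": {"size": "small"},
    }
    print(plan(desired, actual))
    # [('delete', 'debug-vm'), ('create', 'queue'), ('update', 'web')]
```

Reviewing a plan like this before applying it lets the team catch a missing queue or a hand-made server before the change reaches an environment, instead of discovering it while debugging.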

Collective Team Ownership (Devs and Ops Teams)

Traditionally, server configuration management has been performed by the operations team alone, leaving the development team out. Over the last few years, a cultural shift has encouraged developers and operations teams to collaborate on infrastructure management. This creates team-based ownership of the infrastructure rather than an “us vs. them” approach.

How to encourage collaboration between devs and ops teams

To pursue a collective team ownership approach to cloud infrastructure, we recommend focusing on the following steps:

  1. Clarify the purpose – ensure that everyone understands and agrees on the business value of any new requirements
  2. Start from ideal – discuss what the ideal solution should be, allowing for necessary adjustments due to time and resource constraints
  3. Build the action plan – brainstorm and agree on the final solution, then identify the actions needed to deliver it
  4. Implement the plan – throughout the implementation process, involve both developers and ops to ensure full team collaboration
  5. Review the results – after completion, review the result and find ways to improve the infrastructure and application in the future
