Deployment Runbooks: 8 Reasons Why You Still Need Them

Jan 30, 2018

There seems to be a never-ending stream of stories about development teams that have implemented impressive Agile or DevOps processes, taking code from idea clear through to production in a matter of days, or even hours. These model teams use a wide variety of tools to automate time-consuming, labor-intensive tasks. The end result is a seemingly perfect picture of what a development organization should look like.

Sure, those tools and processes can be implemented within individual teams with relative ease. But could efficiency like this ever be a viable reality at the enterprise level, with dozens or even hundreds of development teams scattered across the globe? Enterprise environments are high-risk and incredibly complex, with thousands of interdependencies, and a single deployment event can span days or even weeks and involve updates to hundreds of different applications. So the obvious question is, “Can this level of efficiency really be achieved at the enterprise level?”

Identifying the Risk

To answer this question effectively, we need to better understand the risk involved. Enterprise organizations face challenges that are unique to their scale. Their customers span every market segment, including individuals, pets, health care, governments, travel, utilities, and much more. On every deployment, they face the risk that a cutover issue causes an outage whose impact could very really mean life or death for some of their customers and, at a minimum, could easily become incredibly costly to the company.

There have been several instances in the news where these types of issues directly resulted in the loss of tens of millions of dollars, with some reported losses reaching into the billions. It’s the unspoken goal of every development team to avoid being in the news for the wrong reasons.

Runbooks: The Challenge in the Vision

To implement process improvements effectively and mitigate the risk associated with these highly complex, large-scale deployments, we first need to identify the bottlenecks and risks. While every organization has its own needs and priorities, we’ve found that certain issues are fairly universal. They are:

Planning Complexities. Because of the interdependencies in an enterprise environment, when a tier one application is updated, the tier two applications that depend on it often need to be updated as well. Then, because the tier two applications are updated, the tier three applications may also need updates, and so forth. Managing the madness of these interdependencies is not for the faint of heart and can take weeks or months of planning.
Collaboration. When dealing with a network of development teams, contractors, stakeholders, and impacted customer groups, the challenge is to coordinate planning across all of these groups. The scheduling and sequencing of the deployment plan need to make sense not only for the enterprise but also for the individual teams. This can be a monumental task when each team has its own set of deployment tasks, interdependencies, requirements, and procedures that only that team knows about. After all, in most large organizations, you can’t just get everyone with a role in deployment into a conference room and work it out in an hour or two.
Approvals and Governance. The interdependence of systems and the high-risk nature of enterprise operations mean there are typically many cross-functional stakeholders who take part in oversight to ensure system and operational stability. As a result, there is a never-ending list of people who participate in governance and whose approvals are needed before any deployment can move forward. The challenge is to get these approvals in place before the newly developed code becomes obsolete. That means each of these individuals needs to understand at least the basics of the software development lifecycle, including the checks, gates, and general methodology their respective development teams use. All too often we catch ourselves saying things like, “I’m sorry, but the developers won’t be shipping code directly into production. There’s a process that needs to be followed.”
Scheduling. Scheduling for a single development team can be challenge enough. Now multiply that by dozens of development teams and potentially hundreds of applications that not only have their own deployments to manage but also need to interweave and coordinate their schedules so they don’t bring systems and operations to a screeching halt. And, as if that weren’t challenging enough, it all has to be coordinated and executed as quickly as possible to minimize blackout periods and system downtime.
Orchestrating. The conductor of an orchestra has the benefit of working with each section individually for days or weeks before bringing them together to rehearse as an ensemble, and then rehearsing several more times before a live performance. Deployment managers have no such luxury. Because organizing and orchestrating a rehearsal deployment with all of the participants takes so much time and money, they typically get only one shot at the cutover, and the performance has to be flawless the first time. System crashes, extended downtime, issue resolution in production, and deployment reversals are the stuff of nightmares.
Failure Remediation Planning. If a deployment goes sideways and causes problems of varying severity in the production environment, what then? Can it be backed out? Can the previous release of code be reinstalled? What does that plan look like? This is where risk mitigation experts go bonkers trying to calculate impact and recovery scenarios.
Executing. With a nervous gulp and a leap of faith, you press the button to kick off the deployment. Will everything go as planned? What if a critical step was left out? What if something just doesn’t work as expected? How will your teams communicate the completion of one milestone and the start of work on the next? How can you track where things stand? How would you know about an issue or delay, and how would you coordinate the schedule change with the rest of the impacted teams? Then, as if those questions weren’t bad enough, what if a senior executive happens to stop by in the middle of the deployment and asks those seemingly simple questions: “What’s the status of the deployment?” or “How much longer will it take?” Could you accurately track deployment status and answer them?
Audits. With so much at stake in these large-scale deployments, there are often high standards of regulatory compliance that must be met and tracked to ensure that system protocols are followed. But for audit trails to be reliable, the data being tracked needs to be 100 percent accurate and complete. It also needs to be in a format that minimizes human error and the possibility of tampering. The challenge is that if everything is tracked in a spreadsheet, SharePoint site, or Word document, how reliable can those audit trails really be?
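At its core, the tiered update problem described under Planning Complexities is a dependency-ordering problem. As a minimal sketch, assuming each application’s dependencies are known up front, a topological sort (Kahn’s algorithm) can group applications into deployment waves: everything in a wave can go out in parallel once the previous waves are done. The function name and application names are hypothetical, and this is an illustration of the general technique, not how Plutora Release or any specific tool orders deployments.

```python
from collections import defaultdict


def deployment_waves(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group applications into waves using Kahn's algorithm (hypothetical helper).

    depends_on maps an application to the applications that must be updated
    before it. Raises ValueError if the dependencies contain a cycle.
    """
    # Collect every application, including ones that only appear as dependencies.
    apps = set(depends_on)
    for deps in depends_on.values():
        apps |= deps
    remaining = {app: set(depends_on.get(app, ())) for app in apps}

    # Reverse index: which applications are waiting on each application.
    dependents = defaultdict(set)
    for app, deps in remaining.items():
        for dep in deps:
            dependents[dep].add(app)

    waves = []
    ready = sorted(app for app, deps in remaining.items() if not deps)
    while ready:
        waves.append(ready)
        next_ready = set()
        for app in ready:
            # Once this app is deployed, it no longer blocks its dependents.
            for child in dependents[app]:
                remaining[child].discard(app)
                if not remaining[child]:
                    next_ready.add(child)
        ready = sorted(next_ready)

    # Any application never placed in a wave is stuck in a dependency cycle.
    if sum(len(wave) for wave in waves) != len(apps):
        raise ValueError("circular dependency between applications")
    return waves


plan = {
    "billing-api": {"customer-db"},
    "auth-service": {"customer-db"},
    "web-portal": {"billing-api", "auth-service"},
}
print(deployment_waves(plan))
# [['customer-db'], ['auth-service', 'billing-api'], ['web-portal']]
```

Raising on cycles, rather than picking an arbitrary order, reflects that a circular dependency between applications is a planning problem to resolve before cutover, not something to paper over at execution time.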

The Enterprise Runbook Solution

While these challenges seem daunting, the solution is easier than you might think. The key to an efficient DevOps organization with fast, reliable cutovers in a global enterprise isn’t team-specific automation tools. Sure, they have their place and can certainly improve performance and efficiency at the team level. But at the enterprise level, with its wide variety of tools, teams, methodologies, and environments, the key is a platform that brings all of those puzzle pieces together into a cohesive whole that runs smoothly and reliably, like a well-oiled machine. The secret sauce for efficient enterprise deployments is effective collaboration, communication, orchestration, and reliable audit trails. This is where enterprise solutions like Plutora Release are invaluable.

Plutora Release has been designed from the ground up for the complexities of the enterprise. Yes, runbooks continue to be a critical document for managing deployments in any organization; they are essential for planning and orchestrating the cutover. But at the enterprise level, traditional spreadsheet or SharePoint runbooks just aren’t efficient. That’s why Plutora Release moves the runbook into the cloud. Each team can plan, coordinate, and obtain the necessary approvals, then store its runbook segment in a Deploy Plan Library, where the Deployment Coordinator can pull those plans together into a single, cohesive master runbook. Task coordination, collaboration, and approvals are significantly streamlined because everyone works from the same version of the runbook. If your deployment is particularly critical, sensitive, or complex, Plutora Release also lets you run deployment rehearsals to work out any kinks before the go-live cutover, with minimal impact on the teams and zero impact on operations.

When it’s time to execute, Plutora Release manages the handoff communication from one task to the next automatically. Fully automated tasks can be integrated and triggered automatically, in perfect sync with the runbook. It tracks real-time progress, task-level status, and the health of the cutover process, and it provides real-time metrics and dashboards to stakeholders. When issues do arise and cause a delay or an on-the-fly adjustment to the runbook, communicating the change to downstream teams is easy. So the next time that senior executive drops by during deployment weekend and asks, “What’s the status of the deployment?” or “How much longer will it take?”, the answers will be right at your fingertips.
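Answering “How much longer will it take?” comes down to the longest chain of unfinished work. A minimal sketch, assuming every task has a remaining-time estimate, its prerequisites are known, independent tasks run in parallel, and there are no circular dependencies: the time to completion is the critical-path length over what is left. The `remaining_minutes` function and the task names are hypothetical illustrations, not a Plutora Release API.

```python
from functools import lru_cache


def remaining_minutes(tasks: dict[str, tuple[float, set[str]]]) -> float:
    """Estimate time to finish a runbook as its longest chain of unfinished work.

    tasks maps a task id to (minutes_left, ids_of_tasks_it_waits_for);
    completed tasks have 0 minutes left. Assumes independent tasks run in
    parallel, estimates hold, and there are no dependency cycles.
    """

    @lru_cache(maxsize=None)
    def finish(task_id: str) -> float:
        minutes_left, waits_for = tasks[task_id]
        # A task can start only after its slowest prerequisite finishes.
        start = max((finish(dep) for dep in waits_for), default=0.0)
        return start + minutes_left

    # The runbook is done when its last-finishing task is done.
    return max((finish(task_id) for task_id in tasks), default=0.0)


runbook = {
    "freeze-writes": (0, set()),  # already completed
    "migrate-db": (45, {"freeze-writes"}),
    "deploy-api": (20, {"migrate-db"}),
    "deploy-web": (15, {"freeze-writes"}),
    "smoke-test": (10, {"deploy-api", "deploy-web"}),
}
print(remaining_minutes(runbook))  # 75.0
```

Here `deploy-web` has slack, so a delay in it of up to 50 minutes would not move the estimate, while any delay in `migrate-db` pushes the whole cutover out; that distinction is what makes a live estimate more useful than a sum of task durations.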

Because deployment tasks, time stamps, and issues are all tracked and stored automatically with enterprise-grade security, audit trails are reliable and rock solid. This eliminates the possibility of someone accidentally, or even intentionally, recording incorrect data. The same data also supports productive post-cutover evaluations and performance improvements. And with all of those parent and child runbooks stored in the Deployment Plan Library, they can easily be reused and refined, increasing speed and reliability with every rehearsal and execution.
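One common way to make a record tamper-evident, sketched here as an assumption rather than a description of how Plutora Release stores its data, is a hash chain: each audit entry stores the SHA-256 hash of the previous entry, so editing an entry, or deleting or reordering an earlier one, breaks the chain from that point on. This only detects changes made after the fact; it cannot tell whether data was correct when it was recorded. The function names and events are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS_HASH = "0" * 64  # stand-in "previous hash" for the first entry


def _digest(entry: dict) -> str:
    # Hash every field except the stored hash itself, with keys in a stable order.
    body = {key: value for key, value in entry.items() if key != "hash"}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(log: list, event: dict) -> dict:
    """Record an event, linking it to the hash of the previous entry."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event": event,
        "prev_hash": log[-1]["hash"] if log else GENESIS_HASH,
    }
    entry["hash"] = _digest(entry)
    log.append(entry)
    return entry


def verify(log: list) -> bool:
    """Return False if any entry was edited, or an earlier entry removed or reordered."""
    expected_prev = GENESIS_HASH
    for entry in log:
        if entry["prev_hash"] != expected_prev or entry["hash"] != _digest(entry):
            return False
        expected_prev = entry["hash"]
    return True


audit_log = []
append_entry(audit_log, {"task": "migrate-db", "status": "started"})
append_entry(audit_log, {"task": "migrate-db", "status": "completed"})
print(verify(audit_log))  # True
audit_log[0]["event"]["status"] = "skipped"  # quietly rewrite history
print(verify(audit_log))  # False
```

Dropping entries from the end of the log leaves the remaining links intact, so in practice the latest hash would also need to be stored somewhere independent of the log itself.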

Task-level automation is still the key to efficiency at the team level. But at the enterprise level, the secret sauce for a successful Agile or DevOps initiative is a keystone solution that tracks and coordinates every development team and its never-ending array of unique needs, processes, and scenarios. The result is a single well-planned runbook, efficient orchestration, real-time status and health tracking, and, simply put, a single source of truth.



Written by Plutora