When DevOps Goes Bad

Author: Sambit Ghosh | 8 min read | August 25, 2022

Even for those who work in the industry, the concepts of ‘AppDev’ and ‘DevOps’ can be confusing, especially if their role is in the C-Suite and not the app design lab.

Some think the labels are just different ways of identifying the same activity. Others aren’t concerned about either – they just want their application to function correctly and efficiently. However, the process of ‘developing’ an application (AppDev) is decidedly different from ‘operating’ an application (DevOps). Companies that want to maximize their application investments should be weighing the costs and benefits of outsourcing one process or the other.

Why Does It Matter?

Simply put: No organization can afford a slowdown, let alone a complete outage, because the operations side of an application isn’t tightly keyed into the development side. Examples of how well-intentioned leadership failed to fully realize their DevOps vision offer insights and lessons:

Failing to Embrace a DevOps Culture

In most cases, successful app development and operations depend on both technology and people. Each aspect must work in conjunction with the other if the project is to succeed. Technologically, developers can design software to address a vast number of variables, and human input contributes the user-friendly nuances that ultimately optimize the software’s function. Failing to account for the need for human insight, however, can be costly.

One enterprise learned this the hard way when it launched its app on the strength of its digital tools but without oversight from the people who designed it or would be using it. When the program went live, it performed only the same functions as the legacy technology, had no automated quality assurance processes in place, and did not work within its production environment. It took several months to fix the technical gaffes and a few more months to (finally) knit together the team that would brainstorm and collaborate on a fully realized application. That team had the job of addressing the slow response times, service failures, and other user-facing errors that had caused the company so much embarrassment.

Losing Sight of Availability Concerns

The rush to get an app out into the world can also trigger inadvertent and disastrous availability errors. An early (2006) version of SlideShare (now owned by LinkedIn) adopted a DevOps model intended to deliver faster response times than its competitors. The U.S.-based company split the DevOps team between its San Francisco and New Delhi offices, and the communications between the two required a significant infrastructure to function. To optimize the collaboration and achieve maximum engineer efficiency, leadership elected to open access to the infrastructure as much as possible. Engineers in either office working on different project elements could go in and inspect what the other contributors were doing.

A glitch occurred, however, when one of those developers inadvertently reorganized a MySQL database to accommodate the parameters of his project. He didn’t realize his action also changed the live production environment’s database, which shut down SlideShare altogether.

In its investigation of the incident, the company recognized that broad access was a bad thing when it had not been vetted for its value to the process. Access to a database was helpful in itself because, ultimately, the engineer was able to achieve his programming goal. What was not helpful was using the live database for the experiment when a staging database would have produced the same result.
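The lesson about keeping experiments off the live database can be expressed as a simple guard in tooling. The sketch below is illustrative only: the environment names, the `check_statement` function, and the list of blocked statement types are assumptions made for this example and do not describe SlideShare’s actual setup.

```python
import re

# Hypothetical environment labels; not taken from the incident described above.
PRODUCTION = "production"
STAGING = "staging"

# Statements that restructure a schema or can destroy data.
_DESTRUCTIVE = re.compile(r"^\s*(ALTER|DROP|TRUNCATE|RENAME)\b", re.IGNORECASE)


def is_destructive(sql: str) -> bool:
    """Return True if the statement begins with a schema-changing keyword."""
    return bool(_DESTRUCTIVE.match(sql))


def check_statement(sql: str, environment: str, approved: bool = False) -> None:
    """Raise PermissionError if a schema-changing statement targets
    production without explicit approval; otherwise do nothing."""
    if environment == PRODUCTION and is_destructive(sql) and not approved:
        raise PermissionError(
            "Schema changes against production need approval; "
            "run the experiment against staging first."
        )


if __name__ == "__main__":
    # Allowed: the same experiment against staging.
    check_statement("ALTER TABLE slides ADD COLUMN tags TEXT", STAGING)
    try:
        check_statement("ALTER TABLE slides ADD COLUMN tags TEXT", PRODUCTION)
    except PermissionError as err:
        print(f"blocked: {err}")
```

A check like this does not replace database-level permissions, but it shows the idea: read access can stay broad while structural changes to production require a deliberate, reviewed step.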

Missing the Point of the DevOps Function

One sizeable national enterprise used a series of third-party vendors to achieve the full functionality of its services. It required a network of financial services providers in locations around the country to facilitate its vehicle leasing operations. The recording and tracking of the transactions to ensure their accuracy was, of course, the fundamental element of the applications in use.

However, those operations were NOT the operations the DevOps team programmed into the software. Instead, they tracked server metrics, such as query response times, automation statistics, and service breakdown frequencies. No part of the operations infrastructure was designed to follow what was actually going on inside the third-party vendor systems.

Consequently, the entire nationwide system suffered an outage that took it off-line completely. The gap in services monitoring allowed the failure of a third-party, mandatory validation system to take down the whole network.

The resulting review revealed:

A ‘sub-par’ (cheap) software contractor had connected all the leasing submissions to the single service that failed. Without any leases going through from anywhere in the country, the company’s cash flow was abruptly cut off.
The monitoring system wasn’t designed to detect these types of failures. The company did have a DevOps team, but that team was not actively monitoring the actual operations of the company’s business.
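The gap the review describes, watching server metrics but not business transactions, can be illustrated with a small check on business-level counts. This is a minimal sketch under stated assumptions: the `WindowStats` fields, the `business_alerts` function, and the 90% threshold are hypothetical and are not drawn from the company’s systems.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WindowStats:
    """Counts for one monitoring window (for example, the last 15 minutes)."""
    leases_submitted: int   # submissions received from all locations
    leases_validated: int   # submissions the third-party validator accepted
    validator_errors: int   # failed calls to the third-party validator


def business_alerts(stats: WindowStats, min_validated_ratio: float = 0.9) -> List[str]:
    """Return alert messages based on business activity, not server health."""
    alerts: List[str] = []
    if stats.leases_submitted == 0:
        # Servers may look healthy while no business is flowing at all.
        alerts.append("no lease submissions received")
        return alerts
    ratio = stats.leases_validated / stats.leases_submitted
    if ratio < min_validated_ratio:
        alerts.append(f"only {ratio:.0%} of submissions validated")
    if stats.validator_errors > 0 and stats.leases_validated == 0:
        alerts.append("third-party validation service appears down")
    return alerts


if __name__ == "__main__":
    # Every validation call fails, so nothing is validated in this window.
    for message in business_alerts(WindowStats(50, 0, 50)):
        print(message)
```

The design choice here is to alert on outcomes the business cares about (leases completed) rather than only on infrastructure signals, so that a failure inside a vendor system surfaces even when the company’s own servers report normal metrics.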

In its defense, the DevOps team argued that 99% of software and application issues are coding errors, which in its view justified the team’s focus on those. External causes occur much less frequently.

DevOps Failures Continue to Occur

More recent DevOps failures were evident in the handling of the COVID-19 pandemic. Early on, the need to establish and track the efforts of testing sites was viewed as an integral element of containing the virus. However, regardless of the relative maturity of the DevOps process, it took months for globally sited DevOps teams to create an operations capacity that matched their site-finding app’s design capacity.

Other reasons for DevOps failures are equally informative:

Many companies fail to consider the work-life concerns of in-house DevOps staff. Because the work entails 24/7 coverage, organizations with smaller staffs can put unnecessary strain on an already overworked IT team when they add DevOps duties.
Other companies fail to fully visualize the all-encompassing capacity of a new app to impact all other corporate functions. Many organizations keep their AppDev and DevOps teams siloed from each other. A lack of communication can doom the app development process because the DevOps team doesn’t contribute to the process.

To address the confusion, many enterprises elect to use the services of AppDev and DevOps professionals such as Datavail. Datavail’s application-focused teams bring decades of experience to both the design and management of their customers’ proprietary applications, usually at less cost than their customers can achieve by hiring new IT staff. If you’re concerned about how well your DevOps infrastructure is protecting your organization, give Datavail a call today.


Blog Author

Sambit Ghosh

A seasoned Senior IT Executive with 20 years’ experience leading cross-functional teams of technical and business experts for resolving complex problems and business challenges through innovation. Recognized for leadership in full P&L Management, Resource Management, Client Relationship Management, Pre-Sales, and Global Delivery Management. Demonstrated ability to focus on high-payoff strategies to achieve immediate bottom-line benefits. Proven track record of success in developing both on-prem and cloud solutions (AWS/Azure based) that improve the efficiency of IT and business operations.