This isn’t a simple question to answer.
First of all, cloud computing is hidden behind a fog of abstraction. Whereas IT could once instrument every element of an application, today applications are like Descartes’ brain in a jar, never quite sure whether they’re real or virtual.
Second, on the surface, many service providers’ goals aren’t aligned with those of their customers. Service providers want to maximize revenue, and want the freedom to do with the underlying infrastructure what they will. That’s how they stay in business and make the most of what they have; without that freedom, they lose economies of scale and of skill. Customers, by contrast, want special treatment, and instrumentation all the way down the stack.
Third, people don’t understand metrics well. Despite decades of criticism, we still rely on averages, even though they hide important fluctuations in service quality that could warn of bigger problems before they become disasters.
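To see why averages mislead, consider a hypothetical set of response times (the numbers here are invented for illustration): nine fast requests and one pathological one. The mean suggests every user waited about a second, which is an experience nobody actually had.

```python
import statistics

# Hypothetical response times in seconds: nine fast requests, one slow outlier.
times = [0.2, 0.2, 0.3, 0.2, 0.25, 0.3, 0.2, 0.25, 0.3, 8.0]

mean = sum(times) / len(times)      # 1.02 s -- looks merely sluggish
median = statistics.median(times)   # 0.25 s -- what most users actually saw
worst = max(times)                  # 8.0 s  -- the disaster the mean conceals

print(f"mean={mean:.2f}s  median={median:.2f}s  worst={worst:.1f}s")
```

The average reports a service nobody experienced: most users saw a quarter of a second, one user saw eight seconds, and the 1.02-second mean hides both facts. Tracking percentiles and exceptions surfaces the outlier that an average smooths away.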
There’s a bigger problem here, however. For half a century, IT has been about protecting precious resources. The reason you put up with carrying a stack of punched cards to the basement of the computing building at 3AM was that the mainframe was scarce, and the humans abundant. No more: each of us has three screens, one of which is seldom more than a meter from our bodies at any time.
That means we’re less concerned with the consumption of resources and more concerned with the completion of tasks. We shouldn’t care whether the CPU is idle or maxed out, provided the user accomplishes what they set out to do. Proponents of Service Level Agreements have long known this, but cloud monitoring, hiding behind the fog of virtualization, drives it home hard.
Application Performance Management and Real User Monitoring have long been thought of as “advanced” forms of measurement. These go beyond up/down checks or utilization numbers, and instead look at the success of the application from the user’s point of view. They’ve often languished somewhere between web analytics (which shows you what users did) and synthetic monitoring (which shows you whether the site is working).
Today, however, the real question is: could the user do it, and do it well? There’s strong evidence that slow applications undermine productivity, cost money, and cut into revenues. Slow clouds need fixing. To do this, I think we need to go beyond APM and start with the business problem. Too often, IT professionals start at the bottom and work up: “Server 10 is down, which means the support site isn’t working, which means the phone queue is too long, which hurts our customer satisfaction rating.” They begin with the means and work back to the end.
Instead, I think we need to step back and look at the business model. From that, we can derive the relevant metrics, and what counts as an acceptable threshold for each. Then we can measure against those thresholds, and report on violations. That’s a far more palatable approach to measurement for executives. Starting at the model and working down means we say, “7% of visits need to result in an enrollment for us to meet our monthly target.” From that, we can measure the steps of an enrollment, and their performance against historical baselines or response-time targets.
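That top-down check can be sketched in a few lines. This is a minimal illustration, not a real monitoring tool: the 7% enrollment target comes from the text, but the funnel step names and per-step time budgets are invented for the example.

```python
# Hypothetical top-down check: compare business-level metrics against
# thresholds derived from the business model, and report only violations.
ENROLLMENT_TARGET = 0.07  # 7% of visits must end in an enrollment (from the text)
STEP_BUDGETS_S = {        # invented response-time budgets per funnel step
    "landing": 2.0,
    "signup_form": 1.5,
    "confirmation": 1.0,
}

def check_funnel(visits, enrollments, step_times_s):
    """Return human-readable threshold violations for one reporting period."""
    violations = []
    rate = enrollments / visits
    if rate < ENROLLMENT_TARGET:
        violations.append(
            f"enrollment rate {rate:.1%} is below target {ENROLLMENT_TARGET:.0%}"
        )
    for step, budget in STEP_BUDGETS_S.items():
        observed = step_times_s.get(step)
        if observed is not None and observed > budget:
            violations.append(f"{step} took {observed:.1f}s, budget is {budget:.1f}s")
    return violations

# Example period: 10,000 visits, 650 enrollments, one slow funnel step.
for v in check_funnel(10000, 650, {"landing": 1.8, "signup_form": 2.4}):
    print(v)
```

The point of the shape, rather than the numbers: the report speaks in the business’s terms (enrollment rate, step budgets), and it emits rates and exceptions rather than averages, which is exactly what survives the fog of a virtualized stack.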
When we owned the infrastructure, this was considered progressive. But the fog of cloud monitoring means it’s often the only way we can measure. It lets us size cloud consumption, which in turn lets us define budgets—since with the right architecture, you can have any performance you can pay for. And it leads to good metrics, since it’s focused on rates and exceptions rather than averages.
We’ll be talking about how to measure cloud-based applications at this spring’s Cloud Connect event in Santa Clara. In fact, we have a whole track of content dedicated to it, including sessions on WAN, application delivery networks, load-balancing, and choosing the right metrics. Clouds are the IT of abundance, and they fundamentally change how we measure applications. Let’s figure out how.