Three Crucial Ways Machine Learning for Monitoring Automation Advances IT

by Richard Whitehead

The increase in complexity and the accelerating rate of change in IT are the two factors that make it so difficult for IT operations teams to identify issues without deep knowledge of a particular setup. This complexity also means that technical people are staring at incomplete screens that show only part of the overall situation: one layer of the stack, one area of the network, or one business service.

It has been said that the probability of an error is O(εN), where N is the number of times a human being has to intervene. As complexity and the pace of change increase, the need for human intervention grows too, and with it the probability of somebody missing something or misinterpreting a signal.
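One way to read that bound is sketched below. It assumes that ε is the chance of a mistake in a single intervention and that interventions fail independently; neither assumption is stated in the original saying.

```latex
P(\text{at least one error in } N \text{ interventions})
  = 1 - (1 - \varepsilon)^{N}
  \le \varepsilon N,
\qquad
1 - (1 - \varepsilon)^{N} \approx \varepsilon N \quad \text{when } \varepsilon N \ll 1 .
```

Under those assumptions, doubling the number of manual interventions roughly doubles the chance that at least one of them goes wrong, as long as the overall chance is still small.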

So how can IT operations teams gain a 360-degree understanding of what’s taking place across the entire IT production stack? The answer lies in monitoring automation. A fully automated, machine learning approach can take care of routine tasks, allowing humans to deliver more value elsewhere.

Here are three examples of how machine learning applied to monitoring automation is advancing IT:

#1) Automation applies algorithmic approaches designed to detect subtle patterns that humans must learn to identify over time. Machine learning can quickly adapt to the nuances of a specific environment, mapping out what is routine behaviour and what is out of the norm. Automation finally allows IT operations teams to break free from decades of monitoring and the “monitor everything” approach, which thought leaders and vendors have been preaching for years. Simply put, machine learning algorithms can analyze in milliseconds what would normally take humans days to identify.
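To make "mapping out what is routine behaviour" concrete, here is a minimal sketch in Python. It is not the algorithm of any particular product: it uses a simple rolling z-score as a stand-in for machine learning, and the class name, window size and threshold are all hypothetical choices made for illustration.

```python
from collections import deque
from statistics import mean, stdev


class RollingBaseline:
    """Hypothetical helper: learns 'routine' from recent samples and flags outliers."""

    def __init__(self, window=30, threshold=3.0, min_samples=5):
        self.samples = deque(maxlen=window)  # recent values considered routine
        self.threshold = threshold           # allowed deviation, in standard deviations
        self.min_samples = min_samples       # samples needed before judging anything

    def observe(self, value):
        """Return True if value is out of the norm; otherwise learn from it."""
        anomalous = False
        if len(self.samples) >= self.min_samples:
            mu = mean(self.samples)
            sigma = stdev(self.samples)
            if sigma > 0:
                anomalous = abs(value - mu) / sigma > self.threshold
            else:
                # A perfectly flat baseline: any change at all is unusual.
                anomalous = value != mu
        if not anomalous:
            # Only routine values update the baseline, so a spike does not
            # immediately become the new normal.
            self.samples.append(value)
        return anomalous


detector = RollingBaseline()
latencies_ms = [100, 102, 98, 101, 99, 103, 100, 97, 500, 101]
flags = [detector.observe(v) for v in latencies_ms]
print(flags)
```

In this sample run only the 500 ms reading is flagged. No threshold was configured for "latency" by hand; the detector works out what is routine from the data it has seen, which is the property the paragraph above describes.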

#2) Automation is the only way to gain a complete picture of what is going on across the entire IT production stack. Today, as some of you may know, there is a huge video wall at the head of every NOC, with casually tiled windows from a dozen or more different products, each monitoring one particular aspect of IT. At least one of these is a data collector for lower-level tools, aggregating and filtering data in an attempt to control the flood of alerts from the monitors. The challenge companies are running into now, though, is that these filter-based systems have many of the same problems with complexity and rate of change that humans encounter. Adding a new data feed is complex and time-consuming, while the rules and filters themselves are brittle and require frequent hands-on configuration to deal with changes in the environment. An automated solution quickly provides IT operations teams with situational awareness, allowing these professionals to understand what is actually taking place across the entire IT environment.

#3) Automation keeps up with the pace and volume of change in IT. IT is changing at an increasing pace, driven by trends such as virtualisation at all levels of the stack, containerisation, continuous deployment, and ever-higher levels of automation of deployment and configuration activities. The only way to keep up with constant automated change is to automate the analysis as well. A machine-learning-based approach works much better than using rules or models to detect anomalies.

Yes to Monitoring Automation
The conversion to algorithmic event management does not have to be considered a big-bang approach. In IT, that sort of thing usually ends in tears. Luckily, the rewards of automated analysis are often highest when there is the most chaos to be processed. The more diverse the sources of data present, the more accurately machine learning algorithms can assemble the truth of the situation. They can even layer over existing aggregators and further process their output, while also making it easy to add feeds that had not been added to the existing aggregator before, providing a bridge to the future.

Instead of IT operations teams developing eye-strain watching graphs all day, experienced operators can now get on with providing value to their employers, secure in the knowledge that algorithms will alert them if there is anything that deserves their attention. Because this approach correlates alerts from many event streams at different layers of the application stack, it is also able to identify situations that span the borders of individual technological domains, bringing together experts from different disciplines to work together effectively to quickly resolve situations.
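As a rough illustration of correlating alerts across layers, the Python sketch below groups alerts that arrive close together in time and keeps the groups that touch more than one layer of the stack. Time proximity is a deliberately simple stand-in for the correlation a real system would perform, and the `Alert` fields, function names and 60-second gap are assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    timestamp: float  # seconds since an arbitrary start point
    layer: str        # e.g. "network", "database", "application"
    message: str


def correlate(alerts, max_gap=60.0):
    """Group alerts that occur within max_gap seconds of the previous alert."""
    groups = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        if groups and alert.timestamp - groups[-1][-1].timestamp <= max_gap:
            groups[-1].append(alert)   # close in time: same situation
        else:
            groups.append([alert])     # too far apart: start a new situation
    return groups


def cross_domain(groups):
    """Keep only the groups that involve more than one layer of the stack."""
    return [g for g in groups if len({a.layer for a in g}) > 1]


alerts = [
    Alert(0, "network", "link flap"),
    Alert(20, "database", "replication lag"),
    Alert(45, "application", "checkout errors"),
    Alert(900, "application", "slow page"),
]
groups = correlate(alerts)
for situation in cross_domain(groups):
    print(sorted({a.layer for a in situation}))
```

Here the first three alerts come from three separate monitoring feeds but land in one group, so the network, database and application specialists would all be looking at the same situation rather than three unrelated tickets.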

Keep in mind, however, that organisations seeking a new approach often struggle to find the time to seek out next-generation technology. They are too busy fighting fires to worry about longer-term strategic issues. Unfortunately, this is a recipe for failure. Business-as-usual approaches simply cannot keep up, and it is only a question of time before something big is missed, resulting in major service disruption. IT operations teams must now take the next step and move towards a fully automated, machine-learning-based approach to IT in order to advance in a new era of operational intelligence.

Richard Whitehead is the Chief Evangelist at Moogsoft.



WWPI – Covering the best in IT since 1980