Complex Event Processing (CEP) should be a key technology for any monitoring solution that aims to provide advanced diagnostic or near-predictive behavior. As usual, though, technology is an important point but not the only one.
Another key aspect is how you model the system or platform you are trying to monitor, and how you map and adapt the underlying technology to match that modeled universe. For instance, two monitoring solutions could both use CEP to correlate events and monitor a 3-tier application: one modeling primary events as those related to IM (Infrastructure Management), such as CPU, memory, and device availability; the other modeling events as single-transaction metrics, for instance the number of handled exceptions within a particular application server, or the average response time per transaction and node.
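To make the contrast concrete, here is a minimal sketch of the two event shapes. The field names are illustrative, not taken from any particular product:

```python
from dataclasses import dataclass

# First approach: infrastructure-level event (the classical IM view).
@dataclass
class InfrastructureEvent:
    host: str
    cpu_percent: float
    memory_percent: float
    available: bool

# Second approach: transaction-level event, emitted each time a call
# crosses a physical or logical boundary.
@dataclass
class TransactionEvent:
    transaction_id: str
    called_system: str
    response_time_ms: float
    status: str  # e.g. "OK", "ERROR"

im_event = InfrastructureEvent(host="app01", cpu_percent=72.5,
                               memory_percent=61.0, available=True)
tx_event = TransactionEvent(transaction_id="tx-1", called_system="app01",
                            response_time_ms=48.3, status="OK")
```

The first event says something about a box; the second says something about a business transaction, which is what makes the cross-silo correlation described below possible.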
With the first approach you get a fuzzy view of your platform and remain stuck with classical IM information, full of unconnected silos. With the second you get a detailed view of how your business transactions flow across your infrastructure, and of the impact and relevance of every element involved. From the point of view of the technologies used, both solutions are identical; in terms of functional advantages, business benefits, and triaging capabilities, they are totally different.
The synergy between CEP and APM, modeling the main events on transaction metrics captured whenever a call crosses a physical or logical boundary, is an approach that brings many benefits to the monitoring scene. Let's walk through a simple example: imagine that every single click you make from the web tier were automatically monitored and tracked across your whole platform. Every time a system involved in your transaction responds to a calling system, an event is generated that includes the called system's name, the response time for that particular piece of the call, its status, and other information. Then imagine a central place where all these events are aggregated and processed in real time, computing aggregate functions such as average response time and standard deviation within a temporal sliding window. This is where CEP plays its key role, correlating those events and computing key performance indicators. As you can imagine, you can easily obtain metrics such as the average response time of any particular node in your network, not as a silo but as a piece within higher-level concepts such as Applications or Services.
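The central aggregation step can be sketched in a few lines. This is a simplified stand-in for what a CEP engine's sliding-window operators would compute; the class and field names are assumptions for illustration:

```python
import statistics
from collections import defaultdict, deque

class SlidingWindowKPI:
    """Per-node KPIs (average and standard deviation of response time)
    over a temporal sliding window. A minimal sketch of the kind of
    aggregation a CEP engine performs, not a production implementation."""

    def __init__(self, window_seconds: float):
        self.window = window_seconds
        # node name -> deque of (timestamp, response_time_ms) samples
        self.samples = defaultdict(deque)

    def add_event(self, node: str, timestamp: float, response_ms: float) -> None:
        q = self.samples[node]
        q.append((timestamp, response_ms))
        # Evict samples that have fallen out of the temporal window.
        while q and q[0][0] < timestamp - self.window:
            q.popleft()

    def kpis(self, node: str):
        values = [v for _, v in self.samples[node]]
        if not values:
            return None
        avg = statistics.fmean(values)
        stdev = statistics.stdev(values) if len(values) > 1 else 0.0
        return {"avg_ms": avg, "stdev_ms": stdev, "count": len(values)}

kpi = SlidingWindowKPI(window_seconds=60)
kpi.add_event("app01", timestamp=0.0, response_ms=40.0)
kpi.add_event("app01", timestamp=10.0, response_ms=60.0)
print(kpi.kpis("app01"))         # avg 50.0 over two samples
kpi.add_event("app01", timestamp=120.0, response_ms=50.0)
print(kpi.kpis("app01")["count"])  # 1: the first two samples fell out of the window
```

Because each event carries the called system's name, the same stream can be grouped not only per node but per Application or Service, which is exactly the cross-silo view described above.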
The ability to correlate those events within complex streams boosts the system's diagnostic capabilities and turns monitoring solutions built this way into near real-time diagnostic tools, approaching predictive capabilities when combined with a dynamic, tight thresholding strategy such as SUS.
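The specifics of SUS are not described here, but the general idea of dynamic thresholding can be illustrated with a generic rolling-statistics band; this sketch flags a value as anomalous when it exceeds the recent mean plus k standard deviations, and all names and parameters are assumptions for illustration:

```python
import statistics
from collections import deque

def dynamic_threshold_breach(history, new_value, k=3.0, min_samples=5):
    """Return True when new_value exceeds the rolling mean plus k
    standard deviations of recent history. A generic dynamic-threshold
    sketch, not the SUS strategy itself."""
    if len(history) < min_samples:
        return False  # not enough data to trust the band yet
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    return new_value > mean + k * stdev

# Rolling history of recent response times for one node (ms).
recent = deque([50.0, 52.0, 48.0, 51.0, 49.0], maxlen=50)
print(dynamic_threshold_breach(recent, 53.0))   # within the band -> False
print(dynamic_threshold_breach(recent, 200.0))  # far above the band -> True
```

Because the threshold is derived from the stream itself rather than fixed by hand, it tightens automatically when a node behaves consistently, which is what lets the correlation layer raise alerts before a hard failure is visible.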