Martin Molan is the Chief AI Engineer at Comtrade 360, overseeing the design and development of AI-related research initiatives and solutions. He began his career with Comtrade Group ten years ago as an intern, then became a software engineer and later a machine learning engineer at Comtrade Gaming. There he led the development of several AI solutions, most notably an intelligent bonus advisor and a real-time platform monitoring tool. When Comtrade 360 was established, he first served as an AI consultant before becoming Chief AI Engineer. His current work and research focus on anomaly detection, anomaly anticipation, and digital twin creation methodologies. Besides Comtrade, he has collaborated as a researcher with CERN openlab, UCL center for AI, the UNESCO International Research Center on Artificial Intelligence, the University of Bologna, and CINECA, Italy.
AI-Powered Monitoring of Cloud Applications
- Level 400
- Date: Monday, 26 September 2022, 10:45
Detecting operation that deviates from normal behavior is crucial for reducing a system's downtime and increasing its overall availability. A data-driven approach to anomaly detection is based on training a particular class of neural networks called autoencoders. Traditionally, autoencoders are trained in a semi-supervised manner, which requires an accurately labeled training set. Accurately labeled training sets (precise data about anomalies) are seldom available, and their absence is a significant barrier to deploying data-driven anomaly detection. The RUAD approach, developed by researchers from the University of Bologna and ETH, is the first modeling approach that has the potential to be trained even when data about the anomalies is not available, that is, in an unsupervised manner. In this presentation, we will show how we took the RUAD approach and adapted it to monitoring online retail platforms running in the cloud (MS Azure). To our knowledge, this is the first commercial deployment of data-driven, unsupervised anomaly detection for cloud applications. We have developed and deployed a system that automatically monitors the overall health of an online retail platform and alerts the system administrators when availability is reduced. Data collected from MS Azure is submitted to RUAD, which analyses it in real time and provides feedback to the system administrators. The RUAD network was trained on MS Azure and is deployed there. Deploying the anomaly detection solution, which was trained without an accurate dataset of anomalous events, significantly reduced the time between the occurrence of an anomaly and the response from the support personnel. This, in turn, reduced the overall downtime and increased the platform's availability. The presented methodology applies to many cloud applications beyond online retail.
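The general idea of scoring anomalies by reconstruction error, with a model fitted on unlabeled data only, can be sketched as follows. This is a minimal illustration and not the RUAD architecture or the deployed system: a linear autoencoder (fitted in closed form via SVD, which is equivalent to a PCA projection) stands in for the neural network, the telemetry is synthetic, and all names, the mixing matrix, and the 99th-percentile threshold are assumptions made for the example.

```python
import numpy as np


def fit_linear_autoencoder(X, n_latent):
    """Fit on unlabeled data; returns (mean, components).

    The SVD gives the optimal linear encoder/decoder under squared error,
    so no anomaly labels are needed at any point.
    """
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_latent]


def reconstruction_error(X, mean, components):
    """Mean squared reconstruction error per sample (the anomaly score)."""
    centered = X - mean
    latent = centered @ components.T      # encode
    reconstructed = latent @ components   # decode
    return np.mean((centered - reconstructed) ** 2, axis=1)


# Synthetic telemetry (assumption): four metrics driven by two hidden
# load factors, so metrics 1/3 and metrics 2/4 normally move together.
rng = np.random.default_rng(0)
mixing = np.array([[1.0, 0.0, 1.0, 0.0],
                   [0.0, 1.0, 0.0, 1.0]])
load = rng.normal(size=(2000, 2))
train = load @ mixing + 0.05 * rng.normal(size=(2000, 4))

mean, components = fit_linear_autoencoder(train, n_latent=2)
train_err = reconstruction_error(train, mean, components)

# Alert threshold (assumption): 99th percentile of the training error.
threshold = np.quantile(train_err, 0.99)

# A sample that follows the usual correlations, and one that breaks them
# (metric 1 rises while metric 3 falls).
normal_sample = np.array([[1.0, -0.5]]) @ mixing
anomalous_sample = normal_sample + np.array([[3.0, 0.0, -3.0, 0.0]])

normal_flag = bool(reconstruction_error(normal_sample, mean, components)[0] > threshold)
anomaly_flag = bool(reconstruction_error(anomalous_sample, mean, components)[0] > threshold)
print("normal sample flagged:", normal_flag)
print("anomalous sample flagged:", anomaly_flag)
```

In this sketch the model only learns what regular operation looks like, so any sample it cannot reconstruct well is treated as a candidate anomaly. In a real deployment the threshold would need tuning against the acceptable false-alert rate.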