A hardware-software component with model-diagnostic software based on machine learning, designed to detect exceptions and abnormal behavior, as well as predict and prevent failures in data storage systems.
Data storage systems are a key element of a company’s IT infrastructure. A failure in processing or storing data can corrupt or destroy that data and cause the system owner serious reputational or financial damage.
The developers’ task is to create model-diagnostic software that uses machine learning algorithms to detect, in a timely manner, abnormal system behavior that could lead to a malfunction.
The practical result of the research and development is a pilot industrial component to be built into the software and hardware architecture of the TATLIN data storage platform for the laboratory’s industrial partner, KNS Group LLC (YADRO). Completion is scheduled for the end of 2019.
The approach is based on machine learning and makes it possible to identify anomalies and predict critical situations that are not detected by the built-in error and failure handling methods of the software and hardware environment.
To train the algorithms, we used both real operating statistics from various configurations of data storage systems in the TATLIN product portfolio and data generated by a storage simulator program. The failure prevention system identifies problem situations on the basis of combinations of current monitoring data and forecast results.
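The idea of combining current monitoring data with forecast results can be illustrated with a minimal Python sketch. It is not the project's implementation: the function names `forecast_linear` and `assess`, the linear trend model, the threshold, and the three status labels are all assumptions chosen for illustration.

```python
from statistics import mean


def forecast_linear(history, steps_ahead):
    """Extrapolate a least-squares line fitted to equally spaced samples.

    Hypothetical stand-in for a forecasting model; the source does not
    say which forecasting method is actually used.
    """
    n = len(history)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = range(n)
    x_mean = mean(xs)
    y_mean = mean(history)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, history))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    slope = numerator / denominator
    intercept = y_mean - slope * x_mean
    return intercept + slope * (n - 1 + steps_ahead)


def assess(history, threshold, steps_ahead):
    """Combine the latest reading with the forecast (assumed rule set)."""
    current = history[-1]
    predicted = forecast_linear(history, steps_ahead)
    if current >= threshold:
        return "critical"   # the monitored value is already out of range
    if predicted >= threshold:
        return "warning"    # still in range, but the trend will cross the limit
    return "ok"


if __name__ == "__main__":
    write_latency_ms = [10, 12, 14, 16, 18]  # made-up monitoring samples
    print(assess(write_latency_ms, threshold=25, steps_ahead=5))
```

With these made-up samples the latest reading is still below the threshold, so only the forecast raises the alert. That is the kind of situation a check on the current value alone would miss.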
The developed innovative solution will provide for:
From a diagnostic standpoint, three main types of failure were considered for any component of the storage system: a failure, when the hardware component stops performing its functions and needs to be replaced; an error, when the component retains partial operability; and a predicted failure, when the component works without external symptoms of a failure but shows signs that one might occur. To diagnose and predict the various types of failure from current monitoring data, the algorithms use models trained on accumulated historical data on the functioning of storage systems, together with anomaly detection algorithms that measure deviation from the normal operating mode.
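The three failure types and the notion of deviation from normal operation can be sketched as follows. This is a simplified illustration, not the project's algorithm: a plain z-score test stands in for the trained models and anomaly detectors, and the names `State`, `is_anomalous`, `classify`, and the limit of 3 standard deviations are assumptions.

```python
from enum import Enum
from statistics import mean, stdev


class State(Enum):
    HEALTHY = "healthy"
    FAILURE = "failure"                      # component no longer performs its functions
    ERROR = "error"                          # component retains partial operability
    PREDICTED_FAILURE = "predicted failure"  # no external symptoms, but warning signs


def is_anomalous(baseline, value, z_limit=3.0):
    """Flag a reading that deviates strongly from normal-operation samples.

    A z-score test is used here only as a simple placeholder for an
    anomaly detection algorithm.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > z_limit


def classify(operational, degraded, baseline, value):
    """Map observations of one component to one of the considered states."""
    if not operational:
        return State.FAILURE
    if degraded:
        return State.ERROR
    if is_anomalous(baseline, value):
        return State.PREDICTED_FAILURE
    return State.HEALTHY


if __name__ == "__main__":
    normal_temps_c = [40, 41, 39, 40, 42, 38, 40]  # made-up baseline readings
    print(classify(True, False, normal_temps_c, 55).value)
```

The checks are ordered from most to least severe, so a component that has already failed is never reported as merely a predicted failure.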
In creating the software package, various modeling methods were used, including simulation and system-dynamic modeling, with the construction of ontological and graph models, as well as machine learning algorithms for solving classification problems and identifying anomalies.
As part of the hardware and software component integrated into the storage system, this software predicts critical situations such as performance degradation and failure of the data read/write service. It helps to identify malfunctions quickly and to respond to them more effectively by supporting better-informed decisions about which measures to take.
In the course of the project:
The developed software package includes the following main systems:
Co-executors:
SPbPU: development of a system-dynamic model and algorithms for diagnosing and preventing failures in storage systems; software development for diagnostics, forecasting and prevention of failures in storage systems.
HSE: development of simulation models and algorithms for diagnosing and predicting failures in storage systems.
KNS Group LLC: development of hardware platform and system software for storage systems.
Technical advantages:
Software programming languages and frameworks | Go, C++ |
OS | SLES 12 SP3, Windows 10 |
Architectures | x86, POWER8 |
VCS | Git |
DBMS/DB | RocksDB, Dgraph |
IDE | GoLand, Visual Studio Code |
Program simulating the functioning of the PCI-Express data storage controller, the hardware component of the data storage system. Certificate of state registration of a computer program. Uspensky M.B., Arzymatov K. Reg. No: 2018665160. Date of registration: 03.12.2018. Copyright holder: KNS Group LLC.
Program for collecting data storage system parameters. Certificate of registration of a computer program. Uspensky M.B., Petrov V.D., Sochnev A.V., Pustovetov V.I. Reg. No: 2018660284. Date of registration: 21.08.2018. Copyright holder: KNS Group LLC.
Program simulating the functioning of the data storage controller, the hardware component of the data storage system. Certificate of registration of a computer program. Uspensky M.B., Belavin V.S. Reg. No: 2018665676. Date of registration: 06.12.2018. Copyright holder: KNS Group LLC.
Program for collecting and displaying climatic parameters of data storage systems. Certificate of registration of a computer program. Uspensky M.B., Smirnov S.V. Reg. No: 2019614476. Date of registration: 05.04.2019. Copyright holder: KNS Group LLC.
This work is supported by the Ministry of Science and Higher Education of the Russian Federation within the framework of the federal target program “Research and Development in Priority Areas for the Development of the Russian Science and Technology Complex for 2014-2020”.
Agreement on the provision of subsidies between the FSAEI HE “SPbPU” and the Ministry of Science and Higher Education of the Russian Federation dated 03.10.2017 No. 14.581.21.0023.
Unique identifier: RFMEFI58117X0023.