Failure prediction component for data storage systems

A hardware-software component with model-diagnostic software based on machine learning, designed to detect exceptions and abnormal behavior, as well as predict and prevent failures in data storage systems.

Task

Data storage systems are a key element of a company’s IT infrastructure. A failure to process or store data can corrupt or destroy that data and cause the system owner serious reputational or financial damage.

The developers’ task is to create model-diagnostic software that uses machine learning algorithms to detect, in a timely manner, abnormal system behavior that could cause a malfunction.

Solution

The practical result of the research and development is a pilot industrial component to be built into the hardware and software architecture of the TATLIN data storage platform for the laboratory’s industrial partner, KNS Group LLC (YADRO). Completion is scheduled for the end of 2019.

The approach is based on machine learning. It makes it possible to identify anomalies and predict critical situations that the built-in error and failure handling mechanisms of the software and hardware environment do not detect.

To train the algorithms, we used both real operational statistics from various configurations of data storage systems in the TATLIN product portfolio and data generated by a storage simulator program. The failure prevention system identifies problem situations from combinations of current monitoring data and forecast results.
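
How current readings and forecast results might be combined can be sketched in Go, one of the project's implementation languages. The sketch is illustrative only: the linear trend forecast, the `assess` function, the `Verdict` values, the threshold and the sample values are all assumptions made for this example and do not describe the project's actual models.

```go
package main

import "fmt"

// Verdict is a hypothetical outcome label used only in this sketch.
type Verdict string

const (
	NoProblem        Verdict = "no problem"
	PredictedProblem Verdict = "predicted problem"
	CurrentProblem   Verdict = "current problem"
)

// forecastLinear fits a least-squares line to equally spaced samples and
// extrapolates it `steps` intervals past the last sample.
// Linear extrapolation is an illustrative stand-in for a real forecasting model.
func forecastLinear(samples []float64, steps int) float64 {
	if len(samples) == 0 {
		return 0
	}
	if len(samples) == 1 {
		return samples[0]
	}
	n := float64(len(samples))
	var sumX, sumY, sumXY, sumXX float64
	for i, y := range samples {
		x := float64(i)
		sumX += x
		sumY += y
		sumXY += x * y
		sumXX += x * x
	}
	slope := (n*sumXY - sumX*sumY) / (n*sumXX - sumX*sumX)
	intercept := (sumY - slope*sumX) / n
	return intercept + slope*(n-1+float64(steps))
}

// assess combines the latest reading with the forecast: a reading above the
// limit is a current problem, a forecast above the limit is a predicted one.
func assess(samples []float64, limit float64, steps int) Verdict {
	if len(samples) == 0 {
		return NoProblem
	}
	if samples[len(samples)-1] > limit {
		return CurrentProblem
	}
	if forecastLinear(samples, steps) > limit {
		return PredictedProblem
	}
	return NoProblem
}

func main() {
	queueDepth := []float64{10, 12, 14, 16} // hypothetical monitoring samples
	fmt.Println(assess(queueDepth, 20, 3))  // prints "predicted problem"
}
```

In this example the latest reading (16) is still below the limit of 20, but the fitted trend reaches 22 three intervals ahead, so the situation is flagged before the limit is actually crossed.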

The solution is intended to provide:

  • Prevention of critical situations, i.e., performance degradation and failure of the data write/read service of storage systems.
  • Reduction of labor costs for the collection and processing of monitoring data.
  • Reduction of fault detection time.
  • Optimization of service costs and reduction of the total cost of ownership of storage systems.
  • Enhancement of storage reliability.
  • Elimination of financial or reputational risks to the owning company caused by the loss or inaccessibility of data.

Details

From a diagnostic viewpoint, three main types of failure were considered for any component of the storage system: a failure, when a hardware component stops performing its functions and needs to be replaced; an error, when the component retains partial operability; and a predicted failure, when the component works without external symptoms of failure but shows signs that a failure might occur. To diagnose and predict the various types of failure from current monitoring data, the algorithms combine models trained on accumulated historical data on storage system operation with anomaly detection algorithms that measure deviation from the normal operating mode.
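
The three failure types and the idea of deviation from the normal operating mode can be illustrated with a minimal Go sketch. It assumes a simple z-score test against a baseline fitted on normal historical data. The `Baseline` type, the `classify` inputs, the threshold `k` and the sample latencies are hypothetical, and the project's own classifiers and anomaly detectors are not described in enough detail here to reproduce.

```go
package main

import (
	"fmt"
	"math"
)

// State mirrors the failure types named in the text, plus a normal state.
type State int

const (
	Normal State = iota
	PredictedFailure
	Error
	Failure
)

func (s State) String() string {
	return [...]string{"normal", "predicted failure", "error", "failure"}[s]
}

// Baseline summarizes a metric observed during normal operation.
type Baseline struct {
	Mean, StdDev float64
}

// fitBaseline computes the mean and population standard deviation of
// historical samples assumed to come from normal operation.
func fitBaseline(normal []float64) Baseline {
	if len(normal) == 0 {
		return Baseline{}
	}
	var sum float64
	for _, v := range normal {
		sum += v
	}
	mean := sum / float64(len(normal))
	var sq float64
	for _, v := range normal {
		d := v - mean
		sq += d * d
	}
	return Baseline{Mean: mean, StdDev: math.Sqrt(sq / float64(len(normal)))}
}

// isAnomaly reports whether v lies more than k standard deviations from the mean.
func (b Baseline) isAnomaly(v, k float64) bool {
	if b.StdDev == 0 {
		return v != b.Mean
	}
	return math.Abs(v-b.Mean)/b.StdDev > k
}

// classify maps hypothetical health flags to a state: a component that no
// longer performs its functions has failed, one with partial operability is
// in error, and one that works but behaves anomalously is a predicted failure.
func classify(performsFunctions, fullyOperable, anomalous bool) State {
	switch {
	case !performsFunctions:
		return Failure
	case !fullyOperable:
		return Error
	case anomalous:
		return PredictedFailure
	default:
		return Normal
	}
}

func main() {
	base := fitBaseline([]float64{4, 5, 6, 5, 4, 6}) // hypothetical latencies, ms
	anomalous := base.isAnomaly(9, 3)
	fmt.Println(classify(true, true, anomalous)) // prints "predicted failure"
}
```

A single-metric z-score is only the simplest stand-in for anomaly detection; a production system would typically consider many parameters jointly.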

Various modeling methods were used to create the software package, including simulation and system-dynamics modeling and the construction of ontological and graph models, together with machine-learning algorithms for classification and anomaly detection.
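
As one possible illustration of a graph model, the Go sketch below represents storage components as a dependency graph and finds everything affected when one component fails. The component names, the `Graph` type and the `affected` function are hypothetical and chosen only for this example; the structure of the project's actual ontological and graph models is not given here.

```go
package main

import (
	"fmt"
	"sort"
)

// Graph maps a component to the components that depend on it directly.
type Graph map[string][]string

// affected returns, in sorted order, every component that depends directly
// or transitively on the failed one, using a breadth-first traversal.
func affected(g Graph, failed string) []string {
	seen := map[string]bool{failed: true}
	queue := []string{failed}
	var out []string
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, dep := range g[cur] {
			if !seen[dep] {
				seen[dep] = true
				out = append(out, dep)
				queue = append(queue, dep)
			}
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Hypothetical dependency structure of a storage system.
	g := Graph{
		"psu":        {"controller"},
		"controller": {"disk-shelf", "io-service"},
		"disk-shelf": {"io-service"},
	}
	fmt.Println(affected(g, "psu")) // prints [controller disk-shelf io-service]
}
```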

As part of the hardware and software component integrated into the storage system, this software predicts critical situations such as performance degradation and failure of the data write/read service. It helps to identify malfunctions quickly and to respond to them more effectively by making better-informed decisions about which measures to take.

In the course of the project:

  • An in-depth analysis of the subject area and of existing solutions for storage system diagnostics and management was performed.
  • Simulation and system-dynamics modeling of storage systems and their individual components was carried out.
  • A set of algorithms for diagnosing, predicting and preventing failures was developed.
  • Research tests were carried out on both the simulation bench and the storage system.

The developed software package includes the following main systems:

  • A failure prevention software package that collects, processes and interprets the parameters describing how storage systems function, and diagnoses, predicts and prevents failures.
  • A software package for simulating storage systems, designed to develop and debug simulation models, train machine-learning algorithms, conduct research tests and simulate storage system operation in various modes.

Co-executors:

SPbPU: development of a system-dynamic model and algorithms for diagnosing and preventing failures in storage systems; software development for diagnostics, forecasting and prevention of failures in storage systems.

HSE: development of simulation models and algorithms for diagnosing and predicting failures in storage systems.

KNS Group LLC: development of hardware platform and system software for storage systems.

Technical advantages:

  • More efficient monitoring of storage parameters.
  • Prediction of malfunctions by detecting the onset of a pre-failure (warning) state in the storage system and its components.
  • Reduced decision-making time when failures occur while storage systems operate under various conditions and under the influence of external factors such as air temperature, relative humidity, pressure and vibration.

Technologies

Programming languages and frameworks: Go, C++
OS: SLES 12 SP3, Windows 10
Architectures: x86, POWER8
Version control: Git
DBMS/DB: RocksDB, Dgraph
IDE: GoLand, Visual Studio Code

Intellectual Property

Program simulating the functioning of the PCI-Express data storage controller, the hardware component of the data storage system. Certificate of state registration of a computer program. Uspensky M.B., Arzymatov K. Reg. No: 2018665160. Date of registration: 03.12.2018. Copyright holder: KNS Group LLC.


Program for collecting data storage system parameters. Certificate of registration of a computer program. Uspensky M.B., Petrov V.D., Sochnev A.V., Pustovetov V.I. Reg. No: 2018660284. Date of registration: 21.08.2018. Copyright holder: KNS Group LLC.


Program simulating the functioning of the data storage controller, the hardware component of the data storage system. Certificate of registration of a computer program. Uspensky M.B., Belavin V.S. Reg. No: 2018665676. Date of registration: 06.12.2018. Copyright holder: KNS Group LLC.


Program for collecting and displaying climatic parameters of data storage systems. Certificate of registration of a computer program. Uspensky M.B., Smirnov S.V. Reg. No: 2019614476. Date of registration: 05.04.2019. Copyright holder: KNS Group LLC.


Publications

  1. Mamoutova O.V., Uspenskiy M.B., Smirnov S.V., Bolsunovskaya M.V. Ontological approach to automated analysis of enterprise data storage systems log files // Acta Polytechnica Hungarica. 2021. V. 18. No. 9. P. 27-47. DOI: 10.12700/APH.18.9.2021.9.3.
  2. Uspenskij M.B. Log mining and knowledge-based models in data storage systems diagnostics // E3S Web Conf. 2019. V. 140. No. 03006. DOI: 10.1051/e3sconf/201914003006.
  3. Mamoutova O.V., Uspenskiy M.B., Sochnev A.V., Smirnov S.V., Bolsunovskaya M.V. Knowledge based diagnostic approach for enterprise storage systems // 2019 IEEE 17th International Symposium on Intelligent Systems and Informatics (SISY). Subotica, Serbia: IEEE, 2019. P. 207-212. DOI: 10.1109/SISY47553.2019.9111617.
  4. Uspenskij M.B., Makarov A., Sochnev A., Shirokova S.V., Petrov V. Development of a Software Structure for Monitoring the Working Capacity of the Data Storage System for Predicting Failures and Preventing Critical Situations // Education excellence and innovation management through vision, Soliman, KS, 2020. P. 8508-8514.
  5. Bolsunovskaya M.V., Shirokova S.V., Loginova A.V. Development of Hardware and Software Complex for Predicting Failures in Data Storage Systems of Smart Cities // Proceedings of the 33rd International Business Information Management Association Conference, IBIMA 2019: Education Excellence and Innovation Management through Vision 2020. P. 5165-5172.
  6. Gintciak A.M., Bolsunovskaya M.V., Redko S.G. Comparative analysis of approaches to the employees distribution among the organization’s projects // Proceedings of the 33rd International Business Information Management Association Conference, IBIMA 2019: Education Excellence and Innovation Management through Vision 2020. 2019. P. 4690-4694.
  7. Uspenskij M.B., Shirokova S.V., Mamoutova O.V., Zhvarikov V.A. Complex expert assessment as a part of fault management strategy for data storage systems // Lecture Notes in Networks and Systems. 2020. V. 95. P. 592-600. DOI: 10.1007/978-3-030-34983-7_58.
  8. Mamoutova O.V., Shirokova S.V., Uspenskij M.B., Loginova A.V. The ontology-based approach to data storage systems technical diagnostics // E3S Web Conf. 2019. V. 91. No. 08018. DOI: 10.1051/e3sconf/20199108018.
  9. Bolsunovskaya M.V., Shirokova S.V., Loginova A.V., Uspenskiy M.B. Development of tools for improving the data storage systems reliability as a part of digital transformation strategy // IOP Conf. Ser.: Mater. Sci. Eng. 2020. V. 940. No. 012010. DOI: 10.1088/1757-899X/940/1/012010.
  10. Bolsunovskaya M.V., Shirokova S.V., Loginova A.V. State of current data storage market and development of tools for increasing data storage systems reliability // E3S Web Conf. 2019. V. 135. P. 04076. DOI: 10.1051/e3sconf/201913504076.

This work is supported by the Ministry of Science and Higher Education of the Russian Federation within the framework of the federal target program “Research and Development in Priority Areas for the Development of the Russian Science and Technology Complex for 2014-2020”.

Agreement on the provision of subsidies between the FSAEI HE “SPbPU” and the Ministry of Science and Higher Education of the Russian Federation dated 03.10.2017 No. 14.581.21.0023.

Unique identifier: RFMEFI58117X0023.

Project team

  • Project supervisor: M. Bolsunovskaya
  • Project manager: A. Kouzmichev
  • Development team leader: M. Uspensky

Industrial partner

KNS Group LLC (YADRO)

Associate executor

National Research University “Higher School of Economics”