Dataset: Rare Event Classification in Multivariate Time Series
By: Chitta Ranjan, PhD, Director of Science, ProcessMiner, Inc.
A real-world dataset from a pulp-and-paper mill is provided. The dataset comes from a multivariate time series process and contains a rare event, a paper break, that is a common problem across the industry. The data consist of sensor readings taken at regular time intervals (the x’s) and the event label (y). The primary purpose of the data is building a classification model for early prediction of the rare event. However, it can also be used for multivariate time series data exploration and for building other supervised and unsupervised models.
A multivariate time series (MTS) is produced when multiple interconnected streams of data are recorded over time. MTS are commonly found in manufacturing processes in which several interconnected sensors collect data over time. In this problem, we have such multivariate time series data from a pulp-and-paper mill, with a rare event associated with it. The event (a paper break, in our case) is unwanted and should be prevented. The objective of the problem is to
- predict the event before it occurs, and
- identify the variables that are expected to cause the event (in order to be able to prevent it).
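One common way to frame "predict the event before it occurs" as a classification task is to move the positive label a few time steps earlier, so that the rows just before a break become the positives. The following is a minimal sketch of that idea, not the method used to prepare this dataset. The function name `shift_labels_earlier`, the `lead` parameter, and the toy label sequence are all hypothetical.

```python
from typing import List


def shift_labels_earlier(y: List[int], lead: int) -> List[int]:
    """Mark the `lead` time steps before each break as positive.

    y[t] == 1 means a break was recorded at step t. In the returned list,
    step t is positive if a break occurs at any of steps t+1 .. t+lead.
    The break rows themselves are set to 0 here (an assumption); a real
    pipeline might drop them instead.
    """
    shifted = [0] * len(y)
    for t, label in enumerate(y):
        if label == 1:
            # Label the rows leading up to the break, clipped at the start.
            for s in range(max(0, t - lead), t):
                shifted[s] = 1
    return shifted


# Toy label sequence (hypothetical): breaks at steps 4 and 7.
y = [0, 0, 0, 0, 1, 0, 0, 1, 0]
early = shift_labels_earlier(y, lead=2)
print(early)  # [0, 0, 1, 1, 0, 1, 1, 0, 0]
```

The choice of `lead` depends on how much warning is useful in practice and on the sampling interval of the sensors, neither of which is fixed by this sketch.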
We provide data from a pulp-and-paper mill. An example of a paper manufacturing machine is shown above. These machines are typically several meters long; they ingest raw materials at one end and produce reels of paper, as shown in the picture.
Several sensors are placed in different parts of the machine along its length and breadth. These sensors measure both raw materials (e.g. amount of pulp fiber, chemicals) and process variables (e.g. blade type, couch vacuum, rotor speed).
Paper manufacturing can be viewed as a continuous rolling process. During this process, the paper sometimes breaks. If a break happens, the entire process is stopped, the reel is taken out, any problem found is fixed, and production is resumed. The resumption can take more than an hour. The cost of this lost production time is significant for a mill. Even a 5% reduction in break events would give a mill a significant cost saving.
The objective of the given problem is to predict such breaks in advance (early prediction) and to identify the potential cause(s) so that the break can be prevented. To build such a prediction model, we will use the data collected from the network of sensors in a mill. This is multivariate time series data with the break as the response (a binary variable).
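To make the data shape concrete, the sketch below turns a multivariate series with a binary response into fixed-length lookback windows, one simple way to feed a time series to a classifier, and then reports the fraction of positive windows. It is an illustration under stated assumptions only: the function `make_windows`, the `lookback` length, and the toy sensor values are hypothetical and are not taken from the dataset.

```python
from typing import List, Tuple

Row = List[float]  # one time step: one reading per sensor


def make_windows(
    x: List[Row], y: List[int], lookback: int
) -> Tuple[List[List[Row]], List[int]]:
    """Pair each time step t with the `lookback` rows ending at t.

    Returns (windows, labels), where windows[i] is a list of `lookback`
    consecutive rows and labels[i] is the response at the window's last row.
    """
    windows: List[List[Row]] = []
    labels: List[int] = []
    for t in range(lookback - 1, len(x)):
        windows.append(x[t - lookback + 1 : t + 1])
        labels.append(y[t])
    return windows, labels


# Toy data (hypothetical): 5 time steps, 2 sensors, one break at step 2.
x = [[0.1, 5.0], [0.2, 5.1], [0.3, 5.3], [0.9, 4.0], [0.2, 5.0]]
y = [0, 0, 1, 0, 0]

windows, labels = make_windows(x, y, lookback=3)
positive_rate = sum(labels) / len(labels)
print(len(windows), labels)
print(f"share of positive windows: {positive_rate:.3f}")
```

Because breaks are rare, the share of positive windows in real data will be small, so accuracy alone is a poor measure of a model's quality on this problem.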
Refer to this paper for more details on the data and the problem, and for the download link.