Power Consumption Predictive Analytics and Automatic Anomaly Detection Based on CNN-LSTM Neural Networks



I. INTRODUCTION
Power systems have evolved rapidly in recent decades. With the growing number of household appliances, electric vehicles, and electric motors, the demand for electricity is steadily increasing. Statistical data show that residential and commercial buildings account for approximately 60% of the world's total electricity usage [1]. The power supply system has become more complex and smarter. In addition, sophisticated information transmission technology has been deployed, making grid operation more secure and convenient [2]. Data-driven methods that forecast near-future energy consumption from historical data are a crucial element of the Building Information Modeling (BIM) process [3].
In addition, power consumption in daily life is highly variable and difficult to predict. For example, it varies greatly with season, temperature, number of occupants, and weekday versus weekend activities [4]. Forecasting electric energy consumption is therefore a challenge in predictive analytics. It is also crucial to identify unusual power usage, since anomalies in electricity load can arise, sometimes concurrently, from many causes, such as human error (for example, neglecting to switch off electrical devices), malfunctioning electrical equipment, and unauthorized use of electricity, any of which can lead to significantly higher electricity costs than usual.
Automatic anomaly detection can raise users' awareness of energy saving, prompt them to find faulty appliances, change wasteful consumption patterns, reduce energy costs, and promote electrical safety. Most importantly, it can help identify electricity theft [5]. According to research findings, energy losses, primarily due to electricity theft, make up approximately 50% of total losses in certain developing nations [6], and a practical way to address this issue is the effective deployment of automatic anomaly detection technologies.
There are three categories of anomalies: point anomalies, collective anomalies, and contextual anomalies [7]. Point anomalies occur when a single data point is extremely high or low compared to the rest of the data. Collective anomalies occur only in datasets whose points are correlated, and refer to a group of data points that is anomalous relative to the complete dataset. Contextual anomalies arise when a data point is anomalous only when considered together with its context.
The combination of Convolutional Neural Networks
The organization of this paper is as follows: In Section 2, we review prior research on power consumption prediction and anomaly detection systems. Section 3 describes the dataset used in our experiment, along with data preparation, model implementation, and the methodology employed. In Section 4, we evaluate the model's predictive capabilities and its effectiveness in anomaly detection, comparing it against other models. Finally, the concluding section summarizes the paper.

II. RELATED WORKS
Over the past decades, deep learning has developed rapidly, becoming more accessible and applicable in various fields. Using deep learning to predict power consumption and detect anomalies in energy systems has attracted considerable interest, and many studies have addressed problems in predicting electricity usage. A custom deep learning application was created by [8] to forecast household load, with promising results. The approach was tailored for use with the TensorFlow deep learning platform and was evaluated on a sample of 920 smart-meter customers in Ireland. Compared with commonly used time series forecasting methods for household load prediction, the proposed approach performed better. In [9], an energy behavior-based support vector regression (SVR) model was created to predict household electricity consumption under various intervention strategies. The SVR model uses the Gaussian Radial Basis Function (RBF) as its kernel function. The results show that the proposed model achieved the best and most reliable performance in both next-month forecasting and time series forecasting. Researchers in [10] developed a hybrid LSTM (Long Short-Term Memory) neural network method to predict individual household energy consumption. They showed that LSTM outperforms other advanced methods such as support vector regression (SVR) due to more efficient training. Zhang [4] proposes power consumption prediction and anomaly detection using a Transformer and k-means clustering, and compares the results with LSTM models. That paper inspired us to develop our LSTM model with CNN support. In [11], researchers apply the CNN-LSTM model to predict residential energy consumption using deep learning methods.
That study reported near-perfect results but did not include anomaly detection. The work in [12] shows that DeepAnT is on par with other techniques and outperforms advanced anomaly detection techniques in most cases. Nguyen et al. [13] developed prediction and anomaly detection using LSTM and LSTM autoencoder techniques, applied to supply chain management. Henning [14] describes how to capture power usage data in manufacturing enterprises using the Internet of Things and how to analyze and visualize it with a web-based dashboard.
LSTM has been extensively validated on sequential data and is often paired with other deep learning methods. The CNN-LSTM model has achieved notable success in optical character recognition, speech recognition, and natural language processing. We therefore propose using the CNN-LSTM model to estimate electric energy load and then comparing the forecasts with actual data to detect anomalies.

III. METHODOLOGY
This study involves the development of an Energy Monitoring System (EMS) in a house, which serves as our electricity data logger. In the following subsections, we describe how we built our data acquisition module, the details of our dataset, and the method we used for predictive analytics and anomaly detection on this dataset.

A. Data Acquisition Module
The primary processing unit of our module is an ESP32. We use an integrated power sensor to sample electricity data at 1-minute intervals. The electricity supplied to the electrical appliances is divided into three channels, each with its own current sensor. We store the collected data both on a memory card and in an online database. The internal components of our module are depicted in Figure 1, and Table 1 lists its specifications.
Our module is installed by mounting it on a wall of the house. Figure 2 shows how our module operates.

B. Data Description
The data retrieved from the electrical data logger consist of measurements of electricity consumption and power quality, with the following fields: Date Time, Voltage (V), Global Active Current (A), Power (W), Frequency (Hz), Power Factor, Channel 1 Current (mA), Channel 2 Current (mA), and Channel 3 Current (mA). The data used for this analysis span 181,440 minutes (2022-07-09 00:00 to 2022-11-11 23:59). Table 2 shows a sample of 5 hours of data after hourly resampling using the mean.

C. Exploratory Data Analysis
Before using and processing our data, we perform Exploratory Data Analysis (EDA) to gain a deeper understanding of it: recognizing patterns, uncovering anomalies, evaluating hypotheses, and verifying assumptions using summary statistics and visual representations. We first plot the data in each channel to obtain a comprehensive overview, as illustrated in Figure 3.
Figure 3 shows that the voltage, current, and power readings dropped to 0 on several occasions, indicating multiple power outages. Furthermore, the frequency, which should not exceed 50 Hz, is recorded at up to 100, and the power factor, which should not exceed 1, shows values greater than 1. These observations can be classified as outliers, that is, data points that are significantly higher or lower than adjacent values in the dataset. Upper outliers can be detected using a statistical threshold such as Q3 + 1.5 × IQR, where Q3 is the third quartile and IQR is the interquartile range of the distribution. This approach is commonly used to detect upper outliers in non-parametric distributions [15]. A boxplot makes the outliers easy to identify, as seen in Figure 4. There are extreme outliers in voltage, frequency, power factor, channel 2 current, and channel 3 current. These extreme outliers can be removed, while the remaining outliers are normal and may carry valuable information.
To examine the correlation between the variables in the dataset, we visualize it with a heatmap. Figure 5 shows the correlation heatmap, which indicates a fairly strong relationship between the variables, suggesting that they vary approximately linearly with one another. We therefore use univariate analysis, which is simpler and lighter and allows for multi-step forecasting.
Moreover, to identify patterns in electricity usage, we plotted the 24-hour electricity usage for ten Sundays in Figure 6. The figure shows that usage does not follow a consistent pattern, although similar patterns occasionally recur. Such data are well suited to deep learning models, which are better equipped than traditional machine learning models to learn irregular patterns.
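As a minimal sketch of the upper-outlier rule described above, the following computes Q3 + 1.5 × IQR with NumPy and flags values above it. The function name `upper_fence` and the sample power-factor readings are hypothetical and are not taken from the dataset.

```python
import numpy as np

def upper_fence(values, k=1.5):
    """Return the upper outlier threshold Q3 + k * IQR."""
    q1, q3 = np.percentile(values, [25, 75])
    return q3 + k * (q3 - q1)

# Hypothetical power-factor readings; a physical power factor cannot exceed 1.
power_factor = np.array([0.80, 0.82, 0.85, 0.88, 0.90, 0.92, 0.95, 0.97, 3.50])

fence = upper_fence(power_factor)
outliers = power_factor[power_factor > fence]
print(f"upper fence = {fence:.2f}, outliers = {outliers}")
```

Only the upper fence is sketched here because that is the rule the text states; a lower fence (Q1 − 1.5 × IQR) could be added in the same way if low outliers matter.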

D. CNN-LSTM Architecture
The Convolutional Neural Network (CNN) belongs to the category of Deep Neural Networks and is widely employed in image analysis. It can identify and categorize specific elements within images. The CNN's architecture is primarily oriented towards swift training and the capacity to capture spatial information. Within CNNs, individual neurons respond to stimuli within a limited section of the visual field, a concept known as the receptive field. In the shallower layers of CNNs, abstract features are extracted from the image, while in the deeper layers, these features are consolidated to gain a comprehensive understanding of the image, which proves beneficial for various image processing tasks [16].
The Long Short-Term Memory (LSTM) neural network is a variant of the Recurrent Neural Network (RNN) that employs cells with more intricate internal structures rather than basic neurons (illustrated in Figure 7) [17]. Like RNNs, LSTM treats inputs as interconnected time series data. However, the internal architecture of LSTM cells addresses the vanishing and exploding gradient problems [18]. The LSTM model comprises four crucial components: the cell state, input gate, forget gate, and output gate (as depicted in Figure 7). The forget gate (f_t), input gate (i_t), and output gate (o_t) control how information in the cell state is retained, updated, and discarded [10]. The forward computation involves the current cell state (C_t), the cell state at the previous time step (C_(t-1)), and the candidate update to the current cell state (C̃_t). The output value h_t is calculated using Equations (4) and (6) based on the C̃_t and C_(t-1) values, with all weights, including W_f, W_i, W_c, and W_o, updated using the back-propagation through time (BPTT) algorithm [19].
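For reference, the standard LSTM forward pass can be written as follows. This is the textbook formulation, not necessarily the exact form printed in the paper. The numbering is an assumption chosen to agree with the text's statement that h_t follows from Equations (4) and (6). The bias terms b_f, b_i, b_c, b_o, the input x_t, and the concatenation [h_{t-1}, x_t] are likewise assumptions.

```latex
\begin{align}
f_t &= \sigma\left(W_f \, [h_{t-1}, x_t] + b_f\right) \tag{1} \\
i_t &= \sigma\left(W_i \, [h_{t-1}, x_t] + b_i\right) \tag{2} \\
\tilde{C}_t &= \tanh\left(W_c \, [h_{t-1}, x_t] + b_c\right) \tag{3} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{4} \\
o_t &= \sigma\left(W_o \, [h_{t-1}, x_t] + b_o\right) \tag{5} \\
h_t &= o_t \odot \tanh\left(C_t\right) \tag{6}
\end{align}
```

Here σ is the logistic sigmoid and ⊙ denotes element-wise multiplication.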
The CNN-LSTM architecture combines CNN layers for feature extraction on input data with LSTMs to support sequence prediction.
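A minimal `tf.keras` sketch of this stacking is given below. The 24-step univariate input, the Conv1D and MaxPooling1D layers, the MSE loss, and the Adam optimizer follow the paper's Model Development section. The number of filters, kernel size, pool size, LSTM units, and the final Dense output layer are illustrative assumptions, since the paper does not state them here.

```python
import numpy as np
import tensorflow as tf

WINDOW = 24  # hours of history per sample, as stated in the paper

def build_cnn_lstm(window=WINDOW, filters=64, kernel_size=3, lstm_units=50):
    """CNN front end for feature extraction, LSTM for sequence modelling.

    filters, kernel_size, lstm_units and the Dense head are assumed values.
    """
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(window, 1)),           # univariate series
        tf.keras.layers.Conv1D(filters, kernel_size, activation="relu"),
        tf.keras.layers.MaxPooling1D(pool_size=2),
        tf.keras.layers.LSTM(lstm_units),
        tf.keras.layers.Dense(1),                    # next-hour prediction
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

model = build_cnn_lstm()
dummy = np.zeros((4, WINDOW, 1), dtype="float32")    # batch of 4 windows
pred = model(dummy)
print(pred.shape)
```

Training with the settings reported in the paper would then be `model.fit(X_train, y_train, epochs=50, batch_size=100)`, where `X_train` and `y_train` are the windowed training data.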

E. Model Development
The CNN-LSTM model was constructed using the TensorFlow 2.0 Framework, which is a software library for machine learning and artificial intelligence that focuses on training and inference of deep neural networks.
In order to perform predictive analytics at an hourly level, the original minutely data was resampled to hourly data by taking the average, resulting in 3,024 hourly data points.
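The hourly averaging can be sketched with NumPy as below, assuming the minute-level series has no gaps and starts at the top of an hour. The function name and the random placeholder values are hypothetical. With timestamped data, a time-indexed resampling tool would be the more usual choice.

```python
import numpy as np

MINUTES_PER_HOUR = 60

def resample_hourly_mean(minute_values):
    """Average consecutive 60-minute blocks of a gap-free minute series."""
    values = np.asarray(minute_values, dtype=float)
    n_hours = len(values) // MINUTES_PER_HOUR
    trimmed = values[: n_hours * MINUTES_PER_HOUR]  # drop a trailing partial hour
    return trimmed.reshape(n_hours, MINUTES_PER_HOUR).mean(axis=1)

# Placeholder data with the same length as the recorded series (181,440 minutes).
rng = np.random.default_rng(0)
minute_power = rng.uniform(100.0, 500.0, size=181_440)

hourly_power = resample_hourly_mean(minute_power)
print(hourly_power.shape)
```

The resulting length, 181,440 / 60 = 3,024, matches the number of hourly data points reported above.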
Since the univariate method was selected, only one variable (such as power, current, or the electricity usage on one channel) needed to be chosen for prediction. To prepare the data for the CNN-LSTM model, a supervised dataset was created using the sliding window method. Each window consists of 24 hours of past data, with the subsequent data point as the label. Figure 8 demonstrates how the sliding window is applied. Once the supervised data is obtained, it is split into an 80% training set and a 20% test set. Conv1D and MaxPooling1D layers form the CNN, whose output is forwarded to the LSTM layer, as depicted in Figure 9. The loss function is the mean squared error (MSE) between the predicted and original data, and the model is trained with the Adam optimizer for 50 epochs with a batch size of 100.
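The windowing and split described above can be sketched as follows. The helper name `make_windows` and the placeholder series are hypothetical. The split is done chronologically without shuffling, which is an assumption: the paper states only the 80/20 ratio.

```python
import numpy as np

def make_windows(series, window=24):
    """Turn a 1-D series into (samples, window, 1) inputs and next-step labels."""
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - window
    X = np.stack([series[i:i + window] for i in range(n_samples)])
    y = series[window:]                      # value right after each window
    return X[..., np.newaxis], y

hourly = np.arange(3024, dtype=float)        # placeholder for 3,024 hourly values
X, y = make_windows(hourly, window=24)

split = len(X) * 8 // 10                     # 80% train, 20% test, in time order
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]
print(X_train.shape, X_test.shape)
```

With 3,024 hourly points and a 24-hour window this yields 3,000 samples: 2,400 for training and 600 for testing.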

F. Anomaly Detection Method
The method we use for anomaly detection compares real values with predicted values. A significant difference between the two indicates that the real values deviate from the usual data trend and may not be normal. We calculate the error score between predicted and real values using the MAE (Mean Absolute Error), as shown in Eq. 7 [20]. We determine the threshold by collecting score values from multiple trials. If the score between predicted and test values exceeds the threshold, the test value is considered abnormal. To determine the threshold, we use the upper outlier statistical calculation shown in Eq. 8 [15].
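A minimal sketch of this procedure is shown below, under these assumptions: the per-point score is the absolute error between prediction and measurement, the scores are Min-Max scaled using the training range (as the results section mentions), and the threshold is the upper fence Q3 + 1.5 × IQR of the scaled training scores. All function names and sample numbers are hypothetical.

```python
import numpy as np

def abs_error(actual, predicted):
    """Per-point absolute error; its mean over a set of points is the MAE."""
    return np.abs(np.asarray(actual, float) - np.asarray(predicted, float))

def fit_threshold(train_scores, k=1.5):
    """Min-Max scale training scores and return (threshold, lo, hi).

    Assumes the training scores are not all identical (hi > lo).
    """
    lo, hi = train_scores.min(), train_scores.max()
    scaled = (train_scores - lo) / (hi - lo)
    q1, q3 = np.percentile(scaled, [25, 75])
    return q3 + k * (q3 - q1), lo, hi

def flag_anomalies(actual, predicted, threshold, lo, hi):
    """Flag points whose scaled error exceeds the threshold."""
    scaled = (abs_error(actual, predicted) - lo) / (hi - lo)
    return scaled > threshold

# Hypothetical training data: errors are 0, 1, ..., 8.
train_actual = 100.0 + np.arange(9.0)
train_pred = np.full(9, 100.0)
threshold, lo, hi = fit_threshold(abs_error(train_actual, train_pred))

# Hypothetical test data: the last point deviates strongly from its prediction.
flags = flag_anomalies([10.0, 10.0, 30.0], [10.0, 12.0, 10.0], threshold, lo, hi)
print(threshold, flags)
```

Reusing the training range for scaling keeps test scores on the same scale as the threshold, so a test error larger than any training error can still be flagged.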

A. Model Performance
The CNN-LSTM model's training and test loss over 50 epochs are depicted in Figure 10. The figure shows that the model converges quickly without overfitting. We attribute this to the LSTM, which quickly learns to detect patterns in time series data, and to the CNN, which deepens the analysis through its convolutional feature extraction.
To evaluate the model's performance, we compared the CNN-LSTM model with a plain LSTM model. LSTM is a variant of RNN that addresses vanishing and exploding gradients when training on long sequences by incorporating three gating mechanisms. In other words, LSTM performs better than standard RNNs on longer sequences, making it suitable for time series forecasting tasks.
Table 3 compares the performance of the models. Since the CNN-LSTM model's task is forecasting, we use Mean Squared Error (MSE) as the metric. According to Table 3, the CNN-LSTM model outperforms LSTM by 29%.

B. Consumption Prediction
The predictions of the CNN-LSTM model and the LSTM model on the test set are compared in Figure 11, which shows the actual power consumption for 200 hours in blue, the LSTM predictions in red, and the CNN-LSTM predictions in green. The graph indicates that
The anomaly threshold value is calculated automatically by our system using the upper outlier calculation method on the Mean Absolute Error (MAE) of the training data. We also apply the Min-Max scaler to the calculated MAE score. Using the upper outlier formula, we determined the anomaly threshold to be 0.12, which means that any score exceeding this value is considered an anomaly. Figure 12 shows that the MAE score exceeded the threshold in several instances, and Figure 13 marks the anomalous electricity usage with red dots.
Upon analyzing the recorded data, we found a power outage that lasted for several hours and excessive usage of certain electrical equipment. This confirms that the flagged points correspond to real anomalies in the usage data.

V. CONCLUSION
In this paper, we propose a model that combines Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) approaches to predict electric energy consumption and identify anomalies in electric networks. The model takes segmented training data with hourly shifts and uses CNN as an autoencoder to handle inconsistent input patterns, resulting in better performance than LSTM alone. Mean absolute error (MAE) is used to score the error between predicted and real data, and the proposed model achieves high accuracy in both prediction and anomaly detection.
Further research could explore ways to improve prediction accuracy, group usage data by time, assess anomaly detection methods, study seasonal differences in power consumption prediction and anomaly detection, and explore other methods such as Autoregressive Integrated Moving Average (ARIMA) and Support Vector Machines (SVM).

Figure 1. Interior of the data acquisition module

Figure 6. 24-hour electricity usage on ten Sundays

Figure 10. Training and test loss over the 50 epochs of our model

Table 2. Sample of the dataset