Data Exploration

Electric Consumption Data Visualisation

The UC Irvine Machine Learning Repository is a popular repository for machine learning datasets. We look at the Individual household electric power consumption data set from this repository.

The data consists of measurements of electric power consumption in one household, sampled once per minute over a period of almost 4 years. There are 2,075,259 measurements gathered between December 2006 and November 2010 (47 months). They cover several electrical quantities plus some sub-metering values, which are measurements on specific appliances such as the dishwasher, washing machine and AC.

We want to explore working with R's date and time classes using strptime, and manipulating the data with the dplyr package, to create some visualizations in R. You can find the entire code on GitHub here.

Load packages and read the file.

Let’s take a quick look at the data.
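
The original code for this step is not reproduced here, so the following is only a minimal sketch of loading and inspecting the data. It assumes the file is semicolon-separated with "?" marking missing values, and it reads a few illustrative rows from an inline sample so that it runs without the download. It uses base R only; the file name in the comment is an assumption.

```r
# Illustrative rows in the assumed layout of the data file
# (semicolon-separated, "?" for missing values).
sample_lines <- c(
  "Date;Time;Global_active_power;Global_reactive_power;Voltage;Global_intensity;Sub_metering_1;Sub_metering_2;Sub_metering_3",
  "16/12/2006;17:24:00;4.216;0.418;234.840;18.400;0.000;1.000;17.000",
  "16/12/2006;17:25:00;5.360;0.436;233.630;23.000;0.000;1.000;16.000",
  "17/12/2006;00:00:00;?;?;?;?;?;?;?"
)

# For the real data, replace `text = sample_lines` with something like
# file = "household_power_consumption.txt" (assumed file name).
data <- read.table(text = sample_lines, sep = ";", header = TRUE,
                   na.strings = "?", stringsAsFactors = FALSE)

# Compact overview: column names, types and the first few values.
str(data)
```

Setting `na.strings = "?"` matters here: without it, a single "?" would cause the numeric columns to be read as character.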

Handle Date and Time

We first want to visualize the overall trend of measurements such as Voltage or Sub_metering over the years. To do that, we convert the Date and Time variables to the POSIXct class, which is how R represents date-times and which makes the data easy to manipulate.
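
As a hedged sketch of this conversion, assuming dates in day/month/year form and times as hours:minutes:seconds, the two text columns can be pasted together and parsed with strptime. The small data frame and the choice of `tz = "UTC"` are assumptions made for illustration.

```r
# Small stand-in for the Date and Time columns (values are made up).
data <- data.frame(
  Date = c("16/12/2006", "1/5/2008", "1/5/2009"),
  Time = c("17:24:00", "08:00:00", "23:59:00"),
  Voltage = c(234.84, 240.10, 238.50),
  stringsAsFactors = FALSE
)

# strptime() parses text into POSIXlt; as.POSIXct() converts the result to
# POSIXct, which stores each timestamp as one number and suits a data frame
# column. tz = "UTC" is an assumption that avoids daylight-saving gaps.
parsed <- strptime(paste(data$Date, data$Time),
                   format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
data$DateTime <- as.POSIXct(parsed)

# Year and month are then easy to derive for later grouping.
data$Year <- as.integer(format(data$DateTime, "%Y"))
data$Month <- as.integer(format(data$DateTime, "%m"))
```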

Next, we compare the readings from the sub-meters over the years.

Figure: sub-meter readings over the years.

We see that the AC and water heater account for the most electricity usage, followed by the washing machine and refrigerator, and then the kitchen appliances.
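
One way to check such a ranking numerically is to total each sub-meter column and express it as a share of the sum. The sketch below uses made-up readings, and the mapping of columns to appliances in the comments is an assumption based on the dataset description, not something computed here.

```r
# Made-up readings for illustration; not values from the real data.
data <- data.frame(
  Sub_metering_1 = c(0, 1, 10, NA),  # assumed: kitchen appliances
  Sub_metering_2 = c(1, 2, 1, 20),   # assumed: washing machine, refrigerator
  Sub_metering_3 = c(17, 18, 17, 0)  # assumed: water heater and AC
)

meters <- c("Sub_metering_1", "Sub_metering_2", "Sub_metering_3")

# Total per sub-meter, ignoring missing readings.
totals <- colSums(data[, meters], na.rm = TRUE)

# Percentage share of each sub-meter, largest first.
share <- sort(round(100 * totals / sum(totals), 1), decreasing = TRUE)
print(share)
```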

Now we look at data by year and compare the power consumption by month.
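
A minimal sketch of this step, under the assumption that a POSIXct DateTime column already exists: filter one year, derive the month, and average the power per month. It uses base R's aggregate rather than dplyr so that it runs without extra packages; with dplyr the same idea would be a group_by followed by summarise. The sample values are made up.

```r
# Made-up sample with a POSIXct timestamp column.
data <- data.frame(
  DateTime = as.POSIXct(c("2008-02-01 10:00:00", "2008-02-01 10:01:00",
                          "2008-05-01 10:00:00", "2008-05-01 10:01:00",
                          "2009-05-01 10:00:00"), tz = "UTC"),
  Global_active_power = c(0.4, 0.6, 2.0, 3.0, 1.5)
)

# Keep only the readings from 2008.
year2008 <- data[format(data$DateTime, "%Y") == "2008", ]

# Month as an ordered factor so results sort Jan..Dec, not alphabetically.
year2008$Month <- factor(
  month.abb[as.integer(format(year2008$DateTime, "%m"))],
  levels = month.abb
)

# Mean power per month.
monthly <- aggregate(Global_active_power ~ Month, data = year2008, FUN = mean)
print(monthly)
```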

Figure: power consumption by month in 2008.

We see that in 2008, power usage appears to be lowest in February and highest in May and October. If we further look at consumption by different appliances in the house using bar charts by month, below, we can confirm the same findings. It is not clear, though, what a possible explanation for this could be. Several latent factors can affect electric consumption, such as the geographic location of the household, or personal reasons such as the travel plans of its members.
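
The per-appliance monthly totals behind such bar charts could be computed along these lines. This is a sketch on made-up values, again assuming a POSIXct DateTime column and using base R's aggregate in place of dplyr.

```r
# Made-up sub-meter readings for two months.
data <- data.frame(
  DateTime = as.POSIXct(c("2008-02-10 08:00:00", "2008-02-10 08:01:00",
                          "2008-05-03 19:00:00", "2008-05-03 19:01:00"),
                        tz = "UTC"),
  Sub_metering_1 = c(0, 0, 30, 35),
  Sub_metering_2 = c(1, 1, 2, 40),
  Sub_metering_3 = c(5, 0, 18, 17)
)

data$Month <- factor(month.abb[as.integer(format(data$DateTime, "%m"))],
                     levels = month.abb)

# Sum each sub-meter within each month.
by_month <- aggregate(
  cbind(Sub_metering_1, Sub_metering_2, Sub_metering_3) ~ Month,
  data = data, FUN = sum
)

# Matrix with one row per sub-meter and one column per month,
# which is the shape barplot() expects for grouped bars.
usage <- t(as.matrix(by_month[, -1]))
colnames(usage) <- as.character(by_month$Month)
print(usage)
```

Passing the matrix to `barplot(usage, beside = TRUE, legend.text = TRUE)` would then draw one group of bars per month.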

Figure: bar charts of consumption by appliance and month.

Finally, let’s look at the sub-meter measurements for two different days and see how they compare. The first day of May in 2008 and in 2009 show quite different power usage per appliance. In 2008, the power usage is very low. In 2009, by contrast, the same day shows high activity in all areas, indicating use of the washing machine, kitchen appliances and AC.
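
A comparison like this comes down to selecting the two dates and summarising each. The sketch below does so on made-up readings, assuming a POSIXct DateTime column in UTC; the numbers are chosen only to illustrate the mechanics and do not come from the real data.

```r
# Made-up readings on three different days.
data <- data.frame(
  DateTime = as.POSIXct(c("2008-05-01 09:00:00", "2008-05-01 20:00:00",
                          "2008-05-02 09:00:00",
                          "2009-05-01 09:00:00", "2009-05-01 20:00:00"),
                        tz = "UTC"),
  Sub_metering_1 = c(0, 0, 5, 30, 20),
  Sub_metering_2 = c(1, 0, 5, 35, 2),
  Sub_metering_3 = c(0, 1, 5, 18, 17)
)

# Calendar day of each reading.
data$Day <- as.Date(data$DateTime, tz = "UTC")

# Keep only the two days being compared.
two_days <- data[data$Day %in% as.Date(c("2008-05-01", "2009-05-01")), ]

# Total per sub-meter on each of the two days.
per_day <- aggregate(
  cbind(Sub_metering_1, Sub_metering_2, Sub_metering_3) ~ Day,
  data = two_days, FUN = sum
)
print(per_day)
```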

Figure: sub-meter measurements on the first day of May in 2008 and in 2009.
Sonia Sharma

Sonia Sharma is a mathematician and expert in data extraction, data cleaning, visualisation, analytics and prediction models, and developing data products.
