Detect Data Quality

This article explains the data quality analysis performed by Detect. It covers what Detect Data Quality is, the reasons a building can be discarded, the data quality warnings, and where data quality appears in the UI and in dashboards.

What is Detect Data Quality?

Detect analyses large volumes of data, so poor data quality can compromise the quality of its results. Detect therefore evaluates each building in detail to enforce a minimum standard of data quality.

If a misconfiguration causes issues such as missing relevant data or inaccurate consumption data, the building is discarded: it neither generates results in Detect nor is used for comparison with other buildings.

In other cases, the building will be accepted and generate results in Detect. However, acceptance does not guarantee perfect data quality. Therefore, a more detailed analysis is conducted to identify possible mild or severe warnings related to data quality.

The following sections will explain why a building might be discarded and outline the data quality categories analysed within the software.

 

Discarded reasons

A building can be discarded for reasons that fall into two categories:

Problems regarding the account configuration or location data

Data Quality group | Error | Description
Reference meter | Without electrical reference device | The electrical reference device has not been configured for that location.
Coordinates | Without coordinates | The location is discarded because it is not possible to retrieve the location's coordinates.
Coordinates | Wrong coordinates | The location is discarded because the geolocation provider cannot get a valid address from the configured coordinates.
Meteo data | No weather data | The location is discarded if it is not possible to retrieve degree days and/or temperature data.
Surface | Wrong surface | The configured surface was <10 m² and, with the consumption data available, Detect was unable to estimate a new surface that would yield reasonable specific consumption results.
Surface | Without surface | The configured surface was <10 m² and, with the consumption data available, Detect was unable to estimate a surface.
Surface | High specific consumption with missing months | Specific consumption is too high and there is missing data.
Estimated surface (only if location surface = 1 m²) | Low specific consumption without missing months | Even without missing data, the location is discarded because after imputing a surface the specific consumption is too low.
Estimated surface (only if location surface = 1 m²) | High specific consumption without missing months | Even without missing data, the location is discarded because after imputing a surface the specific consumption is too high.
Estimated surface (only if location surface = 1 m²) | Low projected specific consumption | After imputing the surface for this location and filling the data gaps, the location is discarded because the specific consumption is too low.
Estimated surface (only if location surface = 1 m²) | High projected specific consumption | After imputing the surface for this location and filling the data gaps, the location is discarded because the specific consumption is too high.

 

Problems regarding the consumption data

Category | Error | Description
Data gaps | Without readings | The location is discarded if it doesn’t have any Active Energy readings.
Data gaps | Relevant consumption data gaps, starting months' data missing | This error is raised if there is missing data in the starting months of the 12-month period.
Data gaps | Relevant consumption data gaps, middle months' data missing | This error is raised if there is missing data in the middle months of the 12-month period.
Data gaps | Relevant consumption data gaps, last months' data missing | This error is raised if there is missing data in the last months of the 12-month period.
Data gaps | Relevant consumption data gaps, some months' data missing | This error is raised if there is missing data distributed over the 12-month period.
Data gaps | Relevant consumption data gaps, no missing months' data | This error is raised if there is missing data but it does not add up to any full month.
Consumption value | Consumption lower than a defined threshold | This error is raised if the consumption of 12 months (considering the estimated lost consumption in gaps) is lower than 1500 kWh.
Consumption value | Consumption lower than a defined threshold, starting months' data missing | This error is raised if the consumption of 12 months is lower than 1500 kWh because there is data missing in the starting months of the 12-month period.
Consumption value | Consumption lower than a defined threshold, middle months' data missing | This error is raised if the consumption of 12 months is lower than 1500 kWh because there is data missing in the middle months of the 12-month period.
Consumption value | Consumption lower than a defined threshold, last months' data missing | This error is raised if the consumption of 12 months is lower than 1500 kWh because there is data missing in the last months of the 12-month period.
Consumption value | Consumption lower than a defined threshold, some months' data missing | This error is raised if the consumption of 12 months is lower than 1500 kWh because there is data missing in distributed months of the 12-month period.
Consumption value | Consumption lower than a defined threshold, no missing months' data | This error is raised if the consumption of 12 months is lower than 1500 kWh because there is data missing but there is not a specific month missing.
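
The gap-related errors above can be read as a classification of where fully missing months fall in the 12-month period, combined with the 1500 kWh threshold. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and the exact boundaries Detect uses between "starting", "middle" and "last" months are not documented here, so the rules below are illustrative only.

```python
from typing import Sequence

LOW_CONSUMPTION_KWH = 1500  # threshold quoted in the table above


def gap_position(missing: Sequence[bool]) -> str:
    """Classify where fully missing months fall in a 12-month period.

    missing[i] is True when month i has no data at all. The boundaries
    between "starting", "middle" and "last" are assumptions made for
    illustration, not Detect's documented rules.
    """
    idx = [i for i, is_missing in enumerate(missing) if is_missing]
    if not idx:
        return "no missing months"
    last_month = len(missing) - 1
    # True when the missing months form a single unbroken block
    contiguous = idx[-1] - idx[0] + 1 == len(idx)
    if contiguous and idx[0] == 0 and idx[-1] < last_month:
        return "starting months"
    if contiguous and idx[-1] == last_month and idx[0] > 0:
        return "last months"
    if contiguous and idx[0] > 0:
        return "middle months"
    return "some months"


def is_low_consumption(total_kwh_12_months: float) -> bool:
    """True when the 12-month consumption is below the 1500 kWh threshold."""
    return total_kwh_12_months < LOW_CONSUMPTION_KWH
```

For instance, a location missing only its first two months would be classified as "starting months", and a 12-month total of 1200 kWh would fall below the threshold.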

 

Data quality warnings

Among the Accepted locations, Detect looks for mild and severe data quality warnings. Those are analysed in the following categories:

  • External data: Related to weather data and holidays.
  • Metadata: Related to the account configuration.
  • Surface: Related to valid values for the configured surface.
  • Geolocation: Related to coordinates.
  • Monthly data: Related to data gaps or extreme consumption values.
  • Hourly data: Related to data gaps or extreme consumption values.

For each category, Detect analyses several parameters and then classifies all Accepted buildings into "without warnings", "with mild warnings" or "with severe warnings".
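
One plausible way to read this classification is that a category takes the worst severity found among its parameters. The article does not state the combination rule, so the sketch below is an assumption, and the names `Severity` and `category_status` are hypothetical.

```python
from enum import IntEnum
from typing import Mapping


class Severity(IntEnum):
    NONE = 0
    MILD = 1
    SEVERE = 2


LABELS = {
    Severity.NONE: "without warnings",
    Severity.MILD: "with mild warnings",
    Severity.SEVERE: "with severe warnings",
}


def category_status(checks: Mapping[str, Severity]) -> str:
    """Return the label for one category of one accepted building.

    Assumption: the worst severity among the analysed parameters
    decides the category status.
    """
    worst = max(checks.values(), default=Severity.NONE)
    return LABELS[worst]
```

Under this assumption, a building with one mild and one severe finding in the same category would be reported as "with severe warnings" for that category.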

The following sections give more detail on the available warnings:

Warnings - External data

Concept | Description | Warning type (mild/severe)
Data from a single weather station | It becomes a warning if data comes from multiple weather stations | Mild
Data from a nearby weather station | It becomes a warning if data comes from a weather station far from the building | Mild if it is 25 km to 50 km away; severe if it is further than 50 km
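
The distance rule for the weather station can be sketched as a small function. The function name is hypothetical, and treating exactly 25 km and exactly 50 km as mild is an assumption about the boundary cases.

```python
from typing import Optional


def station_distance_warning(distance_km: float) -> Optional[str]:
    """Map the distance to the weather station onto a warning level.

    Thresholds come from the table above; including the 25 km and
    50 km boundaries in the mild band is an assumption.
    """
    if distance_km > 50:
        return "severe"
    if distance_km >= 25:
        return "mild"
    return None  # close enough: no warning
```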

 

Warnings - Metadata

Concept | Description | Warning type (mild/severe)
Configured prices | It becomes a warning if prices are not configured | Mild
Correct configured activity | It becomes a warning if the activity is configured as 'Other' | Mild
Correct configured sector | It becomes a warning if the sector is configured as 'Other' | Mild
Correct configured reference temperatures | It becomes a warning if the reference temperature for degree days calculation is not configured | Mild
Prices with currency | It becomes a warning if the currency for electricity prices is not available | Mild

 

Warnings - Surface

Concept | Description | Warning type (mild/severe)
Configured surface | It becomes a warning if the surface has not been configured | Mild
Configured surface within range (not too low) | It becomes a warning if the surface is lower than expected | Mild
Configured surface within range (not too high) | It becomes a warning if the surface is higher than expected | Mild
Surface configured with valid value (greater than zero) | It becomes a warning if the surface is zero or negative | Mild

 

Warnings - Geolocation

Concept | Description | Warning type (mild/severe)
With postal code | It becomes a warning if there is no postal code | Mild
Consistent postal code and coordinates | It becomes a warning if the postal code and the coordinates do not match | Mild
Correct postal code | It becomes a warning if the configured postal code doesn’t match the expected format for its country | Mild
Consistent country and coordinates | It becomes a warning if the country and the coordinates do not match | Mild

 

Warnings - Monthly data

Concept | Description | Warning type (mild/severe)
Complete monthly consumption | It becomes a warning if there are monthly consumption values missing | Mild if there is any missing value; severe if more than 50% of values are missing
Monthly consumption without starting gaps | It becomes a warning if monthly consumption values are missing at the beginning of the period | Mild if there is any missing value; severe if more than 3 months of values are missing
Monthly consumption without middle gaps | It becomes a warning if monthly consumption values are missing in the middle of the period | Mild if there is any missing value; severe if more than 3 months of values are missing
Monthly consumption without ending gaps | It becomes a warning if monthly consumption values are missing at the end of the period | Mild if there is any missing value; severe if more than 3 months of values are missing
Monthly consumption with valid values (without negative values) | It becomes a warning if there are negative monthly consumption values | Severe
Monthly consumption with valid values (without zeros) | It becomes a warning if monthly consumption values are 0 | Mild if there is any 0 value; severe if more than 1 month of consumption is 0
Monthly consumption within range (without too low values) | It becomes a warning if monthly consumption values are considered too low | Mild if there are any unexpectedly low values; severe if more than 3 values are unexpectedly low
Monthly consumption within range (without too high values) | It becomes a warning if monthly consumption values are considered too high | Mild if there are any unexpectedly high values; severe if more than 3 values are unexpectedly high
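
Two of the monthly rules above (completeness and zeros) can be sketched as follows. This is an illustration only: the function names are hypothetical, and representing a missing month as `None` is an assumption about the input format.

```python
from typing import Optional, Sequence


def monthly_missing_warning(values: Sequence[Optional[float]]) -> Optional[str]:
    """'Complete monthly consumption': mild if any value is missing,
    severe if more than 50% of the values are missing.
    A missing month is represented as None (assumption)."""
    missing = sum(1 for v in values if v is None)
    if missing == 0:
        return None
    if missing * 2 > len(values):  # strictly more than half
        return "severe"
    return "mild"


def monthly_zero_warning(values: Sequence[Optional[float]]) -> Optional[str]:
    """'Without zeros': mild if any month is 0, severe if more than
    1 month of consumption is 0. Missing months are ignored here."""
    zeros = sum(1 for v in values if v is not None and v == 0)
    if zeros > 1:
        return "severe"
    if zeros == 1:
        return "mild"
    return None
```

With 12 monthly values, exactly 6 missing months would still be mild under this reading, because the severe rule requires more than 50%.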

 

Warnings - Hourly data

Concept | Description | Warning type (mild/severe)
Hourly consumption with valid values (without zeros) | It becomes a warning if hourly consumption values are 0 | Mild if there is any 0 value; severe if more than 60 days have hourly 0 values
Hourly consumption without starting gaps | It becomes a warning if hourly consumption values are missing at the beginning of the period | Mild if there is any value missing; severe if more than 60 days of hourly values are missing
Hourly consumption without middle gaps | It becomes a warning if hourly consumption values are missing in the middle of the period | Mild if there is any value missing; severe if more than 60 days of hourly values are missing
Hourly consumption without ending gaps | It becomes a warning if hourly consumption values are missing at the end of the period | Mild if there is any value missing; severe if more than 60 days of hourly values are missing
Hourly consumption with valid values (no negative values) | It becomes a warning if there are negative hourly consumption values | Mild if there is any negative value; severe if more than 5% of hourly values in the year are negative
Hourly consumption within range (no values too high) | It becomes a warning if hourly consumption values are considered too high | Mild if there are any unexpectedly high values; severe if more than 5% of hourly values in the year are unexpectedly high
Hourly consumption within range (no values too low) | It becomes a warning if hourly consumption values are considered too low | Mild if there are any unexpectedly low values; severe if more than 5% of hourly values in the year are unexpectedly low
Hourly and monthly consumptions match | It becomes a warning if there is a mismatch between missing hourly and monthly consumption values | Mild if the difference is more than 1%; severe if the difference is more than 50%
Hourly consumption without repeated values | It becomes a warning if there are too many repeated hourly consumption values | Severe
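
The hourly thresholds above come in two shapes: a share of the year's hourly values (5%) and a count of affected days (60). The sketch below illustrates both. The function names are hypothetical, and the default of 8760 hours (a non-leap year) and the day-by-day input layout are assumptions.

```python
from typing import Optional, Sequence


def hourly_share_warning(flagged_hours: int, total_hours: int = 8760) -> Optional[str]:
    """Share-based rule (negative, too high or too low values):
    mild if any hour is flagged, severe if more than 5% are flagged.
    Integer arithmetic avoids rounding issues at the 5% boundary."""
    if flagged_hours <= 0:
        return None
    if flagged_hours * 100 > 5 * total_hours:
        return "severe"
    return "mild"


def zero_days_warning(hourly_by_day: Sequence[Sequence[float]]) -> Optional[str]:
    """Day-based rule for zeros: mild if any hourly value is 0,
    severe if more than 60 days contain hourly 0 values.
    Each inner sequence holds the hourly values of one day (assumption)."""
    days_with_zero = sum(1 for day in hourly_by_day if any(v == 0 for v in day))
    if days_with_zero > 60:
        return "severe"
    if days_with_zero > 0:
        return "mild"
    return None
```

In a non-leap year, 5% of the hourly values corresponds to 438 hours, so 438 flagged hours would still be mild and 439 would be severe under this reading.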

 

Data Quality in the UI

In the portfolio view within Detect, there are three different sections related to Data Quality:

 

detect-data-quality-1.png

Portfolio status summary

The portfolio status summary view shows a summary of the data quality status for the account, through a pie chart and stacked bars.

The pie chart shows discarded locations in red. Accepted locations are split into those without warnings (in green; 0 in the screenshot below), those with mild warnings (in yellow) and those with severe warnings (in orange).

The stacked bars show the warnings distribution among every analysed category:

detect-data-quality-2.png

Every location is counted exactly once in each category's bar. The number of discarded locations is the same in every bar, because those locations were discarded before the warnings analysis and were therefore not evaluated in any category.
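
The counting rule behind the stacked bars can be sketched as below. This is not Detect's implementation: the function name, the input layout (one entry per location, `None` for a discarded location) and the status strings are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional

CATEGORIES = [
    "External data", "Metadata", "Surface",
    "Geolocation", "Monthly data", "Hourly data",
]


def stacked_bar_counts(
    locations: List[Optional[Dict[str, str]]],
) -> Dict[str, Counter]:
    """Count statuses per category.

    Each entry is None for a discarded location, or a mapping from
    category name to "mild" / "severe" (categories without an entry
    count as "without warnings"). Hypothetical input format.
    """
    bars: Dict[str, Counter] = {}
    for category in CATEGORIES:
        counts: Counter = Counter()
        for statuses in locations:
            if statuses is None:
                counts["discarded"] += 1  # same count in every bar
            else:
                counts[statuses.get(category, "without warnings")] += 1
        bars[category] = counts
    return bars
```

Because each location contributes exactly one count to each category, every bar sums to the total number of locations in the account.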

 

Detailed portfolio status

The detailed portfolio status section links to a table that lists every location in the account together with its discard reasons or the results of its warnings analysis:

detect-data-quality-3.png

 

Errors and warnings description

This section shows the analysed concepts inside each category for accepted locations. As an example, the hourly consumption category:

detect-data-quality-4.png

Taking the first row, "Hourly consumption with valid values (without zeros)", as an example:

  • 57 locations in this account have valid hourly consumption without zeros.
  • 4 have mild warnings (some zeros in their consumption).
  • 2 have severe warnings (many zeros in their consumption).

 

Data Quality in dashboards

To see the widgets prepared for Detect data quality, please check this article.
