This article describes the data quality analysis performed by Detect. It is divided into the following sections:
- What is Detect Data Quality?
- Discarded reasons
- Data quality warnings
- Data Quality in the UI
- Data Quality in dashboards
What is Detect Data Quality?
Detect analyses vast amounts of data, so poor data quality can seriously undermine the quality of its results. Consequently, Detect evaluates each building in detail to maintain a minimum standard of data quality.
If a misconfiguration causes issues such as missing relevant data or inaccurate consumption data, the building is discarded: it neither generates results in Detect nor is used for comparison with other buildings.
Otherwise, the building is accepted and generates results in Detect. However, acceptance does not guarantee perfect data quality, so a more detailed analysis is conducted to identify possible mild or severe warnings related to data quality.
The following sections will explain why a building might be discarded and outline the data quality categories analysed within the software.
Discarded reasons
When a building is discarded, the reason falls into one of two categories:
- Problems regarding the account configuration or location data
- Problems regarding the consumption data
Problems regarding the account configuration or location data
Data Quality group | Error | Description |
Reference meter | Without electrical reference device | The electrical reference device has not been configured for that location. |
Coordinates | Without coordinates | The location is discarded because it is not possible to retrieve the location's coordinates. |
Coordinates | Wrong coordinates | The location is discarded because the geolocation provider cannot get a valid address from the configured coordinates. |
Meteo data | No weather data | The location is discarded if it is not possible to retrieve degree days and/or temperature data. |
Surface | Wrong surface | The configured surface was < 10 m² and, with the consumption data available, Detect was unable to estimate a new surface that would yield reasonable specific consumption results. |
Surface | Without surface | The configured surface was < 10 m² and, with the consumption data available, Detect was unable to estimate a surface. |
Surface | High specific consumption with missing months | Specific consumption is too high and there is missing data. |
Estimated surface (only if location surface = 1 m²) | Low specific consumption without missing months | Even without missing data, the location is discarded because after imputing a surface the specific consumption is too low. |
Estimated surface (only if location surface = 1 m²) | High specific consumption without missing months | Even without missing data, the location is discarded because after imputing a surface the specific consumption is too high. |
Estimated surface (only if location surface = 1 m²) | Low projected specific consumption | After imputing the surface for this location and filling the data gaps, the location is discarded because the specific consumption is too low. |
Estimated surface (only if location surface = 1 m²) | High projected specific consumption | After imputing the surface for this location and filling the data gaps, the location is discarded because the specific consumption is too high. |
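The surface-related checks above depend on specific consumption. The sketch below assumes this means annual consumption divided by surface. The function name and the `low` and `high` bounds are hypothetical, since the article does not state the thresholds Detect applies.

```python
def specific_consumption_check(annual_kwh: float, surface_m2: float,
                               low: float, high: float) -> str:
    """Return "too low", "too high" or "ok" for a location's specific consumption.

    `low` and `high` are caller-supplied bounds in kWh/m2. They are
    placeholders for illustration, not Detect's actual thresholds.
    """
    if surface_m2 <= 0:
        raise ValueError("surface must be greater than zero")
    specific = annual_kwh / surface_m2  # kWh per m2 over the 12-month period
    if specific < low:
        return "too low"
    if specific > high:
        return "too high"
    return "ok"


# Illustrative values only: 120,000 kWh over 800 m2 is 150 kWh/m2.
print(specific_consumption_check(120_000, 800, low=20, high=1_000))  # ok
```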
Problems regarding the consumption data
Category | Error | Description |
Data gaps | Without readings | The location is discarded if it doesn’t have any Active Energy readings. |
Data gaps | Relevant consumption data gaps, starting months' data missing | This error is raised if there is missing data in the starting months of the 12-month period. |
Data gaps | Relevant consumption data gaps, middle months' data missing | This error is raised if there is missing data in the middle months of the 12-month period. |
Data gaps | Relevant consumption data gaps, last months' data missing | This error is raised if there is missing data in the last months of the 12-month period. |
Data gaps | Relevant consumption data gaps, some months' data missing | This error is raised if there is missing data distributed over the 12-month period. |
Data gaps | Relevant consumption data gaps, no missing months' data | This error is raised if there is missing data but it does not add up to any full month. |
Consumption value | Consumption lower than a defined threshold | This error is raised if the consumption of 12 months (considering the estimated lost consumption in gaps) is lower than 1500 kWh. |
Consumption value | Consumption lower than a defined threshold, starting months' data missing | This error is raised if the consumption of 12 months is lower than 1500 kWh because there is data missing in the starting months of the 12-month period. |
Consumption value | Consumption lower than a defined threshold, middle months' data missing | This error is raised if the consumption of 12 months is lower than 1500 kWh because there is data missing in the middle months of the 12-month period. |
Consumption value | Consumption lower than a defined threshold, last months' data missing | This error is raised if the consumption of 12 months is lower than 1500 kWh because there is data missing in the last months of the 12-month period. |
Consumption value | Consumption lower than a defined threshold, some months' data missing | This error is raised if the consumption of 12 months is lower than 1500 kWh because there is data missing in months distributed over the 12-month period. |
Consumption value | Consumption lower than a defined threshold, no missing months' data | This error is raised if the consumption of 12 months is lower than 1500 kWh because there is data missing, although it does not add up to any full month. |
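The low-consumption errors above combine the 1500 kWh threshold with the position of the missing data. The sketch below is one possible way to reproduce that labelling. The per-month `coverage` input, the rule that only one contiguous block of missing months counts as starting, middle or last, and both function names are assumptions for illustration, not Detect's documented logic.

```python
from typing import Optional, Sequence

MIN_ANNUAL_KWH = 1500  # threshold quoted in the table above
BASE_ERROR = "Consumption lower than a defined threshold"


def gap_position(coverage: Sequence[float]) -> Optional[str]:
    """Describe where data is missing in a 12-month period.

    `coverage` holds, per month, the fraction of readings present
    (1.0 = complete, 0.0 = month fully missing). This representation
    and the contiguity rule below are assumptions.
    """
    if len(coverage) != 12:
        raise ValueError("expected 12 monthly coverage values")
    if all(c >= 1.0 for c in coverage):
        return None  # no gaps at all
    missing = [i for i, c in enumerate(coverage) if c <= 0.0]
    if not missing:
        return "no missing months' data"  # gaps never add up to a full month
    contiguous = missing[-1] - missing[0] + 1 == len(missing)
    if contiguous and missing[0] == 0:
        return "starting months' data missing"
    if contiguous and missing[-1] == 11:
        return "last months' data missing"
    if contiguous:
        return "middle months' data missing"
    return "some months' data missing"


def low_consumption_error(annual_kwh: float,
                          coverage: Sequence[float]) -> Optional[str]:
    """Return the discard label, or None if consumption reaches the threshold."""
    if annual_kwh >= MIN_ANNUAL_KWH:
        return None
    position = gap_position(coverage)
    return BASE_ERROR if position is None else f"{BASE_ERROR}, {position}"
```

For example, `low_consumption_error(1000, [0.0, 0.0] + [1.0] * 10)` yields the "starting months' data missing" variant, while a value of 1500 kWh or more returns `None` regardless of gaps.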
Data quality warnings
Among the Accepted locations, Detect looks for mild and severe data quality warnings. These are analysed in the following categories:
- External data: Related to weather data and holidays.
- Metadata: Related to the account configuration.
- Surface: Related to valid values for the configured surface.
- Geolocation: Related to coordinates.
- Monthly data: Related to data gaps or extreme consumption values.
- Hourly data: Related to data gaps or extreme consumption values.
For each category, Detect analyses several parameters and then classifies all Accepted buildings into "without warnings", "with mild warnings" or "with severe warnings".
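One way to picture this per-category classification is to give each analysed parameter a severity and keep the worst one. The sketch below assumes that "worst wins" rule, which the article does not confirm. `Severity` and `classify_category` are hypothetical names.

```python
from enum import IntEnum
from typing import Iterable


class Severity(IntEnum):
    NONE = 0
    MILD = 1
    SEVERE = 2


LABELS = {
    Severity.NONE: "without warnings",
    Severity.MILD: "with mild warnings",
    Severity.SEVERE: "with severe warnings",
}


def classify_category(parameter_results: Iterable[Severity]) -> str:
    """Label one category of an Accepted building by its worst parameter.

    Assumption: a single severe parameter makes the whole category severe.
    """
    worst = max(parameter_results, default=Severity.NONE)
    return LABELS[worst]


print(classify_category([Severity.NONE, Severity.MILD]))  # with mild warnings
```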
The following sections describe the available warnings in more detail:
- Warnings - External data
- Warnings - Metadata
- Warnings - Surface
- Warnings - Geolocation
- Warnings - Monthly data
- Warnings - Hourly data
Warnings - External data
Concept | Description | Warning type (mild/severe) |
Data from a single weather station | It becomes a warning if data comes from multiple weather stations | Mild |
Data from a nearby weather station | It becomes a warning if data comes from a weather station far from the building | |
Warnings - Metadata
Concept | Description | Warning type (mild/severe) |
Configured prices | It becomes a warning if prices are not configured | Mild |
Correct configured activity | It becomes a warning if the activity is configured as 'Other' | Mild |
Correct configured sector | It becomes a warning if the sector is configured as 'Other' | Mild |
Correct configured reference temperatures | It becomes a warning if the reference temperature for degree days calculation is not configured | Mild |
Prices with currency | It becomes a warning if the currency for electricity prices is not available | Mild |
Warnings - Surface
Concept | Description | Warning type (mild/severe) |
Configured surface | It becomes a warning if the surface has not been configured | Mild |
Configured surface within range (not too low) | It becomes a warning if the surface is lower than expected | Mild |
Configured surface within range (not too high) | It becomes a warning if the surface is higher than expected | Mild |
Surface configured with valid value (greater than zero) | It becomes a warning if the surface is zero or negative | Mild |
Warnings - Geolocation
Concept | Description | Warning type (mild/severe) |
With postal code | It becomes a warning if there is no postal code | Mild |
Consistent postal code and coordinates | It becomes a warning if the postal code and the coordinates do not match | Mild |
Correct postal code | It becomes a warning if the configured postal code doesn’t match the expected format for its country. | Mild |
Consistent country and coordinates | It becomes a warning if the country and the coordinates do not match | Mild |
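The "Correct postal code" check compares the configured postal code with the format expected for its country. A minimal sketch of such a check is shown below. The pattern table covers only a few well-known national formats and is an assumption, as is the choice to skip countries without a known pattern; neither reflects Detect's actual list.

```python
import re

# Hypothetical format table: a few well-known formats, not Detect's own list.
POSTAL_CODE_PATTERNS = {
    "ES": r"\d{5}",            # Spain: five digits
    "US": r"\d{5}(-\d{4})?",   # United States: ZIP or ZIP+4
    "PT": r"\d{4}-\d{3}",      # Portugal: NNNN-NNN
    "NL": r"\d{4} ?[A-Z]{2}",  # Netherlands: four digits and two letters
}


def postal_code_warning(country: str, postal_code: str) -> bool:
    """Return True if the postal code does not match its country's format."""
    pattern = POSTAL_CODE_PATTERNS.get(country.upper())
    if pattern is None:
        return False  # no known format, so nothing to check in this sketch
    return re.fullmatch(pattern, postal_code.strip()) is None


print(postal_code_warning("ES", "08001"))  # False
print(postal_code_warning("ES", "8001"))   # True
```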
Warnings - Monthly data
Concept | Description | Warning type (mild/severe) |
Complete monthly consumption | It becomes a warning if there are monthly consumption values missing | |
Monthly consumption without starting gaps | It becomes a warning if monthly consumption values are missing at the beginning of the period | |
Monthly consumption without middle gaps | It becomes a warning if monthly consumption values are missing in the middle of the period | |
Monthly consumption without ending gaps | It becomes a warning if monthly consumption values are missing at the end of the period | |
Monthly consumption with valid values (without negative values) | It becomes a warning if there are negative monthly consumption values | Severe |
Monthly consumption with valid values (without zeros) | It becomes a warning if monthly consumption values are 0 | |
Monthly consumption within range (without too low values) | It becomes a warning if monthly consumption values are considered too low | |
Monthly consumption within range (without too high values) | It becomes a warning if monthly consumption values are considered too high | |
Warnings - Hourly data
Concept | Description | Warning type (mild/severe) |
Hourly consumption with valid values (without zeros) | It becomes a warning if hourly consumption values are 0 | |
Hourly consumption without starting gaps | It becomes a warning if hourly consumption values are missing at the beginning of the period | |
Hourly consumption without middle gaps | It becomes a warning if hourly consumption values are missing in the middle of the period | |
Hourly consumption without ending gaps | It becomes a warning if hourly consumption values are missing at the end of the period | |
Hourly consumption with valid values (no negative values) | It becomes a warning if there are negative hourly consumption values | |
Hourly consumption within range (no values too high) | It becomes a warning if hourly consumption values are considered too high | |
Hourly consumption within range (no values too low) | It becomes a warning if hourly consumption values are considered too low | |
Hourly and monthly consumptions match | It becomes a warning if there is a mismatch between missing hourly and monthly consumption values | |
Hourly consumption without repeated values | It becomes a warning if there are too many repeated hourly consumption values | Severe |
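Two of the hourly checks, zeros and repeated values, can be illustrated with simple measurements over a series of readings. The helpers below are a sketch only. Their names are hypothetical, and the article does not say how many zeros or repeats Detect treats as mild or severe, so no thresholds are applied here.

```python
from itertools import groupby
from typing import Sequence


def zero_share(values: Sequence[float]) -> float:
    """Fraction of hourly readings that are exactly zero."""
    if not values:
        return 0.0
    return sum(1 for v in values if v == 0) / len(values)


def longest_repeated_run(values: Sequence[float]) -> int:
    """Length of the longest run of identical consecutive readings."""
    return max((sum(1 for _ in group) for _, group in groupby(values)), default=0)


readings = [0.0, 1.2, 1.2, 1.2, 0.0, 3.4, 3.4, 0.9]
print(zero_share(readings))            # 0.25
print(longest_repeated_run(readings))  # 3
```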
Data Quality in the UI
In the portfolio view within Detect, there are three different sections related to Data Quality:
Portfolio status summary
The portfolio status summary view shows a summary of the data quality status for the account, through a pie chart and stacked bars.
The pie chart shows the number of discarded locations in red. The accepted locations are split into those without any warnings (in green, 0 in the screenshot below), those with mild warnings (in yellow) and those with severe warnings (in orange).
The stacked bars show the distribution of warnings in each analysed category:
The number of discarded locations is the same in every category since they were discarded before the warnings analysis and therefore do not belong to any category. Every location appears once in every category.
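The counts behind such stacked bars can be derived from per-location statuses roughly as follows. The input structure (a `discarded` flag plus a `status` mapping per category) and the function name are hypothetical; the sketch only illustrates why each bar has the same total and the same number of discarded locations.

```python
from collections import Counter
from typing import Dict, List

CATEGORIES = ["External data", "Metadata", "Surface",
              "Geolocation", "Monthly data", "Hourly data"]


def stacked_bar_counts(locations: List[dict]) -> Dict[str, Counter]:
    """Count locations per status label for every category.

    Discarded locations have no per-category status, so they are
    counted as "discarded" in every bar.
    """
    bars = {category: Counter() for category in CATEGORIES}
    for location in locations:
        for category in CATEGORIES:
            if location["discarded"]:
                label = "discarded"
            else:
                label = location["status"][category]
            bars[category][label] += 1
    return bars
```

Because every location contributes exactly one count to every category, each bar sums to the total number of locations in the account.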
Detailed portfolio status
The detailed portfolio status section links to a table that lists every location in the account, together with its discard reasons or warnings analysis:
Errors and warnings description
This section shows the analysed concepts inside each category for accepted locations. As an example, the hourly consumption category:
Taking the first row, "Hourly consumption with valid values (without zeros)", as an example:
- 57 locations in this account have valid hourly consumption without zeros
- 4 have mild warnings (some zeros in their consumption)
- 2 have severe warnings (many zeros in their consumption)
Data Quality in dashboards
To learn about the widgets prepared for Detect data quality, please check this article.