Detect Data Quality


This article explains the data quality analysis performed by Detect: what it is, why a building might be discarded, which warnings are raised, and where the results appear in the UI and in dashboards.

What is Detect Data Quality?

Detect analyses vast amounts of data, so poor data quality can seriously compromise its results. Detect therefore evaluates each building in detail to maintain a minimum standard of data quality.

If a misconfiguration causes issues such as missing relevant data or inaccurate consumption data, the building is discarded: it neither generates results in Detect nor is used for comparison with other buildings.

In other cases, the building will be accepted and generate results in Detect. However, acceptance does not guarantee perfect data quality. Therefore, a more detailed analysis is conducted to identify possible mild or severe warnings related to data quality.

The following sections will explain why a building might be discarded and outline the data quality categories analysed within the software.

 

Discarded reasons

A building can be discarded for reasons that fall into two categories:

Problems regarding the account configuration or location data

| Data Quality group | Error | Description |
|---|---|---|
| Reference meter | Without electrical reference device | The electrical reference device has not been configured for that location. |
| Coordinates | Without coordinates | The location is discarded because it is not possible to retrieve the location's coordinates. |
| Coordinates | Wrong coordinates | The location is discarded because the geolocation provider cannot get a valid address from the configured coordinates. |
| Meteo data | No weather data | The location is discarded if it is not possible to retrieve degree days and/or temperature data. |
| Surface | Wrong surface | The configured surface was < 10 m² and, with the consumption data available, Detect was unable to estimate a new surface that would yield reasonable specific consumption results. |
| Surface | Without surface | The configured surface was < 10 m² and, with the consumption data available, Detect was unable to estimate a surface. |
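The configuration checks in the table above can be read as a simple decision list. The following is a minimal sketch, not Detect's actual implementation: the `LocationConfig` class, its field names and the `configuration_errors` function are hypothetical, and the way the two surface errors are told apart (no estimate at all versus an estimate that is not reasonable) is an assumed reading of the descriptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MIN_SURFACE_M2 = 10.0  # surface limit quoted in the table


@dataclass
class LocationConfig:
    """Hypothetical summary of what is known about one location."""
    has_reference_device: bool
    coordinates: Optional[Tuple[float, float]]  # (lat, lon), or None if missing
    address_resolved: bool       # geolocation provider returned a valid address
    has_weather_data: bool       # degree days and/or temperature available
    surface_m2: float            # configured surface
    estimated_surface_m2: Optional[float] = None  # estimate from consumption, if any
    estimate_is_reasonable: bool = False          # assumed outcome of a plausibility check


def configuration_errors(loc: LocationConfig) -> List[str]:
    """Return the configuration-related discard errors that apply to a location."""
    errors = []
    if not loc.has_reference_device:
        errors.append("Without electrical reference device")
    if loc.coordinates is None:
        errors.append("Without coordinates")
    elif not loc.address_resolved:
        errors.append("Wrong coordinates")
    if not loc.has_weather_data:
        errors.append("No weather data")
    if loc.surface_m2 < MIN_SURFACE_M2:
        # Assumed reading: no estimate at all vs. an estimate that is not usable.
        if loc.estimated_surface_m2 is None:
            errors.append("Without surface")
        elif not loc.estimate_is_reasonable:
            errors.append("Wrong surface")
    return errors
```

In this sketch, a location is discarded whenever the returned list is not empty.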

 

Problems regarding the consumption data

| Category | Error | Description |
|---|---|---|
| Data gaps | Without readings | The location is discarded if it doesn't have any Active Energy readings. |
| Data gaps | Relevant consumption data gaps, starting months' data missing | This error is raised if there is missing data in the starting months of the 12-month period. |
| Data gaps | Relevant consumption data gaps, middle months' data missing | This error is raised if there is missing data in the middle months of the 12-month period. |
| Data gaps | Relevant consumption data gaps, last months' data missing | This error is raised if there is missing data in the last months of the 12-month period. |
| Data gaps | Relevant consumption data gaps, some months' data missing | This error is raised if there is missing data distributed over the 12-month period. |
| Data gaps | Relevant consumption data gaps, no missing months' data | This error is raised if there is missing data but it does not add up to any full month. |
| Consumption value | Consumption lower than a defined threshold | This error is raised if the consumption of 12 months (considering the estimated lost consumption in gaps) is lower than 1500 kWh. |
| Consumption value | Consumption lower than a defined threshold, starting months' data missing | This error is raised if the consumption of 12 months is lower than 1500 kWh because there is data missing in the starting months of the 12-month period. |
| Consumption value | Consumption lower than a defined threshold, middle months' data missing | This error is raised if the consumption of 12 months is lower than 1500 kWh because there is data missing in the middle months of the 12-month period. |
| Consumption value | Consumption lower than a defined threshold, last months' data missing | This error is raised if the consumption of 12 months is lower than 1500 kWh because there is data missing in the last months of the 12-month period. |
| Consumption value | Consumption lower than a defined threshold, some months' data missing | This error is raised if the consumption of 12 months is lower than 1500 kWh because there is data missing in distributed months of the 12-month period. |
| Consumption value | Consumption lower than a defined threshold, no missing months' data | This error is raised if the consumption of 12 months is lower than 1500 kWh because there is data missing but there is not a specific month missing. |
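To make the error names above more concrete, here is a minimal sketch of how the gap pattern and the 1500 kWh check could be combined. It is an illustration under stated assumptions, not Detect's algorithm: the function names are hypothetical, and the rule used to tell "starting", "middle", "last" and "some" months apart (one contiguous run of fully missing months versus scattered ones) is assumed, since the article does not define it.

```python
from typing import Optional, Sequence

MIN_ANNUAL_KWH = 1500.0  # threshold quoted in the table


def gap_pattern(month_missing: Sequence[bool]) -> str:
    """Label where fully missing months fall within the 12-month period.

    month_missing[i] is True when month i has no data at all.
    Assumption: a single contiguous run counts as starting/middle/last,
    anything scattered counts as "some months".
    """
    if len(month_missing) != 12:
        raise ValueError("expected 12 monthly flags")
    missing = [i for i, flag in enumerate(month_missing) if flag]
    if not missing:
        return "no missing months' data"
    contiguous = missing[-1] - missing[0] + 1 == len(missing)
    if not contiguous:
        return "some months' data missing"
    if missing[0] == 0:
        return "starting months' data missing"
    if missing[-1] == 11:
        return "last months' data missing"
    return "middle months' data missing"


def consumption_error(annual_kwh: float,
                      month_missing: Sequence[bool],
                      has_gaps: bool) -> Optional[str]:
    """Return the consumption-value error name, or None if consumption is high enough."""
    if annual_kwh >= MIN_ANNUAL_KWH:
        return None
    base = "Consumption lower than a defined threshold"
    if not has_gaps:
        return base
    return f"{base}, {gap_pattern(month_missing)}"
```

A location with no readings at all would be caught earlier by the "Without readings" error, so that case is not handled here.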

 

Errors and warnings

Among the Accepted locations, Detect looks for mild and severe data quality warnings. These are analysed in the following categories:

  • External data: Related to weather data and holidays.
  • Metadata: Related to the account configuration.
  • Surface: Related to valid values for the configured surface.
  • Monthly data: Related to data gaps or extreme consumption values.
  • Hourly data: Related to data gaps or extreme consumption values.
  • Geolocation: Related to coordinates.

For each category, Detect analyses several parameters and then classifies all Accepted buildings into "without warnings", "with mild warnings" or "with severe warnings".
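One plausible way to turn several parameter results into a single status per category is to let the worst result win. The sketch below assumes that rule; the article does not state how Detect combines parameters, and the `Severity` enum and `category_status` function are hypothetical names.

```python
from enum import IntEnum
from typing import Iterable


class Severity(IntEnum):
    """Ordered so that a higher value means a worse result."""
    NONE = 0
    MILD = 1
    SEVERE = 2


LABELS = {
    Severity.NONE: "without warnings",
    Severity.MILD: "with mild warnings",
    Severity.SEVERE: "with severe warnings",
}


def category_status(parameter_results: Iterable[Severity]) -> str:
    """Classify a building in one category by its worst parameter result (assumed rule)."""
    worst = max(parameter_results, default=Severity.NONE)
    return LABELS[worst]
```

For example, a building whose hourly data parameters come out as none, mild and none would be reported as "with mild warnings" for that category.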

 

Data Quality in the UI

In the portfolio view within Detect, there are three different sections related to Data Quality:

 

detect-data-quality-1.png

Portfolio status summary

The portfolio status summary view shows a summary of the data quality status for the account, through a pie chart and stacked bars.

The pie chart shows the number of discarded locations in red. For accepted locations, it shows the split between those without warnings (green; 0 in the screenshot below), those with mild warnings (yellow) and those with severe warnings (orange).

The stacked bars show the distribution of warnings within each analysed category:

detect-data-quality-2.png

The number of discarded locations is the same in every category since they were discarded before the warnings analysis and therefore do not belong to any category. Every location appears once in every category.
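The counting behind the stacked bars can be sketched as follows. This is an illustrative reconstruction rather than Detect's code: the `stacked_bar_counts` function and the shape of its input (one status string per category for each accepted location) are assumptions.

```python
from collections import Counter
from typing import Dict

CATEGORIES = ["External data", "Metadata", "Surface",
              "Monthly data", "Hourly data", "Geolocation"]
STATUSES = ["without warnings", "mild warnings", "severe warnings"]


def stacked_bar_counts(accepted: Dict[str, Dict[str, str]],
                       discarded_count: int) -> Dict[str, Dict[str, int]]:
    """Build one bar per category.

    accepted maps location name -> {category: status}; every accepted
    location must have a status in every category.
    """
    bars = {}
    for category in CATEGORIES:
        counts = Counter(statuses[category] for statuses in accepted.values())
        bar = {status: counts.get(status, 0) for status in STATUSES}
        # Discarded locations never reach the warnings analysis,
        # so the same count is added to every bar.
        bar["discarded"] = discarded_count
        bars[category] = bar
    return bars
```

Because each accepted location is counted exactly once per category and the discarded count is constant, every bar adds up to the total number of locations in the account.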

 

Detailed portfolio status

The detailed portfolio status section links to a table that lists every location in the account together with its discard reasons or warnings analysis:

detect-data-quality-3.png

 

Errors and warnings description

This section shows the parameters analysed within each category for accepted locations. For example, the hourly consumption category:

detect-data-quality-4.png

Taking the first row, "Hourly consumption with valid values (without zeros)", as an example:

  • 57 locations in this account have valid hourly consumption without zeros.
  • 4 have mild warnings (some zeros in their consumption).
  • 2 have severe warnings (many zeros in their consumption).

 

Data Quality in dashboards

To learn about the dashboard widgets prepared for Detect data quality, please check this article.
