The Seven Things to Look for When Sourcing Quality Data


Onboard Informatics has been aggregating data for 15 years. We strive to provide our clients with quality data that is easy to use and effective, but what does that really mean? If you want to make sure the data you use is quality data, there are seven key things to look for. In this post, I’ll explore each element, why it’s important, and the potential ramifications if it’s off. There are many ways to examine the quality of data, but these are the main components:

Is the Data Complete? When we talk about completeness in the context of data, we are really looking at the final user experience. In other words, will the data meet or exceed user expectations?

If you are building a real estate tool that provides home sellers a recap of the last sales price of all the properties within a specific area, you want to make sure the dataset will account for all the home sales. If there are significant gaps, whether in time or area, the end user will notice and you will instantly lose credibility.

When you are looking for a data source, find out how often the data is updated, the source of the data and the overall coverage to make sure it is complete.
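One way to spot-check completeness before you commit to a source is to count what is missing. The sketch below is a minimal Python illustration, not a description of how Onboard measures coverage. It assumes hypothetical sales records with `zip` and `sale_date` fields and reports every month in a date range that has no recorded sale for a given ZIP code.

```python
from datetime import date


def month_range(start, end):
    """Yield (year, month) pairs from start to end, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield (year, month)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def find_coverage_gaps(sales, zip_codes, start, end):
    """Return {zip_code: [(year, month), ...]} for months with no recorded sales."""
    seen = {(s["zip"], s["sale_date"].year, s["sale_date"].month) for s in sales}
    gaps = {}
    for zip_code in zip_codes:
        missing = [ym for ym in month_range(start, end) if (zip_code, *ym) not in seen]
        if missing:
            gaps[zip_code] = missing
    return gaps


# Hypothetical sample records; field names are assumptions for illustration.
sales = [
    {"zip": "10001", "sale_date": date(2023, 1, 15), "price": 500000},
    {"zip": "10001", "sale_date": date(2023, 3, 2), "price": 525000},
    {"zip": "10002", "sale_date": date(2023, 1, 20), "price": 410000},
    {"zip": "10002", "sale_date": date(2023, 2, 11), "price": 415000},
    {"zip": "10002", "sale_date": date(2023, 3, 30), "price": 420000},
]

gaps = find_coverage_gaps(sales, ["10001", "10002"], date(2023, 1, 1), date(2023, 3, 31))
print(gaps)  # {'10001': [(2023, 2)]}
```

An empty month can be legitimate in a quiet market, so treat the result as a list of questions for the vendor rather than proof of a gap.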

Is the Data Valid? Validity is about whether the data's references to other data are correct, which matters most when multiple datasets are used together. Onboard aggregates a variety of datasets and has an entire data team that works on data validity.

If you are building a property page, for example, and want to include local school information, you need to make sure the school information is valid for that property. Validity goes beyond geolocation: the schools you show should belong to the school district assigned to that property. That way, the end user will see the data as valid and not just accurate (covered next).
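The district rule above can be expressed as a simple cross-reference check. The following Python sketch is illustrative only. It assumes hypothetical `properties` and `schools` lookups that each carry a `district_id`, and it flags any property-to-school link that crosses district lines or points at a record that does not exist.

```python
def find_invalid_links(links, properties, schools):
    """Return (property_id, school_id) pairs that fail the district check."""
    invalid = []
    for prop_id, school_id in links:
        prop = properties.get(prop_id)
        school = schools.get(school_id)
        # A link is invalid if either side is missing or the districts differ.
        if prop is None or school is None or prop["district_id"] != school["district_id"]:
            invalid.append((prop_id, school_id))
    return invalid


# Hypothetical sample data; identifiers and field names are assumptions.
properties = {
    "P1": {"district_id": "D-10"},
    "P2": {"district_id": "D-20"},
}
schools = {
    "S1": {"name": "Lincoln Elementary", "district_id": "D-10"},
    "S2": {"name": "Roosevelt Middle", "district_id": "D-20"},
}
links = [("P1", "S1"), ("P1", "S2"), ("P2", "S2"), ("P3", "S1")]

invalid_links = find_invalid_links(links, properties, schools)
print(invalid_links)  # [('P1', 'S2'), ('P3', 'S1')]
```

Here the school S2 may well be close to property P1, but it sits in a different district, so the link is flagged even though the school record itself is accurate.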

Is the Data Accurate? This one is fairly straightforward, but incredibly important. You want the data you present to reflect reality. Accurate data is different from complete data. You can have complete data on every retail establishment in a specific ZIP code, but the names of the establishments might be misspelled. The data would be complete, but not accurate.

If you are building a neighborhood landing page, for example, and want to include the top restaurants, grocery stores and retail shops, you want to make sure that data is accurate. The establishments should exist in the way that’s represented in your tool or platform.
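Misspelled establishment names are one accuracy problem you can screen for automatically. The Python sketch below is a rough illustration under stated assumptions: it supposes you have a trusted reference list of names (hypothetical here) and uses the standard library's `difflib.get_close_matches` to separate exact matches, likely misspellings, and names with no close counterpart. The 0.8 similarity cutoff is an arbitrary starting point, not a recommended value.

```python
import difflib


def check_names(names, reference, cutoff=0.8):
    """Classify each name as ok, a possible misspelling, or unknown."""
    report = {}
    for name in names:
        if name in reference:
            report[name] = ("ok", name)
            continue
        # Look for the single closest reference name above the cutoff.
        close = difflib.get_close_matches(name, reference, n=1, cutoff=cutoff)
        if close:
            report[name] = ("possible_misspelling", close[0])
        else:
            report[name] = ("unknown", None)
    return report


# Hypothetical reference list and vendor records, for illustration only.
reference = ["Starbucks Coffee", "Whole Foods Market"]
vendor_names = ["Starbuks Coffee", "Whole Foods Market", "Zed's Diner"]

report = check_names(vendor_names, reference)
for name, result in report.items():
    print(name, "->", result)
```

A name flagged as unknown is not necessarily wrong; it may simply be missing from the reference list, which is worth a manual look either way.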

Is the Data Consistent? If you are pulling in multiple datasets, you will need to measure the consistency of the data. If the datasets are not consistent, they won’t create a seamless user experience.

For example, if you are building an email property alert and are including information on homes that have sold in a specific area in the last month, you will need to make sure several datasets work together.

The boundary data needs to line up with the property data so that, when you set the parameters for the email, the results match the area you intended. In other words, you’re relaying the properties that sold in the recipient’s neighborhood, and not all the properties that sold in their state.
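To make the email example concrete, here is a minimal Python sketch of filtering recent sales against a neighborhood boundary. It is an illustration under assumptions, not a production geospatial routine: the boundary is a hypothetical polygon of (lon, lat) points, the sales records use made-up `lon`, `lat` and `sale_date` fields, and the containment test is a standard ray-casting check that treats coordinates as a flat plane. It only works if both datasets use the same coordinate system, which is exactly the consistency question.

```python
from datetime import date, timedelta


def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if the point lies inside the polygon."""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Does the edge (j -> i) straddle the horizontal line through the point?
        if (yi > lat) != (yj > lat):
            x_cross = (xj - xi) * (lat - yi) / (yj - yi) + xi
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def recent_sales_in_boundary(sales, boundary, as_of, days=30):
    """Keep sales inside the boundary that closed within the last `days` days."""
    cutoff = as_of - timedelta(days=days)
    return [
        s for s in sales
        if cutoff <= s["sale_date"] <= as_of
        and point_in_polygon(s["lon"], s["lat"], boundary)
    ]


# Hypothetical boundary and sales, using simple planar coordinates.
boundary = [(0, 0), (4, 0), (4, 4), (0, 4)]
sales = [
    {"id": "A", "lon": 1, "lat": 1, "sale_date": date(2023, 6, 20)},
    {"id": "B", "lon": 10, "lat": 10, "sale_date": date(2023, 6, 25)},  # outside boundary
    {"id": "C", "lon": 2, "lat": 3, "sale_date": date(2023, 3, 1)},     # too old
]

alert = recent_sales_in_boundary(sales, boundary, as_of=date(2023, 6, 30))
print([s["id"] for s in alert])  # ['A']
```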

Is the Data Available? When you are building an application or solution, you want to make sure the data you are using is readily available and reliable. You don’t want to start on a project with a certain dataset and find that well dry up while you are still in development.

When shopping for data, look to the source of the data. Research their clientele. Make sure they are stable and your data will be too.

Is the Data Timely? All data should be timestamped, but beyond that, you want to understand how often each dataset is updated. Some data needs to be updated daily and some can be updated annually. Again, it’s about consumer perception. If a consumer interacts with your product and finds the data out of date, you’re going to have an issue with credibility.

If you are developing a listing page, the property data should be updated daily (if not several times a day). The timeliness of that dataset is paramount. Other datasets, like demographics, don’t change as often and can probably be updated less frequently.
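If your datasets carry a last-updated timestamp, a freshness check takes only a few lines. The Python sketch below is illustrative: the dataset names and the maximum ages (one day for listings, one year for demographics) are assumptions drawn loosely from the examples above, and you would set your own thresholds per dataset.

```python
from datetime import datetime, timedelta

# Hypothetical freshness thresholds per dataset; tune these to your product.
MAX_AGE = {
    "listings": timedelta(days=1),
    "demographics": timedelta(days=365),
}


def stale_datasets(last_updated, now, max_age=MAX_AGE):
    """Return the names of datasets whose last update is older than allowed."""
    return sorted(
        name for name, updated_at in last_updated.items()
        if now - updated_at > max_age[name]
    )


# Hypothetical timestamps, for illustration only.
last_updated = {
    "listings": datetime(2023, 6, 28, 9, 0),
    "demographics": datetime(2023, 1, 15, 0, 0),
}

stale = stale_datasets(last_updated, now=datetime(2023, 6, 30, 12, 0))
print(stale)  # ['listings']
```

The demographics data is months older than the listings data here, yet only the listings are flagged, because each dataset is judged against its own update expectations.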

Onboard timestamps all data and posts our update frequency in all documentation. Any data vendor you source from should do the same.
