“Data aggregation is any process in which information is gathered and expressed in a summary form, for purposes such as statistical analysis. A common aggregation purpose is to get more information about particular groups based on specific variables such as age, profession, or income.”
Data aggregation is one of Onboard's key competencies. We aggregate data from many different sources, including the US Census, county tax assessors' offices, NCES, the FBI, and NOAA, to name just a few. Most often, we aggregate data from one or more time periods (months, years) to a set of geographic areas (ZIP codes, neighborhoods, or counties). We also perform data aggregation processes within and between each of these datasets.
Levels of Data Aggregation
When data is combined into information, the level of aggregation determines the message it conveys and how far it can be extrapolated. The levels run along a spectrum from a single fact to individual transactions.
A fact is the combination of data points to convey a single, specific message. While it provides the most context for that subject, its ‘ingredients’ cannot be extrapolated.
An example of a fact Onboard produces is the cost of living difference between New York City and San Francisco:
In this example, we aggregate consumer expenditure across multiple categories of goods and services and then compare two different geographies. One geography represents a neighborhood boundary and the second represents a combination of ZIP code boundaries.
Benefit: This provides a direct answer to a specific question such as “What change in monthly expenses might I expect if I move from NYC to San Francisco?”
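As a rough illustration of how category-level spending collapses into a single fact, here is a minimal Python sketch. The category names, dollar amounts, and the function `cost_of_living_difference` are all hypothetical; they are not Onboard data or an Onboard API, only an assumed shape for the calculation described above.

```python
# Hypothetical monthly spending per category, in dollars.
# These values are illustrative only and are not real figures for any city.
area_a = {"housing": 2000.0, "food": 600.0, "transport": 400.0}
area_b = {"housing": 2400.0, "food": 660.0, "transport": 240.0}


def cost_of_living_difference(origin, destination):
    """Return the percent change in total spending when moving origin -> destination."""
    origin_total = sum(origin.values())
    destination_total = sum(destination.values())
    return (destination_total - origin_total) / origin_total * 100.0


change = cost_of_living_difference(area_a, area_b)
print(f"Expected change in monthly expenses: {change:+.1f}%")
```

The result is one number. The category amounts that produced it are no longer visible to the user, which is why a fact cannot be extrapolated further.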
A series is a single type of information that is represented along a sequence. The sequence is often time, in which case the series is called a time series. Size or volume is another common sequence.
An example of a series Onboard aggregates is Fair Market Rent:
In this example, we return the monthly Fair Market Rent, as defined by HUD, in Brooklyn, NY, showing differences along a size sequence during the same time period.
Benefit: This provides a comparison of several similar data points and highlights the relationship between size and cost of a rental in the same geography and time period. “How much or little does rent increase with more bedrooms in that neighborhood?”
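A series keyed by size can be sketched as a simple mapping from bedroom count to rent. The rents below are made-up placeholder values, not actual HUD Fair Market Rent figures, and `step_increases` is a hypothetical helper name used only for this illustration.

```python
# Hypothetical monthly rents keyed by bedroom count (NOT actual HUD figures).
rent_by_bedrooms = {0: 1500, 1: 1600, 2: 1800, 3: 2300, 4: 2500}


def step_increases(series):
    """Return {size: increase over the previous size} for a series keyed by size."""
    sizes = sorted(series)
    return {b: series[b] - series[a] for a, b in zip(sizes, sizes[1:])}


increases = step_increases(rent_by_bedrooms)
for bedrooms, delta in increases.items():
    print(f"{bedrooms} bedrooms: +${delta} over {bedrooms - 1} bedrooms")
```

Because every point shares the same geography and time period, the differences between neighboring points answer the "how much more per bedroom" question directly.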
A multiseries is similar to a series; however, it combines two or more series that are compared along an independent axis.
An example of a multiseries Onboard aggregates is School Test Scores:
Benefit: This provides a deeper level of context on individual school performance within its school district in specific fields of study. “How strong is that school’s math and science program compared to English?”
An example of a summable multiseries Onboard aggregates is Local Crime Data:
In this example, we display the risk of crime in Naperville, Illinois. The individual series are then summed into the parent groups Personal, Property, and Total, and each is compared to the National Average.
Benefit: This provides summations of individual types of crime. “Is personal or property crime more likely and how does that compare to the national average?”
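The roll-up of a summable multiseries into parent groups might look like the following sketch. The crime types, the grouping, and every rate are assumptions made for illustration; they are not Naperville or national statistics, and `roll_up` is a hypothetical helper rather than part of any Onboard product.

```python
# Assumed mapping of individual crime types to parent groups.
crime_groups = {
    "personal": ["assault", "robbery"],
    "property": ["burglary", "larceny", "motor_vehicle_theft"],
}

# Hypothetical rates per 100,000 residents (illustrative values only).
local = {"assault": 10, "robbery": 5, "burglary": 20,
         "larceny": 60, "motor_vehicle_theft": 5}
national = {"assault": 20, "robbery": 10, "burglary": 30,
            "larceny": 80, "motor_vehicle_theft": 10}


def roll_up(rates, groups):
    """Sum individual series into their parent groups, plus an overall total."""
    totals = {group: sum(rates[t] for t in types) for group, types in groups.items()}
    totals["total"] = sum(totals.values())
    return totals


local_totals = roll_up(local, crime_groups)
national_totals = roll_up(national, crime_groups)

# Ratio below 1.0 means lower than the national figure in this toy data.
for group in local_totals:
    ratio = local_totals[group] / national_totals[group]
    print(f"{group}: {ratio:.2f}x the national rate")
```

This only works because the underlying series share a unit, so adding them produces a meaningful parent value. Series such as test scores in different subjects would not be summable in the same way.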
Summary data represents the combination of different data across multiple axes. Users are most familiar with working with this data in a table or in Excel.
An example of summary records Onboard aggregates is Property Sales Trends Data:
In this example we display the Average, Median, and Number of Sales in Greenwich, Connecticut over time.
Benefit: This summarizes individual transaction records for the user over time periods. “What was the average home sales price each quarter last year?”
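To make the step from transactions to summary records concrete, here is a minimal sketch that groups sale records by quarter and computes the average, median, and number of sales. The records are invented for illustration and are not Greenwich sales data; `summarize_by_quarter` is a hypothetical name, and the code uses only the Python standard library.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical sale records: (quarter, sale price in dollars).
sales = [
    ("2023-Q1", 500_000),
    ("2023-Q1", 700_000),
    ("2023-Q1", 900_000),
    ("2023-Q2", 400_000),
    ("2023-Q2", 1_000_000),
]


def summarize_by_quarter(records):
    """Group sale prices by quarter and compute average, median, and count."""
    prices_by_quarter = defaultdict(list)
    for quarter, price in records:
        prices_by_quarter[quarter].append(price)
    return {
        quarter: {
            "average": mean(prices),
            "median": median(prices),
            "count": len(prices),
        }
        for quarter, prices in sorted(prices_by_quarter.items())
    }


summary = summarize_by_quarter(sales)
for quarter, stats in summary.items():
    print(quarter, stats)
```

Each output row spans two axes at once, time period and metric, which is what makes it natural to view as a table.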
Individual transactions display information about a specific event.
An example of individual transactions Onboard provides is Property Sales Transactions Data:
In this example, we display the recent home sales transactions near 2325 Lincoln Ave, Miami FL 33133.
Benefit: This enables the user to see each record in its complete granularity. They can draw their own conclusions on how to interpret it. “What types of homes were recently sold near this address?”
Understanding the Data Aggregation Spectrum
Understanding the benefits and outcomes of the different levels of aggregation helps developers create an appropriate user experience. The Onboard local data platform produces a variety of data points with varied capabilities and metrics. In the coming weeks, we’ll explore each level of aggregation in depth so you can better understand how and when each data type is appropriate for your project.