You think that bus is crowded? (the Bias series)


City life is great.  No need for a car. Especially here in NYC where there's Citibike, buses, ferries, taxis, and the country's largest subway system. 4.3 million New Yorkers ride the subway every day, so another Monday morning squeeze onto the downtown 4 train to the office is just par for the course. But is it? Or does something seem different? Perhaps the subway's more crowded than just a few years ago? Well yes, it is.

According to the New York MTA, ridership is up again, with a year-over-year increase of more than 3%.  This is just one of the many statistics found in a recent New York Times article, "Use of Public Transit in U.S. Reaches Highest Level Since 1956, Advocates Report".  And who could argue with that?  The numbers don't lie.  Nationally, public transit ridership has increased at a faster rate than any other mode of transportation.  According to the American Public Transportation Association, national ridership has increased 37% since 1995, outpacing the growth in both population (20%) and miles driven (23%).

Dueling banjos 

So here's a startling stat: Since 1990 the percentage of people using public transportation to commute to work has decreased from 5.91% to 5.47%. That's according to the Census Bureau's American Community Survey and decennial data.

How is this possible? The Census Bureau and the APTA playing different tunes?  Ridership up but a smaller share of commuters riding? Well, data bias is the likely cause. From both sources.

But wait. Data is data. Stats. Facts. How can data be biased and what does that even mean? Well, here's a definition I like: "Data bias is the systematic variation of data from reality." Or, put another way, every statistic is influenced by perspective and you really need to think about the source, the message, the viewpoint, and the methods behind the collection, interpretation, and presentation.



Paint by Numbers?

How does a source's intent bias its data?

  • Perspective: The article's title is the first clue: "...Advocates Report". Going in, the reader's expectations should be set.
  • Definitions: The APTA is focused on showing the increase in ridership. Ridership simply = total number of trips. That's it.
  • Different Focus: The Census / ACS is focused on how people get to work. But not all trips are trips to work.

Also, it's interesting that the ACS data shows commuting by mass transit isn't growing any faster than other modes of transportation.
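A falling share and a rising trip count aren't necessarily in conflict, and a little arithmetic shows why. Here's a rough Python sketch using only the two Census percentages quoted above. The variable names are mine, and the "break-even" figure is just an implication of those two percentages, not something either source reports.

```python
# Census / ACS commute shares quoted in the article.
share_1990 = 0.0591    # share of commuters using public transit, 1990
share_latest = 0.0547  # share of commuters using public transit, latest

# The drop in percentage points versus the drop in relative terms.
point_change = (share_latest - share_1990) * 100
relative_change = share_latest / share_1990 - 1

# How much the total number of commuters would have to grow for the
# COUNT of transit commuters to stay flat despite the smaller share.
break_even_growth = share_1990 / share_latest - 1

print(f"Change in share: {point_change:+.2f} percentage points")
print(f"Relative change in share: {relative_change:+.1%}")
print(f"Break-even growth in total commuters: {break_even_growth:.1%}")
```

So the share slipped by about 0.44 points, or roughly 7% in relative terms. If the total number of commuters grew by more than about 8% over the same period, the number of people commuting by transit went up even as the share went down. Whether it did depends on workforce figures the article doesn't give.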

Question the Artist

Some considerations and questions to help find the tune.

  • The transit numbers are isolated to transit "trips". What types of trips account for the growth?
  • How has the volume of other types of trips changed (cars, bikes, walks)?
  • When compared to miles driven, are they considering miles / person? How is the increase in carpooling reflected?
  • Has some other basis in the data...perhaps the availability of transit options...been excluded from the presentation?

The comparison to miles driven is particularly misleading - a better comparison would be to driving trips. Or even better, compare the change in total transit miles to miles driven. Or compare transit miles / passenger to driving miles / passenger.
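As a minimal sketch of the per-person idea, here's the APTA comparison normalized by population growth, in Python. It uses only the three growth figures quoted earlier (37%, 20%, 23%); `per_capita_growth` is a hypothetical helper of my own, not anything from the APTA or the Census Bureau.

```python
def per_capita_growth(total_growth: float, population_growth: float) -> float:
    """Growth per person, given growth in a total and growth in population."""
    return (1 + total_growth) / (1 + population_growth) - 1

# Growth since 1995, as quoted from the APTA.
population = 0.20
transit_trips_per_person = per_capita_growth(0.37, population)
miles_driven_per_person = per_capita_growth(0.23, population)

print(f"Transit trips per person: {transit_trips_per_person:+.1%}")
print(f"Miles driven per person: {miles_driven_per_person:+.1%}")
```

That works out to roughly +14% for transit trips per person and +2.5% for miles driven per person. Even after normalizing, one number counts trips and the other counts miles, so it's still apples to oranges. A fair comparison needs transit miles or driving trips, and the article gives neither.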

Transit makes up only a tiny percent of commuting trips nationally. Source: Onboard Informatics

On the Census side...guess what?  More bias.

  • Collection: The data is collected by survey from a sample of people - what people say they do isn't always the same as what they actually do.
  • Narrow focus: Only commuting trips are included - perhaps people use mass transit for errands and leisure activities.

Clarity affords focus!

So what can you do? Don't take the "facts" at face value.  Wipe away the fog on the window and be sure you're clear about what is behind the data.  Try to decipher the bias in the data and the presentation. Someone is nearly always trying to make a point. And there's nothing wrong with that. Without the point and a voice there is no story.  And without the story...

...hey, it's just data.

Image Credit: Mahin Fayaz on