What Are Big Data Analytics in Transportation?

Transportation data analytics increasingly power the mobility information and insights that are transforming transportation planning and operations, making it easier, faster, cheaper, and safer to collect and understand critical information.

In recent years, the transportation industry has been disrupted by multiple forces, including the COVID-19 pandemic, an ongoing road safety crisis, and a growing push for decarbonization. Meanwhile, many public agencies are facing tight budgets and a looming “fiscal cliff,” adding pressure to do more with less. As these and other changes unfold, transportation experts must: 

  • Prioritize projects accurately to guide effective resource investment and make the biggest impact.
  • Make informed decisions based on recent, accurate data, not on guesses or input from a few vocal stakeholders.
  • Advance social equity and environmental justice, providing access and support for outlying areas and underserved communities.
  • Manage ongoing roadway changes (e.g., construction) and special events to keep traffic flowing safely and efficiently.
  • Foster public engagement so that residents, constituents, and public officials can understand and support planned mobility efforts.
  • Accurately and quickly measure results of transportation initiatives, enabling timely adjustment and optimization.

Facing these demands, more and more cities, transit organizations, departments of transportation (DOTs), and other localities are using transportation data analytics to solve problems, prioritize investments, and win stakeholder support. But how do these analytics work, and how can you choose a reliable transportation data analytics source?

Transportation Data Analytics Capture the Speed of Change

Happily, we are no longer limited to sensors and surveys. Transportation data analytics can provide complete end-to-end trip information, including trip origins and destinations, routes, trip distances, travel time, and even real-time data on how vehicles are moving. When data is aggregated from multiple sources, transportation data analytics become even more valuable, providing transportation experts with details including home and work locations, trip purpose, aggregated traveler demographics, and more.
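To make trip metrics such as origin, destination, distance, travel time, and speed more concrete, here is a minimal sketch of how they could be derived from one device's time-ordered GPS pings. The `(timestamp, lat, lon)` ping format and the function names are hypothetical. Distance is computed with the haversine great-circle formula between consecutive pings, which undercounts distance along curved roads. Real processing would typically match pings to the road network first, so treat this as an illustration rather than any provider's method.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def trip_summary(pings):
    """Summarize one trip from time-ordered (timestamp, lat, lon) GPS pings.

    Distance is the sum of straight-line hops between consecutive pings,
    so it will undercount distance along curved roads.
    """
    if len(pings) < 2:
        raise ValueError("a trip needs at least two pings")
    distance_km = sum(
        haversine_km(a[1], a[2], b[1], b[2]) for a, b in zip(pings, pings[1:])
    )
    hours = (pings[-1][0] - pings[0][0]).total_seconds() / 3600
    return {
        "origin": (pings[0][1], pings[0][2]),
        "destination": (pings[-1][1], pings[-1][2]),
        "distance_km": distance_km,
        "travel_time_min": hours * 60,
        "avg_speed_kmh": distance_km / hours if hours > 0 else float("nan"),
    }

if __name__ == "__main__":
    # Hypothetical pings for a short trip: (timestamp, latitude, longitude).
    pings = [
        (datetime(2024, 5, 1, 8, 0, 0), 37.7749, -122.4194),
        (datetime(2024, 5, 1, 8, 6, 0), 37.7840, -122.4090),
        (datetime(2024, 5, 1, 8, 15, 0), 37.7955, -122.3937),
    ]
    summary = trip_summary(pings)
    print(f"{summary['distance_km']:.2f} km in {summary['travel_time_min']:.0f} min, "
          f"avg {summary['avg_speed_kmh']:.1f} km/h")
```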

Using transportation data analytics, transportation professionals can quickly access accurate data for every road in the country, every day of the year — even in real time. 

Not all transportation data is created equal. When collecting and using data, it is important to understand the factors at play. These factors drive transportation data coverage, depth, and accuracy: 

Data Sets and Sources

The landscape of mobility data sources is constantly changing. Taking advantage of emerging data sources, while following data privacy regulations and best practices, is a must for any transportation data provider.

Among these emerging data sources is Connected Vehicle data, which, along with GPS data, is paired with contextual data points from road network data, census data, and physical counters to offer a full picture of how people move.

Once the data is processed through a set of complex, machine-learning-based algorithms, transportation data analytics can be used to analyze trips from the moment journeys begin to the moment they end, via any mode, on all roads and paths. They can even be used to monitor roadway conditions in real time. But not all data leads to powerful insights.

StreetLight’s data sources have grown and adapted over time to take advantage of industry innovations while protecting privacy.

Key questions for evaluating data sets and providers include:

  1. How big is the sample size? The larger the sample size, the lower the margin of error in the data.
  2. How many data sources? The most accurate and unbiased data sets draw from multiple sources. 
  3. How frequently is the data updated? Regular updates allow for more granularity in studies.
  4. What does the data cover? Ideally it should be able to drill down to rural areas, small streets, and individual intersections. And it should capture historical travel data as well as recent data.
  5. What modes does the data include? Can it identify cyclists, pedestrians, transportation network company driving, transit, and more?
  6. Does the data include trip characteristics? Look for datasets that include speed information, trip purpose, trip length, and more.
  7. Does the data offer date-specific measurement? Specifying dates allows the data to measure movements during historical events and create before-and-after studies.
  8. Does the data offer real-time insights? When data is needed to monitor ongoing conditions and enable quick responses to emerging traffic issues (such as during construction or special events operations), look for a data platform that offers real-time or near real-time data.
  9. How is the data accessed? Look for an on-demand platform with access to run multiple studies, rather than a one-time download of a single analysis.
  10. Is the data specifically designed to support your use case? Look for a platform that aggregates and visualizes data outputs in a way that is purpose-built for your needs. For example, can it highlight dangerous vehicle speeds when analyzing safety, or visualize atypical delay when monitoring the impact of road construction?
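The link between sample size and margin of error in the first question can be illustrated with the textbook formula for the margin of error of an estimated proportion, z·sqrt(p(1 − p)/n). This sketch assumes a simple random sample and a 95% confidence level (z = 1.96). Real mobility samples are not simple random samples, so it shows the general trend only and is not a way to score any particular data provider.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Margin of error for a proportion p estimated from a simple random sample of size n.

    The default z = 1.96 corresponds to a 95% confidence level.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    return z * math.sqrt(p * (1.0 - p) / n)

if __name__ == "__main__":
    # Hypothetical example: estimating the share of corridor trips that are
    # commute trips, assuming the true share is around 30%.
    for n in (100, 1_000, 10_000, 100_000):
        moe = margin_of_error(0.30, n)
        print(f"n={n:>7,}: ±{moe * 100:.2f} percentage points")
```

Because n sits under a square root, each tenfold increase in sample size shrinks the margin of error by a factor of about 3.16. Gains therefore continue as samples grow, but they diminish.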

Algorithms and Machine Learning

Transportation data analytics rely on computer algorithms and, sometimes, on machine learning. Software engineering and data science expertise is increasingly important for understanding and evaluating transportation data sources. 

At a minimum, transportation data providers should be able to explain the modeling behind a transportation data algorithm, including the data sources, how the data is handled, and the algorithm’s capabilities. Transparency is critical for evaluating today’s complex data sets. 

Machine learning is an increasingly important element of transportation data analytics, and one that is less transparent than a conventional, rule-based algorithm. With machine learning, data scientists essentially “feed” a computer program examples of actual data, and the program “learns” to recognize that type of data and pick it out of a larger data set. As the program is trained, its accuracy grows, but it becomes harder to see which details it identifies and how it evaluates them.


StreetLight’s proprietary processing algorithm, RouteScience®, transforms data from multiple sources into the metrics transportation professionals use every day.

Data scientists may not explain in detail how this process works (in order to protect intellectual property), but they should be able to share how accurate its results are. Sometimes the data will be more directional than definitive, which can still be helpful as long as users understand where the gaps are.

Overall, a company with an effective data set and process should be able to point to multiple proven uses of its metrics by actual customers, not simply theoretical applications.

Privacy Protection

Stakeholders sometimes have very real concerns about how the data behind this level of transportation analytics is collected, protected, and shared. Fortunately, best practices are emerging to ensure privacy protection. At StreetLight, we operate at or above established guidelines to set the tone for the industry.

Data should never enable tracking individuals or sending marketing messages targeted at individual devices (such as cellphones). Instead, analytics should describe patterns in the movement of composite groups of people.

Transportation data analytics companies should not receive, process, or use personally identifiable information in the creation of customer products. Throughout the product-creation process, they should employ multi-step, multi-layered technical safeguards, including automated privacy and coverage checks that ensure sufficient aggregation based on dimensions such as time, space, and land use.
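One way to picture an automated check for "sufficient aggregation" is a minimum-count rule. Trip records are grouped into coarse cells (here, origin zone and hour of day), and any cell containing fewer than a minimum number of distinct devices is suppressed before results are released. The threshold of 5, the field names, and the zone/hour cell definition are all hypothetical choices made for this sketch. It is not StreetLight's method, and real systems layer many other safeguards on top of a rule like this.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical minimum number of distinct devices a cell must contain to be released.
MIN_DEVICES_PER_CELL = 5

@dataclass(frozen=True)
class TripRecord:
    device_id: str      # pseudonymous identifier; never included in outputs
    origin_zone: str    # coarse spatial unit, e.g. a traffic analysis zone
    start_hour: int     # hour of day, 0-23

def aggregate_with_suppression(trips, min_devices=MIN_DEVICES_PER_CELL):
    """Count trips per (zone, hour) cell, dropping cells with too few distinct devices."""
    trip_counts = defaultdict(int)
    devices = defaultdict(set)
    for t in trips:
        cell = (t.origin_zone, t.start_hour)
        trip_counts[cell] += 1
        devices[cell].add(t.device_id)
    # Only aggregate counts for cells meeting the threshold are returned;
    # device IDs never leave this function.
    return {
        cell: count
        for cell, count in trip_counts.items()
        if len(devices[cell]) >= min_devices
    }

if __name__ == "__main__":
    sample = [TripRecord(f"d{i}", "zone_A", 8) for i in range(6)]
    sample += [TripRecord("d99", "zone_B", 3), TripRecord("d98", "zone_B", 3)]
    print(aggregate_with_suppression(sample))  # the zone_B, 3 a.m. cell is suppressed
```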

Data storage and processing should take place in a secure data repository protected by a multi-layered network security architecture and supported by system audits and controls. An additional step is to build in administrative safeguards and employee training.

Currently, the General Data Protection Regulation (GDPR) created by the European Union is stricter than U.S. privacy law, so many data companies choose to follow the GDPR. Some also follow “Privacy by Design” practices, building privacy protections into their technical infrastructure and business operations.

Validating Data 

Validation is a critical step that confirms a transportation data set’s accuracy. Transportation data should be validated against an existing data set that has been confirmed for accuracy. In most cases, this is data from road sensors or counters.

StreetLight rigorously validates metrics against external sources, including physical counters, household surveys, and the Census.

Validated travel model results can also be used to confirm the accuracy of transportation data analytics, as can household travel surveys and U.S. Census data. Multiple validations can also be used to support the accuracy of a single analysis. 
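Validation against physical counters often comes down to comparing estimated volumes with observed counts at the same locations and summarizing how well they agree. The sketch below computes two commonly used summary statistics: the coefficient of determination (R²) and percent root-mean-square error (%RMSE). The choice of metrics and the sample numbers are illustrative assumptions. They are not StreetLight's validation procedure or results.

```python
import math

def _check_inputs(observed, estimated):
    if not observed or len(observed) != len(estimated):
        raise ValueError("observed and estimated must be non-empty and the same length")

def r_squared(observed, estimated):
    """Coefficient of determination (R²) of estimated volumes against observed counts."""
    _check_inputs(observed, estimated)
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed counts must vary to compute R²")
    return 1.0 - ss_res / ss_tot

def percent_rmse(observed, estimated):
    """Root-mean-square error as a percentage of the mean observed count."""
    _check_inputs(observed, estimated)
    n = len(observed)
    rmse = math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n)
    return 100.0 * rmse / (sum(observed) / n)

if __name__ == "__main__":
    # Made-up daily volumes at five counter locations, for illustration only.
    counts = [12_000, 8_500, 3_100, 950, 22_000]
    estimates = [11_400, 9_100, 2_800, 1_100, 23_500]
    print(f"R² = {r_squared(counts, estimates):.3f}")
    print(f"%RMSE = {percent_rmse(counts, estimates):.1f}%")
```

R² rewards estimates that track the variation across sites. %RMSE expresses the typical error relative to average volume. Reporting both gives a fuller picture than either metric alone.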

Overall, look for data that is: 

  1. Anonymized
  2. Privacy-protected
  3. Carefully stored and managed
  4. Validated
  5. Proven successful in real-world applications

A powerful data set isn’t a magic solution to every question or problem that planners and managers face, but it is a flexible multi-tool in the transportation toolbox. It can facilitate quick testing of a hypothesis; complement and feed existing data sources such as models and sensors; provide facts to inform public discussion or opinion; make a compelling case for grant funding; and more.

Adding Transportation Analytics to Traditional Methods 

In gathering and analyzing mobility data, traditional methods have always had certain limitations. While physical traffic counter sensors and surveys aren’t going away anytime soon, transportation analytics are increasingly used to help fill gaps in traffic counter data as well as add richness to transportation planning and modeling. 


The traditional way to gather traffic volume data is to send staff onto a handful of targeted roadways either to manually count vehicles, or to install temporary or permanent “tube” sensors across the roadway to capture counts for the vehicles that drive over it.

Transportation experts are well-acquainted with the limitations of sensor-collected data, which include: 

  • Lower-trafficked and rural roads are often overlooked, which can skew the data for city-wide, regional, or national analyses.
  • Sending staff onto busy roadways is dangerous to workers and distracts drivers.
  • Small sample sizes can skew modeled results.
  • Temporary traffic counters capture only a short window of time, which can produce unrepresentative results.
  • Permanent traffic counters are expensive to install and maintain.


Traffic studies often include survey data, asking respondents questions about their travel routes and habits. But surveys may fall short in gathering sufficient data:

  • Surveys can be expensive, costing hundreds of dollars per household.
  • Results are based on small sample sizes (often around 1% or less) and small sample periods (usually 1-5 days).
  • Participants are increasingly difficult to recruit due to heightened privacy concerns and fewer households using landline phones.
  • Hard-to-reach populations are systematically under-sampled.
  • Individuals/households tend to underreport travel, especially for short trips, active transportation modes, and non-work purposes.
  • Error can be introduced via the weighting and expansion process.
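Expansion can be illustrated with its simplest form. Each surveyed household in a stratum is given a weight equal to the number of households it represents (stratum households divided by surveyed households), and reported trips are multiplied by that weight. The strata, household totals, and trip counts below are hypothetical. Real surveys use more elaborate weighting, but the sketch shows how under-reporting in a small sample is magnified once it is expanded.

```python
def expansion_weights(population_by_stratum, sample_by_stratum):
    """Weight per surveyed household = households in stratum / surveyed households in stratum."""
    return {
        s: population_by_stratum[s] / sample_by_stratum[s]
        for s in population_by_stratum
    }

def expanded_trips(trips_by_stratum, weights):
    """Scale sampled trips in each stratum up to the full population."""
    return sum(trips_by_stratum[s] * weights[s] for s in weights)

if __name__ == "__main__":
    # Hypothetical region with about 1% of households surveyed in each stratum.
    population = {"urban": 200_000, "rural": 50_000}
    sample = {"urban": 2_000, "rural": 500}
    weights = expansion_weights(population, sample)  # 100 households per respondent

    reported = {"urban": 7_600, "rural": 1_900}   # trips reported on the survey day
    missed = {"urban": 400, "rural": 100}         # hypothetical short trips left unreported

    print(expanded_trips(reported, weights))      # regional trip total as surveyed
    print(expanded_trips(missed, weights))        # trips lost to under-reporting, after expansion
```

Because each respondent stands in for 100 households here, 500 unreported trips in the sample become 50,000 trips missing from the regional estimate.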

On-demand transportation analytics can solve sample size and under-reporting challenges common to transportation surveys.

Overall, surveys are more powerful tools for gathering subjective, rather than objective, data. 


Data obtained from sensors and surveys have long provided transportation professionals with the necessary inputs for data modeling. Modelers assist planners by developing quantitative analyses that can create short- and long-term travel demand forecasts. 

Historically, data to develop and validate models has been limited by availability, frequency, or acquisition costs and time. These limitations are compounded by the level of detail that sophisticated models require. Desktop analytics make acquiring the necessary data much easier.

  • Transportation data analytics offer an up-to-date and easy-to-use data source for improving, calibrating, and validating models.
  • Transportation data analytics offer highly granular datasets suitable for complex modeling, including route information, trip speed, length, and duration, travel mode (e.g., driving vs. cycling), origin-destination (O-D) patterns, and more.
  • Transportation data analytics can serve as building blocks to develop simplified models with limited resources.
  • Agencies can use historical transportation data to model before-and-after scenarios to evaluate a project’s success. 
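Origin-destination (O-D) patterns and before-and-after comparisons can both be sketched as simple aggregations over trip records. The code below builds an O-D matrix of trip counts between zones. It then computes the percent change for one O-D pair between a "before" period and an "after" period. The zone names and trip records are hypothetical. A production workflow would also need sample expansion, privacy thresholds, and validation.

```python
from collections import Counter

def od_matrix(trips):
    """Count trips for each (origin_zone, destination_zone) pair."""
    return Counter((origin, destination) for origin, destination in trips)

def percent_change(before, after):
    """Percent change from a before value to an after value."""
    if before == 0:
        raise ValueError("before value must be nonzero")
    return 100.0 * (after - before) / before

if __name__ == "__main__":
    # Hypothetical trips as (origin_zone, destination_zone) tuples.
    before_trips = [("Downtown", "Airport")] * 40 + [("Suburb", "Downtown")] * 60
    after_trips = [("Downtown", "Airport")] * 50 + [("Suburb", "Downtown")] * 45

    before_od = od_matrix(before_trips)
    after_od = od_matrix(after_trips)
    pair = ("Suburb", "Downtown")
    print(f"{pair}: {before_od[pair]} -> {after_od[pair]} trips "
          f"({percent_change(before_od[pair], after_od[pair]):+.1f}%)")
```

A `Counter` returns 0 for O-D pairs that never appear, which keeps sparse matrices simple to query.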

In the past, we’ve accepted the limitations of traditional methods because, before transportation data analytics, these methods were the best we had. 

About StreetLight Data

StreetLight Data pioneered the use of transportation data analytics to help transportation professionals solve their biggest problems. Since 2011, we have harnessed hundreds of data sources that contribute to our RouteScience® engine, developing unmatched transportation data processing capabilities and a deep, empirical understanding of how North America’s roads, sidewalks, and transit interact. Access our self-serve web platform, StreetLight InSight®, to analyze and visualize travel patterns 24/7 – in your neighborhood, the next town over, or across the country.
