What Are Big Data Analytics in Transportation?

Transportation data analytics increasingly power mobility information and insights, and they are transforming transportation planning by making it easier, faster, cheaper, and safer to collect and understand critical information.

While the transportation industry may not be in crisis, it is certainly being heavily disrupted by multiple forces, including the COVID-19 pandemic. As these changes unfold, transportation experts must: 

  • Prioritize projects accurately to guide effective resource investment and make the biggest impact.
  • Make informed decisions based on recent, accurate data, not on guesses or input from a few vocal stakeholders.
  • Maintain social equity and environmental justice, providing access and support for outlying areas and the underserved. 
  • Foster public engagement, so that residents, constituents, and public officials understand, can respond to questions about, and support planned mobility efforts.
  • Accurately and quickly measure results of transportation initiatives, enabling adjustment and optimization in real time.

Increasing numbers of cities, transit organizations, departments of transportation, and other localities are using transportation data analytics to solve problems, prioritize investments, and win stakeholder support. But what are they, and how can you choose a reliable transportation data analytics source?

Transportation Data Analytics Capture the Speed of Change

Happily, we are no longer limited to sensors and surveys. Transportation data analytics can provide complete end-to-end trip information, including trip origins and destinations, routes, trip distances, and travel time. When data is aggregated from multiple sources, transportation data analytics become even more valuable, providing transportation experts with details including home and work locations, trip purpose, traveler demographics, and more.
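To make these trip metrics concrete, here is a minimal Python sketch that derives origin, destination, distance, and travel time from one trip's ordered GPS pings. It assumes the pings have already been grouped into a single trip and sums haversine (great-circle) distances between consecutive points. The function names and the (timestamp, latitude, longitude) input format are hypothetical, and production systems typically also snap points to the road network, which this sketch does not do.

```python
import math
from datetime import datetime

# Mean Earth radius in kilometres; adequate for trip-scale distances.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def summarize_trip(pings: list[tuple[datetime, float, float]]) -> dict:
    """Summarize one trip from time-ordered (timestamp, lat, lon) pings.

    Distance is the sum of straight-line hops between consecutive pings, so it
    underestimates road distance when pings are sparse.
    """
    if len(pings) < 2:
        raise ValueError("a trip needs at least two pings")
    distance = sum(
        haversine_km(a[1], a[2], b[1], b[2]) for a, b in zip(pings, pings[1:])
    )
    minutes = (pings[-1][0] - pings[0][0]).total_seconds() / 60
    return {
        "origin": (pings[0][1], pings[0][2]),
        "destination": (pings[-1][1], pings[-1][2]),
        "distance_km": distance,
        "travel_time_min": minutes,
    }


if __name__ == "__main__":
    # Hypothetical three-ping trip.
    trip = [
        (datetime(2021, 3, 1, 8, 0), 37.7749, -122.4194),
        (datetime(2021, 3, 1, 8, 12), 37.7849, -122.4094),
        (datetime(2021, 3, 1, 8, 25), 37.7949, -122.3994),
    ]
    print(summarize_trip(trip))
```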

Using transportation data analytics, transportation professionals can quickly access accurate data for every road in the country, every day of the year. 

Not all transportation data is created equal. When collecting and using data, it is important to understand the factors at play. These factors drive transportation data coverage, depth, and accuracy: 

Data Sets and Sources

Commonly, transportation data analytics rely on information collected from navigation GPS systems in cars and trucks, and from applications installed on mobile devices, known as Location-Based Services (LBS) data.

Once this raw data is filtered through a set of complex algorithms, many of them based on machine learning, it can be used to analyze trips from the moment journeys begin to the moment they end, via any mode, on all roads and paths. But not all data leads to powerful insights.


Key questions for evaluating data sets and providers include:

  1. How big is the sample size? The larger the sample size, the smaller the margin of error in the data.
  2. How many data sources? The most accurate and unbiased data sets draw from multiple sources. 
  3. How frequently is the data updated? Regular updates allow for more granular, time-specific studies.
  4. What does the data cover? Ideally, it should drill down to rural areas, small streets, and individual intersections, and it should include historical travel data.
  5. What modes does the data include? Can it identify cyclists, pedestrians, transportation network company (TNC) trips, transit, and more?
  6. Does the data include trip- and traveler-specific detail? Look for data sets that include demographic information, trip purpose, visitor information, and more.
  7. Does the data offer date-specific measurement? Specifying dates lets the data measure movements during specific historical events and supports before-and-after studies.
  8. How is the data accessed? Look for an on-demand platform with access to run multiple studies, rather than a one-time download of a single analysis.
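Question 1 can be illustrated with the textbook margin of error for an estimated proportion, such as the share of sampled trips that use a particular road. The Python sketch below assumes a simple random sample and the normal approximation. Location-based samples are not random, so a larger sample shrinks random error but does nothing for systematic bias, which is why question 2 matters. The function name is illustrative.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for a proportion.

    p: observed share, e.g. the fraction of sampled trips that use a given road
    n: number of independent sampled trips
    z: critical value; 1.96 gives roughly 95% confidence
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    return z * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    # The error shrinks with the square root of n: 100x more data, 10x less error.
    for n in (100, 10_000, 1_000_000):
        print(f"n={n:>9,}: +/- {margin_of_error(0.5, n):.2%}")
```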

Algorithms and Machine Learning

Transportation data analytics rely on computer algorithms and, sometimes, on machine learning. Software engineering and data science expertise is increasingly important for understanding and evaluating transportation data sources. 

At a minimum, transportation data providers should be able to explain the modeling behind a transportation data algorithm, including the data sources, how the data is handled, and the algorithm’s capabilities. Transparency is critical for evaluating today’s complex data sets. 

Machine learning is an increasingly important element of transportation data analytics, and one that offers less clarity than a conventional, rule-based algorithm. With machine learning, data scientists essentially “feed” a computer program examples of actual data, and the program “learns” to recognize that type of data and pick it out of a larger data set. Over time, its accuracy can grow, but it becomes harder to see which details the model identifies and how it evaluates them.


Data scientists may not explain in detail how this process works (in order to protect intellectual property), but they should be able to share how accurate its results are. Sometimes the data will be more directional than definitive, which can still be helpful as long as users understand where the gaps are.
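One simple way to share the degree of accuracy, while also showing where the gaps are, is to report accuracy on a labeled holdout set both overall and per category. The Python sketch below assumes a hypothetical travel-mode classifier whose predictions are being compared with known (ground-truth) modes. The labels and function name are illustrative, not any vendor's actual method.

```python
from collections import Counter


def accuracy_report(actual: list[str], predicted: list[str]) -> tuple[float, dict[str, float]]:
    """Return overall accuracy and per-class recall for labeled holdout predictions.

    Per-class recall is the share of each true class that the model labeled
    correctly, which exposes weak classes that a high overall accuracy can hide.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need equal-length, non-empty label lists")
    pairs = list(zip(actual, predicted))
    correct = Counter(a for a, p in pairs if a == p)
    totals = Counter(actual)
    overall = sum(correct.values()) / len(pairs)
    per_class = {label: correct[label] / totals[label] for label in totals}
    return overall, per_class


if __name__ == "__main__":
    actual = ["car", "car", "car", "car", "bike", "bike", "walk", "walk"]
    predicted = ["car", "car", "car", "car", "car", "bike", "walk", "walk"]
    overall, per_class = accuracy_report(actual, predicted)
    print(f"overall accuracy: {overall:.1%}")
    for label, recall in sorted(per_class.items()):
        print(f"  {label}: {recall:.0%}")
```

In this toy example the model looks strong overall yet recognizes only half of the bicycle trips, exactly the kind of gap users should be told about.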

Overall, a company with an effective data set and process should be able to point to multiple proven uses of its metrics with actual customers, not simply theoretical applications.

Privacy Protection

Stakeholders sometimes have very real concerns about this level of transportation data analytics: how the data is collected, protected, and shared. Fortunately, best practices are emerging to ensure privacy protection. At StreetLight, we operate at or above established guidelines to set the tone for the industry.

Data should never enable the tracking of individuals or the sending of marketing messages targeted to individual devices (such as cellphones). Instead, analytics should describe patterns in the movement of composite groups of people.

Transportation data analytics companies should not receive, process, or use personally identifiable information in the creation of customer products. Throughout the product-creation process, they should employ multi-step, multi-layered technical safeguards, including automated privacy and coverage checks that ensure sufficient aggregation based on dimensions such as time, space, and land use.
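A minimum-count rule is one common way to implement an automated aggregation check like the one described above. The Python sketch below counts trips per origin zone, destination zone, and time bin, withholds any cell below a threshold, and shows how coarsening the time dimension can lift sparse cells over that threshold. The threshold of 10, the 3-hour bins, and the function names are hypothetical illustrations, not StreetLight's actual parameters, and the input is assumed to carry no device or personal identifiers.

```python
from collections import Counter

# Hypothetical threshold: cells with fewer trips than this are withheld.
MIN_TRIPS_PER_CELL = 10


def time_bin(hour: int, width: int = 3) -> int:
    """Coarsen an hour of day (0-23) to the start of its bin, e.g. 7 -> 6 for 3-hour bins."""
    return (hour // width) * width


def aggregate(trips, min_trips: int = MIN_TRIPS_PER_CELL, bin_width: int = 3) -> dict:
    """Count trips per (origin zone, destination zone, time bin) and drop sparse cells.

    trips: iterable of (origin_zone, destination_zone, hour) tuples that carry
    no device or personal identifiers (an assumption of this sketch).
    """
    counts = Counter(
        (origin, dest, time_bin(hour, bin_width)) for origin, dest, hour in trips
    )
    return {cell: n for cell, n in counts.items() if n >= min_trips}


if __name__ == "__main__":
    trips = [("A", "B", 7)] * 6 + [("A", "B", 8)] * 5 + [("A", "C", 8)] * 3
    print(aggregate(trips, bin_width=1))  # hourly cells are all too sparse
    print(aggregate(trips))               # 3-hour bins let A->B clear the threshold
```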

Data storage and processing should take place in a secure data repository protected by multi-layered network security architecture, and supported by system audits and controls. An additional step is to build in administrative safeguards and employee training.

Currently, the General Data Protection Regulation (GDPR) created by the European Union is stricter than U.S. privacy law, and therefore many data companies choose to follow the GDPR. Some also follow “Privacy by Design” practices, building privacy practices into their technical infrastructure and business operations.

Validating Data 

Validation is a critical step that confirms a transportation data set’s accuracy. Transportation data should be validated against an existing data set that has been confirmed for accuracy. In most cases, this is data from road sensors or counters.


Validated travel model results can also be used to confirm the accuracy of transportation data analytics, as can household travel surveys and U.S. Census data. Multiple validations can also be used to support the accuracy of a single analysis. 
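Comparisons against counters are usually summarized with a few error statistics. The Python sketch below computes root-mean-square error (RMSE), RMSE as a percentage of the mean count, and the coefficient of determination (R²) for estimated versus counted volumes at the same locations and periods. The function name and sample volumes are illustrative, and any pass/fail thresholds would come from an agency's own validation guidance rather than from this code.

```python
import math


def validation_metrics(estimated: list[float], counted: list[float]) -> dict[str, float]:
    """Compare estimated volumes with ground-truth counts at the same sites.

    Returns RMSE, RMSE as a percentage of the mean count, and R² computed as
    1 - SSE/SST against the counts (not the squared correlation of a fitted line).
    """
    n = len(counted)
    if n == 0 or len(estimated) != n:
        raise ValueError("need equal-length, non-empty lists")
    mean_count = sum(counted) / n
    sse = sum((e - c) ** 2 for e, c in zip(estimated, counted))
    sst = sum((c - mean_count) ** 2 for c in counted)
    if mean_count == 0 or sst == 0:
        raise ValueError("counts must have a non-zero mean and some variation")
    rmse = math.sqrt(sse / n)
    return {
        "rmse": rmse,
        "pct_rmse": 100 * rmse / mean_count,
        "r_squared": 1 - sse / sst,
    }


if __name__ == "__main__":
    # Hypothetical daily volumes at five count stations.
    counts = [12_400, 8_150, 3_020, 15_700, 960]
    estimates = [11_900, 8_600, 3_350, 16_200, 1_100]
    for name, value in validation_metrics(estimates, counts).items():
        print(f"{name}: {value:.3f}")
```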

Overall, look for data that is: 

  1. Anonymized
  2. Privacy-protected
  3. Carefully stored and managed
  4. Validated
  5. Proven successful in real-world applications

A powerful data set isn’t a magic solution to every question or problem that planners and managers face, but it is a flexible multi-tool in the transportation toolbox. It can facilitate quick testing of a hypothesis; supplement and feed existing data sources such as models and sensors; provide facts to inform public discussion or opinion; support factoring and expansion; and more.

Adding Transportation Analytics to Traditional Methods 

In gathering and analyzing mobility data, traditional methods have always had certain limitations. The gaps are widening between traditional methods and transportation data analytics as the pace of change quickens and new modes arrive. 


The traditional way to gather traffic volume data is to send staff onto a handful of targeted roadways, either to count vehicles manually or to install temporary or permanent “tube” sensors across the roadway that count the vehicles driving over them.


Transportation experts are well-acquainted with the limitations of manual and sensor-based counts, which include:

  • Lower-traffic and rural roads are often overlooked, which can skew the data.
  • Sending staff onto busy roadways is dangerous to workers and distracts drivers. 
  • Small sample sizes can skew modeled results.
  • Temporary counters capture only a short window of time, which can produce unrepresentative results, particularly during COVID-19 travel restrictions.
  • Permanent counters are expensive to install and maintain.


Traffic studies often include survey data, asking respondents questions about their travel routes and habits. But surveys increasingly fall short in gathering sufficient data:

  • Surveys can be expensive, costing hundreds of dollars per household.
  • Results are based on small sample sizes (often around 1% or less) and short sample periods (usually 1 to 5 days).
  • Participants are increasingly difficult to recruit due to increased privacy concerns, fewer households using landline phones, and COVID-19 travel restrictions.
  • Hard-to-reach populations are systematically under-sampled.
  • Individuals and households tend to underreport travel, especially short trips, active-mode trips (such as walking and cycling), and non-work trips.
  • Error can be introduced via the weighting and expansion process.

Overall, surveys are more powerful tools for gathering subjective, rather than objective, data. 
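The weighting and expansion step can be sketched as a simple stratified expansion, in which each surveyed household is weighted by the number of households it represents in its stratum. This Python sketch uses hypothetical strata, counts, and function names, and it omits refinements such as iterative proportional fitting, but it shows why under-sampled groups and underreported trips get amplified.

```python
def stratum_weights(population: dict[str, int], sample: dict[str, int]) -> dict[str, float]:
    """Expansion weight per stratum: households represented by each surveyed household."""
    weights = {}
    for stratum, total in population.items():
        surveyed = sample.get(stratum, 0)
        if surveyed == 0:
            raise ValueError(f"no surveyed households in stratum {stratum!r}")
        weights[stratum] = total / surveyed
    return weights


def expanded_trips(reported: dict[str, float], weights: dict[str, float]) -> float:
    """Scale trips reported by surveyed households up to a population total."""
    return sum(trips * weights[stratum] for stratum, trips in reported.items())


if __name__ == "__main__":
    # Hypothetical region: rural households are harder to reach, so fewer respond.
    population = {"urban": 90_000, "rural": 10_000}
    sample = {"urban": 900, "rural": 25}
    reported = {"urban": 4_000, "rural": 100}
    weights = stratum_weights(population, sample)
    print(weights)
    print(expanded_trips(reported, weights))
```

Here each rural respondent stands in for 400 households versus 100 for each urban respondent, so a single short trip that one rural household forgets to report removes 400 trips from the regional estimate.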


Data obtained from sensors and surveys have long provided transportation professionals with the necessary inputs for data modeling. Modelers assist planners by developing quantitative analyses that can create short- and long-term travel demand forecasts. 

Historically, information to develop and validate models has been limited by availability, frequency, or acquisition costs and time. Transportation data analytics platforms make acquiring the necessary data much easier.

  • Transportation data analytics offer an up-to-date and easy-to-use data source for improving, calibrating, and validating models.
  • Modeling can sometimes be replaced entirely with transportation data analytics. 
  • Transportation data analytics can serve as building blocks to develop simplified models on limited resources.
  • Agencies can use historical transportation data to model before-and-after scenarios to evaluate a project’s success. 
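A before-and-after evaluation can be as simple as comparing a metric's average across two periods, ideally alongside a comparable control location that the project did not affect. The Python sketch below computes both the raw percent change and a simple difference-in-differences adjustment. The metric, sites, and function names are hypothetical, and a real evaluation would also account for seasonality and statistical significance.

```python
from statistics import mean


def pct_change(before: list[float], after: list[float]) -> float:
    """Percent change in the mean of a metric (e.g. daily volume) between two periods."""
    baseline = mean(before)
    if baseline == 0:
        raise ValueError("baseline mean must be non-zero")
    return 100 * (mean(after) - baseline) / baseline


def adjusted_change(
    project_before: list[float],
    project_after: list[float],
    control_before: list[float],
    control_after: list[float],
) -> float:
    """Project-site change minus control-site change (a simple difference-in-differences).

    Subtracting a comparable control site's change strips out region-wide shifts,
    such as a pandemic-era drop in travel, that would otherwise be credited to the project.
    """
    return pct_change(project_before, project_after) - pct_change(control_before, control_after)


if __name__ == "__main__":
    # Hypothetical daily volumes before and after a corridor project.
    project_before, project_after = [1000, 1100, 900], [1200, 1150, 1250]
    control_before, control_after = [500, 500], [525, 525]
    print(pct_change(project_before, project_after))
    print(adjusted_change(project_before, project_after, control_before, control_after))
```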

In the past, we’ve accepted the limitations of traditional methods because, before transportation data analytics, these methods were the best we had. 

About StreetLight Data

StreetLight Data pioneered the use of transportation data analytics to help transportation professionals solve their biggest problems. Applying proprietary machine-learning algorithms to over four trillion spatial data points over time, StreetLight uses Location-Based Services data from mobile devices to measure diverse travel patterns and makes them available on-demand via the world’s first SaaS platform for mobility, StreetLight InSight®. From identifying sources of congestion to optimizing new infrastructure to planning for autonomous vehicles, StreetLight powers more than 6,000 global projects every month. 
