Sample size is not a simple concept in Massive Mobile Data Analytics. In this post, we update a 2016 study of our commercial vehicle data’s sample size across the U.S., using a measure we call the Daily Trip Sample Ratio. In short, our archival data captures ~18% of the commercial vehicle trips recorded at the 881 permanent counters against which we compared it.
Daily Trip Sample Size
For this analysis, we chose the Daily Trip Sample Ratio as the unit of sample analysis because “trips” are the active unit for most of our clients’ projects. Therefore, we consider it a “useful” measure of sample (compared to pings, for example).
First, we collected truck count data from the Federal Highway Administration (FHWA). We used a high-quality sample of 881 permanent counters across the United States with robust counts, representing over 25,000 hourly observations per month across 2019 (see Figures 1 and 2 below). This data set gave us the estimated number of Heavy Duty and Medium Duty truck trips that pass each location each day. For each available day, we counted the trips captured by StreetLight Data and divided that number by the permanent counter’s count, separately for Heavy Duty and Medium Duty vehicles.
Figure 2: FHWA comparison counter sample size by state.
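To make the calculation concrete, here is a minimal Python sketch of the Daily Trip Sample Ratio as described above: sample trips divided by the permanent counter's count, per station, day, and vehicle class. The record layout, the field names (`station_id`, `sample_trips`, `counter_trips`, and so on), and the numbers are hypothetical and chosen only for illustration; they are not StreetLight's actual schema or data. The sketch also assumes that days with a zero counter reading are skipped, which the analysis does not specify.

```python
from collections import defaultdict


def daily_trip_sample_ratios(records):
    """Return {(station_id, date, vehicle_class): ratio} for each usable record.

    Each record is a dict with hypothetical keys: station_id, date,
    vehicle_class, sample_trips, and counter_trips.
    """
    ratios = {}
    for rec in records:
        if rec["counter_trips"] <= 0:
            # Assumption: a zero count gives no usable denominator, so skip it.
            continue
        key = (rec["station_id"], rec["date"], rec["vehicle_class"])
        ratios[key] = rec["sample_trips"] / rec["counter_trips"]
    return ratios


def mean_ratio_by_class(ratios):
    """Average the daily ratios separately for each vehicle class."""
    grouped = defaultdict(list)
    for (_station, _date, vehicle_class), ratio in ratios.items():
        grouped[vehicle_class].append(ratio)
    return {cls: sum(vals) / len(vals) for cls, vals in grouped.items()}


# Illustrative, made-up records (not real counts).
records = [
    {"station_id": "A", "date": "2019-03-01", "vehicle_class": "heavy",
     "sample_trips": 110, "counter_trips": 1000},
    {"station_id": "A", "date": "2019-03-01", "vehicle_class": "medium",
     "sample_trips": 62, "counter_trips": 200},
    {"station_id": "B", "date": "2019-03-01", "vehicle_class": "heavy",
     "sample_trips": 0, "counter_trips": 0},
]

ratios = daily_trip_sample_ratios(records)
by_class = mean_ratio_by_class(ratios)
```

Averaging the daily ratios, as done here, is one of two reasonable ways to summarize them; dividing total sample trips by total counter trips would weight busy stations more heavily, and the two can give different answers.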
On average, StreetLight’s sample captured 18% of the trips made across these stations. The share was higher for Medium Duty trucks (31% of trips captured) and lower for Heavy Duty trucks (11%). StreetLight’s trip counts also correlated extremely well with the permanent counters, with an R² of 0.91 and similar values for both Heavy and Medium Duty trucks. We consider this a very strong result.
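For readers who want to reproduce this kind of correlation check on their own counts, the sketch below computes R² as the squared Pearson correlation between two series. This is an assumption: the post does not say how its R² was calculated, although for a simple linear fit of one series against the other the two definitions coincide. The function name and the sample values are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("R² is undefined when either series is constant")
    return (sxy * sxy) / (sxx * syy)


# Made-up example: counter counts vs. sample trip counts at four stations.
counter_counts = [1.0, 2.0, 3.0, 4.0]
sample_counts = [1.0, 3.0, 2.0, 4.0]
example_r2 = r_squared(counter_counts, sample_counts)
```

A high R² shows that the sample rises and falls with the counters across stations. It says nothing about the size of the sample, which is why it is reported alongside the sample ratio rather than in place of it.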
Rural vs. Urban Regions
It’s important to know whether the Daily Trip Sample Ratio was consistent across rural and urban regions, or between state jurisdictions. We classified regions as low, medium, and high density depending on the population per square kilometer.
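As a rough illustration of that classification step, the sketch below assigns a region to a density class from its population and area. The two cutoffs are hypothetical placeholders, since the thresholds actually used in the analysis are not stated here.

```python
# Hypothetical cutoffs in people per square kilometer; the thresholds
# actually used in the analysis are not given.
LOW_DENSITY_MAX = 50.0
HIGH_DENSITY_MIN = 500.0


def density_class(population, area_km2):
    """Classify a region as 'low', 'medium', or 'high' density."""
    if area_km2 <= 0:
        raise ValueError("area_km2 must be positive")
    density = population / area_km2
    if density < LOW_DENSITY_MAX:
        return "low"
    if density < HIGH_DENSITY_MIN:
        return "medium"
    return "high"


# Made-up regions, each 100 square kilometers.
examples = [density_class(p, 100.0) for p in (1_000, 20_000, 100_000)]
```

Once each counter is tagged with its region's class, the daily ratios can be averaged within each class to compare low-, medium-, and high-density areas.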
The results held up quite consistently, with Daily Trip Sample Ratios from 15% (low-density regions) to 22% (high-density regions). Figure 3 (below) is a map of the commercial truck penetration rate across the U.S. – it is good to see a consistent rate across geographies. Looking at this by state, results ranged from a penetration rate of 10% to 30%.
Of course, this is just one way to measure sample size, and there are several others we could explore in the future. For example, we could examine how sample size holds up hour by hour, which could be important when measuring rush hour, congestion, speed reductions during heavy traffic hours, and more.