The more I work with StreetLight Data’s location-based services (LBS) data set, the more I realize that it is the data source the transportation industry has been waiting for – and that it deserves. Over the past few months, LBS data has emerged as a resource with all the benefits of cellular data, but without its limitations. LBS data can answer a huge array of travel questions that fill in the long-standing information gaps for the transportation industry, especially when used in combination with navigation-GPS data.
But since it’s so new, there’s very little information available to planners about its value today. We’re working to correct that with a series of blog posts that each zero in on a different aspect of LBS data – and this is the first. In this post, I’ll highlight LBS data’s spatial precision.
But before we go too far down the spatial precision rabbit hole, let’s define LBS data. We obtain this data from our partner Cuebiq, which provides pieces of software (SDKs) to mobile app developers. These SDKs allow their apps to use “Location-Based Services” – essentially, the service the app provides depends on the users’ physical location. These include apps for weather, shopping, dating, navigation, and more.
As expected, these mobile apps collect anonymous user locations when they operate in the foreground. In the background, they also collect locations whenever the device begins moving. The SDKs collect data using WiFi proximity, A-GPS, and a few other technologies.
How LBS Data and Cellular Data Are Similar
LBS data has many of the same beneficial characteristics as “cellular” triangulation data. First, it has a large sample size. Today, our LBS sample represents roughly 12% of the adult US population, and that is growing over time. When we first started using this data source, it was about 10%. This is comparable to smaller telecommunications providers’ share of the market.
Second, the data records have a unique, hashed ID that persists over many weeks. This allows us to develop analytics such as income, trip purpose, resident status, and home region for devices. These types of Metrics are extremely valuable for transportation planners, particularly for equity assessments and travel demand modeling.
Spatial Precision: LBS Data vs. Cellular Data
In the image below, we’ve illustrated the relative spatial precision of different data records using circles on a map. Each circle represents the range of places where a device could be at a given level of spatial precision, measured in meters. The best spatial precision shown below is 5 meters, and the worst is 1000 meters.
LBS data’s spatial precision varies based on which technology was used to collect the locations. It ranges from 5 meters – that’s if the GPS chip was on – to 50+ meters for records created using WiFi proximity, Bluetooth proximity, etc. In our sample, the average spatial precision for LBS is about 20 meters. We use a variety of algorithmic processing techniques to remove any records with spatial precision that falls below our standards.
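To make the filtering idea concrete, here is a minimal sketch of dropping records whose reported precision is too coarse. It is not our production processing: the `LocationRecord` fields, the `filter_by_precision` helper, and the 50-meter cutoff are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class LocationRecord:
    """Hypothetical LBS record; field names are illustrative only."""
    device_id: str     # hashed, anonymous device ID
    lat: float
    lon: float
    accuracy_m: float  # radius of uncertainty in meters (smaller is better)


def filter_by_precision(records, max_accuracy_m=50.0):
    """Keep records whose uncertainty radius is within the assumed cutoff."""
    return [r for r in records if r.accuracy_m <= max_accuracy_m]


records = [
    LocationRecord("a1", 37.7749, -122.4194, 5.0),    # GPS-quality fix
    LocationRecord("a1", 37.7750, -122.4190, 20.0),   # typical LBS fix
    LocationRecord("b2", 37.7800, -122.4100, 300.0),  # too coarse, dropped
]

kept = filter_by_precision(records)
print([r.accuracy_m for r in kept])
```

A single cutoff is the simplest possible rule. A real pipeline would likely weigh precision alongside other signals.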
As shown in the figure above, with 5-meter spatial precision you can know whether a device is on a small road or in a parking lot. 10 to 20 meters is still precise enough for most parking lots, TAZs, blocks, etc. But 300 to 1000 meters, the average precision range for cellular multilateration, is just too coarse to know which TAZ, census block, or road a device is located on.
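One rough way to see why the radius matters so much is to compare the area of each uncertainty circle with the area of a zone. The sketch below does that arithmetic, assuming a hypothetical 100 m by 100 m city block. Real blocks and TAZs vary widely in size.

```python
import math

# Assumption for illustration: a square city block, 100 m on each side.
BLOCK_AREA_M2 = 100 * 100


def uncertainty_area_m2(radius_m):
    """Area of the circle in which the device could actually be located."""
    return math.pi * radius_m ** 2


for radius_m in (5, 20, 300, 1000):
    area = uncertainty_area_m2(radius_m)
    blocks = area / BLOCK_AREA_M2
    print(f"{radius_m:>5} m radius: {area:>12,.0f} m^2 (~{blocks:.2f} blocks)")
```

Under that assumption, the 5- and 20-meter circles fit comfortably inside one block, while the 300- and 1000-meter circles each span many blocks.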
This has several negative impacts. The obvious one is that your zone structure for OD has to be more aggregate, which may or may not be a problem for you depending on the problem at hand. It also means you can’t do “select link” analyses well.
But there are also some deeper problems. First, short trips, such as a 500-meter trip to grab something from the corner store, will be missed. That covers most pedestrian and some bike trips. Second, expansion (i.e., using the sample to estimate the actual count of trips that take place) is inherently less accurate. That’s because the identification of a device’s home block (the core of expansion) is less precise.
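The short-trip problem can be sketched in a few lines. The rule of thumb below is an assumption for illustration, not a description of our algorithms: a movement only registers as a trip if the start and end points are farther apart than twice the precision radius, so that their uncertainty circles no longer overlap. The coordinates are made up and sit roughly 500 meters apart.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def movement_is_distinguishable(distance_m, precision_m):
    """Assumed rule of thumb: the two uncertainty circles must not overlap."""
    return distance_m > 2 * precision_m


# Hypothetical home and corner store, about 500 m apart along a meridian.
trip_m = haversine_m(37.7700, -122.4194, 37.7745, -122.4194)

for precision_m in (20, 300, 1000):
    seen = movement_is_distinguishable(trip_m, precision_m)
    print(f"precision {precision_m:>4} m: trip of {trip_m:.0f} m detectable = {seen}")
```

At 20 meters, the trip clearly stands out. At 300 or 1000 meters, the start and end points are indistinguishable from a device that never moved.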
Lastly, mode inference is much harder because inferring detailed speed is not feasible with such coarse spatial precision. Developing accurate Metrics for bike or pedestrian modes from cellular data would be extremely difficult or impossible.
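A back-of-the-envelope calculation shows why. If speed is estimated from two consecutive location records, each endpoint can be off by up to the precision radius, so the distance can be off by up to twice that. The sketch below assumes records 60 seconds apart and uses rough typical speeds of 1.4 m/s for walking and 5 m/s for biking. All three numbers are illustrative assumptions, not values from our data.

```python
# Rough typical speeds, assumed for illustration only.
WALK_MPS = 1.4
BIKE_MPS = 5.0


def speed_error_bound_mps(precision_m, interval_s):
    """Worst-case speed error when both endpoints are off by precision_m."""
    return 2 * precision_m / interval_s


INTERVAL_S = 60  # assumed time between consecutive records

for precision_m in (5, 20, 300, 1000):
    bound = speed_error_bound_mps(precision_m, INTERVAL_S)
    # Modes are separable only if the error is smaller than the speed gap.
    separable = bound < (BIKE_MPS - WALK_MPS)
    print(f"precision {precision_m:>4} m: error up to {bound:5.2f} m/s, "
          f"walk vs. bike separable = {separable}")
```

With these assumptions, a 300-meter precision radius allows a speed error of up to 10 m/s, far larger than the gap between walking and biking.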
Putting It All Together
To sum it all up, the spatial precision of LBS data offers a ton of significant advantages as compared to cellular. It allows for:
- More granular analysis zones
- Analysis of short trips
- Expansion to estimated counts
- Mode inference