latitude and longitude
Location on the Earth
High-Level Explanation
Latitude and longitude as reported by the GPS system, expressed in the WGS84 latitude-longitude coordinate reference system (EPSG:4326; see e.g. https://geopandas.org/en/stable/docs/user_guide/projections.html).
Enables
Distance
Distance between successive points
This code transforms the GPS points from the coordinate reference system (CRS) of WGS84 into a meter-based projection (e.g. https://epsg.io/6340 is appropriate for California and nearby states and https://epsg.io/32118 is appropriate for New York, as used below).
    import geopandas as gpd

    # Project once from WGS84 to a meter-based CRS, then measure the distance
    # from each point to the previous one (the first row has no previous point)
    projected = gdf['geometry'].to_crs('EPSG:32118')
    gdf['dist_to_prev_pt'] = projected.distance(projected.shift(1))
    gdf['dist_to_prev_pt_shift'] = gdf['dist_to_prev_pt'].shift(1)

Using the Haversine distance between points along with the difference in time between those points, the speed can be estimated.
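The speed estimate described above can be sketched as follows. This is a minimal illustration rather than the code used to produce the results on this page: the names `haversine_m` and `derived_speeds`, the mean Earth radius of 6,371,000 m, and the use of timestamps expressed in seconds are all assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius, in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodal points
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def derived_speeds(points):
    """points: list of (timestamp_seconds, latitude, longitude), sorted by time.

    Returns one speed in meters per second per point. The first point, and any
    point whose time difference to the previous point is not positive, gets None.
    """
    if not points:
        return []
    speeds = [None]
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(points, points[1:]):
        dt = t1 - t0
        if dt > 0:
            speeds.append(haversine_m(lat0, lon0, lat1, lon1) / dt)
        else:
            speeds.append(None)
    return speeds
```

Because the Haversine distance is a straight-line measure over the Earth's surface, the resulting value is best read as a lower bound on the average speed between the two points.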
Location types
Spatial joins to relevant localities, such as time zone, states, and population density (see https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html for some relevant US Census shapefiles).
Map Matching (must also have timestamp)
Open-source map matching (such as Valhalla) or paid services (such as MapBox) can be used to "snap" the GPS coordinates to roadways.
MapBox Example: First, the latitudes and longitudes must be formatted in a way that the MapBox endpoint will accept. Note that [::4] takes every fourth coordinate, as MapBox accepts at most 100 coordinates at a time and giving coordinates roughly every 20*4 = 80 seconds is generally sufficient for map matching to the road network. (Note that a MapBox API key is required.) See the code appendix below for the decode function.

    import requests

    # Take every fourth point and build the "lng,lat;lng,lat;..." string
    lngs = gdf['longitude'].tolist()[::4]
    lats = gdf['latitude'].tolist()[::4]
    coordinates = ';'.join([str(i) + ',' + str(j) for i, j in zip(lngs, lats)])
    # api_key holds the MapBox access token
    url = f'https://api.mapbox.com/matching/v5/mapbox/driving-traffic/{coordinates}?access_token={api_key}'
    response = requests.get(url)

The response contains information about the roads that this set of coordinates matched to. For example:

    response.json()['matchings'][4]
    > {'confidence': 0.93516,
       'weight_name': 'auto',
       'weight': 222.003,
       'duration': 201.324,
       'distance': 5750.37,
       'legs': [{'via_waypoints': [],
                 'admins': [{'iso_3166_1_alpha3': 'USA', 'iso_3166_1': 'US'}],
                 'weight': 112.387,
                 'duration': 96.623,
                 'steps': [],
                 'distance': 2720.261,
                 'summary': 'US 23 North'},
                {'via_waypoints': [],
                 'admins': [{'iso_3166_1_alpha3': 'USA', 'iso_3166_1': 'US'}],
                 'weight': 109.616,
                 'duration': 104.701,
                 'steps': [],
                 'distance': 3030.11,
                 'summary': 'BL I 94, US 23 North'}],
       'geometry': 'gsv`Gtsw}Ncq@jDsk@_Lew@j@_RRw`@kCi[}a@uJeG}_AO'}

And this geometry decodes to

    [[i[0]*10, i[1]*10] for i in decode(response.json()['matchings'][4]['geometry'])]
    > [[-83.68459, 42.227880000000006],
       [-83.68544999999999, 42.2359],
       [-83.68337, 42.24304],
       [-83.68359, 42.25203],
       [-83.68369, 42.25507],
       [-83.68299, 42.260470000000005],
       [-83.67739999999999, 42.265],
       [-83.67609, 42.26687],
       [-83.67601, 42.27726]]

Further, the sum of meters driven on each of the roadways can be computed using

    import pandas as pd

    # One DataFrame of legs per matching, stacked and summed by road name
    lst = []
    for matching in response.json()['matchings']:
        lst.append(pd.DataFrame(matching['legs']))
    pd.concat(lst).groupby('summary')['distance'].sum()
    > BL I 94, US 23 North                                                 3030.110
      Broadway Street                                                        97.029
      I 94 East                                                           23498.362
      I 94 East, I 94 West                                                 9355.541
      I 94 West                                                           25963.642
      I 94 West, I 94 East                                                 7703.051
      John D Dingell Drive                                                 2125.649
      John D. Dingell Drive, John D Dingell Drive                          7828.364
      John D. Dingell Drive, Long-term Parking / International Arrivals     978.600
      Merriman Road                                                        1817.140
      Plymouth Road                                                        1292.209
      Plymouth Road, Devon Circle                                          2753.524
      Plymouth Road, US 23 South                                           6734.600
      US 23 North                                                          2720.261
      US 23 South                                                          5261.977
      W. G. Rogell Drive                                                   1845.754
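Taking every fourth point only stays under the 100-coordinate limit when the trip has at most 400 points. For longer trips, one option is to split the coordinates into several requests. The sketch below only builds the coordinate strings and makes no network calls; the function name `build_coordinate_batches` and the choice to let consecutive batches share one point are assumptions, not part of the MapBox API.

```python
def build_coordinate_batches(lngs, lats, max_points=100):
    """Split a trace into 'lng,lat;lng,lat;...' strings of at most max_points each.

    Consecutive batches share one point, so the matched pieces join end to end
    and no batch ever consists of a single isolated point.
    """
    if len(lngs) != len(lats):
        raise ValueError("lngs and lats must have the same length")
    if max_points < 2:
        raise ValueError("max_points must be at least 2")

    pairs = [f"{lng},{lat}" for lng, lat in zip(lngs, lats)]
    step = max_points - 1  # advance by one less than the batch size to overlap
    batches = []
    for start in range(0, len(pairs) - 1, step):
        batches.append(";".join(pairs[start:start + max_points]))
    return batches
```

Each returned string can be substituted for the coordinates part of the request URL, and the resulting matchings concatenated before summing distances by road.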
Valhalla Example: The code below calls the open Valhalla endpoint (see: https://valhalla.github.io/valhalla/api/map-matching/api-reference/). Alternatively, Valhalla can be hosted on a server (see e.g. https://water-gis.com/en/setups/valhalla/setup-valhalla/). At the bottom is a sum of the roads travelled, reported in miles because the request sets units to 'miles'. Note that the public Valhalla endpoint seems to limit the length of the output (here, the return trip is truncated off the output of the map matching service).

    import geopandas as gpd
    import pandas as pd
    import requests

    # Build the trace: one {"lat", "lon", "type"} entry per GPS point
    req = gdf[['latitude','longitude']].rename(columns={'latitude':'lat','longitude':'lon'})
    lst = []
    for i in req.index:
        lst.append({"lat": req.loc[i,'lat'], 'lon': req.loc[i,'lon'], 'type': 'via'})
    # The first and last points are marked as breaks
    lst[0]['type'] = 'break'
    lst[-1]['type'] = 'break'
    dct = {"shape": lst}
    dct['costing'] = 'auto'
    dct['shape_match'] = "map_snap"
    dct['units'] = 'miles'
    url = 'https://valhalla1.openstreetmap.de/trace_route'
    response = requests.post(url, json=dct)

    # Collect the street names and length of each maneuver that has street names
    street_names = []
    lengths = []
    ms = response.json()['trip']['legs'][0]['maneuvers']
    for m in ms:
        try:
            street_names.append(m['street_names'])
            lengths.append(m['length'])
        except KeyError:
            pass
    temp = pd.DataFrame()
    temp['street_name'] = street_names
    temp['length'] = lengths
    # Keep only the first street name listed for each maneuver
    temp['street_name'] = temp['street_name'].map(lambda x: x[0])

    # Decode the matched shape into point geometries
    shape_gdf = gpd.GeoDataFrame(decode(response.json()['trip']['legs'][0]['shape']))
    shape_gdf['geometry'] = gpd.points_from_xy(x=shape_gdf[0], y=shape_gdf[1])

    temp.groupby('street_name')['length'].sum()
    > street_name
      Broadway Street                                 0.085
      Devon Circle                                    0.233
      Huron Parkway                                   0.625
      I 94 East                                      17.561
      Long-term Parking / International Arrivals      0.496
      Merriman Road                                   2.404
      Plymouth Road                                   3.141
      Traverwood Drive                                0.272
      US 23 South                                     5.290
Enabled By
Known Quirks
"Teleportation": Occasionally, physically impossible jumps in location occur due to some error in the data collection. As a guide, take the Haversine distance between successive points and divide by the time difference to obtain a rough lower bound for the average speed the vehicle would have had to travel between those points in the given time. 75 meters per second is used as the sanity check for this derived_speed (see code appendix and example below). This may be due to a GPS connection error, as there are many instances where the reported speed is greater than 0 but the latitude and longitude do not change.
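The 75 meters per second sanity check can be sketched as below. It takes distances to the previous point (for example the dist_to_prev_pt column computed earlier) and the corresponding time differences in seconds; the function name `flag_teleports` and the treatment of non-positive time differences are assumptions made for illustration.

```python
MAX_PLAUSIBLE_SPEED_MPS = 75.0  # threshold from the text, equal to 270 km/h


def flag_teleports(dist_to_prev_m, secs_since_prev, max_speed=MAX_PLAUSIBLE_SPEED_MPS):
    """Return one boolean per point: True where the implied speed is implausible.

    Missing values (None or NaN, as for the first point of a trip) are never flagged.
    """
    flags = []
    for dist, dt in zip(dist_to_prev_m, secs_since_prev):
        if dist is None or dt is None:
            flags.append(False)
        elif dt <= 0:
            # Assumed rule: any movement in zero or negative time is impossible
            flags.append(dist > 0)
        else:
            # Comparisons involving NaN evaluate to False, so NaN rows are not flagged
            flags.append(dist / dt > max_speed)
    return flags
```

Flagged points can then be dropped or inspected before distances and speeds are aggregated.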
Sign error: On rare occasions, the sign of longitude has been flipped, resulting in coordinates that are truly in the United States appearing as though they are in Asia.
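A flipped sign can be detected by checking whether negating a coordinate brings an out-of-range point back into the region where the vehicle is expected to operate. The sketch below is one way to do this; the bounding box for the contiguous United States is approximate and hypothetical, and the function names are assumptions.

```python
# Hypothetical, approximate bounding box for the contiguous United States
US_BOUNDS = {"lat_min": 24.0, "lat_max": 50.0, "lon_min": -125.0, "lon_max": -66.0}


def in_bounds(lat, lon, bounds=US_BOUNDS):
    """True if the point lies inside the expected bounding box."""
    return (bounds["lat_min"] <= lat <= bounds["lat_max"]
            and bounds["lon_min"] <= lon <= bounds["lon_max"])


def diagnose_sign_error(lat, lon, bounds=US_BOUNDS):
    """Return 'ok' if the point is in bounds; otherwise name the sign flip
    ('latitude', 'longitude', or 'both') that would bring it into bounds,
    or 'unknown' if no flip does."""
    if in_bounds(lat, lon, bounds):
        return "ok"
    candidates = {
        "latitude": (-lat, lon),
        "longitude": (lat, -lon),
        "both": (-lat, -lon),
    }
    for name, (fixed_lat, fixed_lon) in candidates.items():
        if in_bounds(fixed_lat, fixed_lon, bounds):
            return name
    return "unknown"
```

For example, a point at latitude 42.28 and longitude 83.74 falls outside the box, and negating the longitude brings it back inside, so the function reports `'longitude'`.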
Cases
SmartCar
Data points are captured every 4 hours, so trips will generally not be captured.
AutoPi
Data points are generally captured every 20 seconds of vehicle operation.
Tesla
Data points are generally captured every 60 seconds while the vehicle is on or charging, but are sometimes more frequent.
Code Appendix
Note that odometer is not available for every vehicle, nor every record for every vehicle, so the Haversine distance between successive points can be used as a rough lower bound of distance travelled.
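The map matching examples above call a decode function on the encoded polyline strings returned by the services. The following is a minimal sketch of the standard encoded polyline algorithm, not necessarily the implementation used to produce the outputs above; the name `decode_polyline` and its `precision` parameter are assumptions. MapBox returns polylines with 5 decimal digits by default, while Valhalla shapes use 6, and decoding a 5-digit string with a precision of 6 yields values ten times too small, which would be consistent with the multiplication by 10 in the MapBox example.

```python
def decode_polyline(encoded, precision=5):
    """Decode an encoded polyline string into a list of (longitude, latitude) tuples."""
    factor = 10 ** precision
    coords = []
    index = 0
    lat = 0
    lng = 0
    while index < len(encoded):
        deltas = []
        # Each point stores a latitude delta followed by a longitude delta
        for _ in range(2):
            shift = 0
            result = 0
            while True:
                byte = ord(encoded[index]) - 63
                index += 1
                result |= (byte & 0x1F) << shift  # low 5 bits carry data
                shift += 5
                if byte < 0x20:  # continuation bit not set: last chunk of this value
                    break
            # The lowest bit holds the sign; the remaining bits hold the magnitude
            deltas.append(~(result >> 1) if result & 1 else result >> 1)
        lat += deltas[0]
        lng += deltas[1]
        coords.append((lng / factor, lat / factor))
    return coords
```

The tuples are returned in (longitude, latitude) order to match the x, y convention used when building point geometries.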
Visualizations with Explanations
[Figure: plot of derived_speed and speed]