latitude and longitude

Location on the Earth

High-Level Explanation

Latitude and longitude as reported by GPS, expressed in the WGS84 geographic coordinate reference system (EPSG:4326; see e.g. https://geopandas.org/en/stable/docs/user_guide/projections.html).

Enables

  • Distance

    • Distance between successive points

    • The code below reprojects the GPS points from the WGS84 coordinate reference system (CRS) into a meter-based projection (e.g. https://epsg.io/6340 is appropriate for California and nearby states and https://epsg.io/32118 is appropriate for New York, as used below), then measures the distance in meters from each point to the previous one.

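      import geopandas as gpd
      
      # Reproject once to the meter-based CRS so that distances come out in meters
      projected = gdf['geometry'].to_crs('EPSG:32118')
      # Distance from each point to the previous point (the first row has no predecessor)
      gdf['dist_to_prev_pt'] = projected.distance(projected.shift(1))
      # The previous row's distance, kept alongside for comparing successive gaps
      gdf['dist_to_prev_pt_shift'] = gdf['dist_to_prev_pt'].shift(1)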
    • Using the Haversine distance between points along with the difference in time between those points, the speed can be estimated.
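As a minimal sketch of the speed estimate described above, the snippet below computes the Haversine distance between two points and divides it by the elapsed time. The function names, the (lat, lon, unix_seconds) tuple layout, and the mean Earth radius of 6,371,000 m are assumptions made for illustration; they are not taken from the original pipeline.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def estimated_speed_mps(p1, p2):
    """Speed in m/s between two (lat, lon, unix_seconds) points.

    Returns None when the timestamps do not advance, since no speed can be derived.
    """
    dt = p2[2] - p1[2]
    if dt <= 0:
        return None
    return haversine_m(p1[0], p1[1], p2[0], p2[1]) / dt


# Hypothetical example: two points recorded 20 seconds apart
speed = estimated_speed_mps((42.22788, -83.68459, 0), (42.22968, -83.68459, 20))
```

Because the Haversine distance is a straight line over the sphere, the result is best read as a lower bound on the average speed over the interval.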

  • Location types

  • Map Matching (must also have timestamp)

    • Open-source map matching (such as Valhalla) or paid services (such as MapBox) can be used to "snap" the GPS coordinates to roadways.

    • MapBox Example: First, the latitudes and longitudes must be formatted the way the MapBox endpoint expects: semicolon-separated longitude,latitude pairs. The [::4] slice takes every fourth coordinate, because MapBox accepts at most 100 coordinates per request and a coordinate roughly every 20*4 = 80 seconds is generally sufficient for map matching to the road network. A MapBox API key is required. See the code appendix below for the decode function.

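      • import requests
        
        # Keep every fourth point (see the note above on MapBox's 100-coordinate limit)
        lngs = gdf['longitude'].tolist()[::4]
        lats = gdf['latitude'].tolist()[::4]
        # MapBox expects semicolon-separated "longitude,latitude" pairs
        coordinates = ';'.join(f'{lng},{lat}' for lng, lat in zip(lngs, lats))
        # api_key must already hold a valid MapBox access token
        url = f'https://api.mapbox.com/matching/v5/mapbox/driving-traffic/{coordinates}?access_token={api_key}'
        response = requests.get(url)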
      • The response contains information about the roads that this set of coordinates matched to. For example:

      • response.json()['matchings'][4]
        > {'confidence': 0.93516,
         'weight_name': 'auto',
         'weight': 222.003,
         'duration': 201.324,
         'distance': 5750.37,
         'legs': [{'via_waypoints': [],
           'admins': [{'iso_3166_1_alpha3': 'USA', 'iso_3166_1': 'US'}],
           'weight': 112.387,
           'duration': 96.623,
           'steps': [],
           'distance': 2720.261,
           'summary': 'US 23 North'},
          {'via_waypoints': [],
           'admins': [{'iso_3166_1_alpha3': 'USA', 'iso_3166_1': 'US'}],
           'weight': 109.616,
           'duration': 104.701,
           'steps': [],
           'distance': 3030.11,
           'summary': 'BL I 94, US 23 North'}],
         'geometry': 'gsv`Gtsw}Ncq@jDsk@_Lew@j@_RRw`@kCi[}a@uJeG}_AO'} 
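      • This geometry decodes to the coordinates below. The factor of 10 suggests that decode assumes six-decimal precision, whereas MapBox encodes polylines with five decimals by default.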

      • [[i[0]*10,i[1]*10] for i in decode(response.json()['matchings'][4]['geometry'])]
        > [[-83.68459, 42.227880000000006],
         [-83.68544999999999, 42.2359],
         [-83.68337, 42.24304],
         [-83.68359, 42.25203],
         [-83.68369, 42.25507],
         [-83.68299, 42.260470000000005],
         [-83.67739999999999, 42.265],
         [-83.67609, 42.26687],
         [-83.67601, 42.27726]]
      • Further, the total meters driven on each roadway can be computed using:

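      • import pandas as pd
        
        # Parse the response once, then collect the legs of every matching
        matchings = response.json()['matchings']
        legs = pd.concat([pd.DataFrame(m['legs']) for m in matchings])
        # Total matched distance in meters per roadway
        legs.groupby('summary')['distance'].sum()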
        >   BL I 94, US 23 North                                                  3030.110
            Broadway Street                                                         97.029
            I 94 East                                                            23498.362
            I 94 East, I 94 West                                                  9355.541
            I 94 West                                                            25963.642
            I 94 West, I 94 East                                                  7703.051
            John D Dingell Drive                                                  2125.649
            John D. Dingell Drive, John D Dingell Drive                           7828.364
            John D. Dingell Drive, Long-term Parking / International Arrivals      978.600
            Merriman Road                                                         1817.140
            Plymouth Road                                                         1292.209
            Plymouth Road, Devon Circle                                           2753.524
            Plymouth Road, US 23 South                                            6734.600
            US 23 North                                                           2720.261
            US 23 South                                                           5261.977
            W. G. Rogell Drive                                                    1845.754
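For traces that still exceed 100 coordinates after subsampling, one option is to split them across several requests. The sketch below is an assumption about how that could be done, not the method used for the output above. The helper name build_coordinate_batches is hypothetical. It formats points as longitude,latitude pairs and lets consecutive batches share one point so that the matched segments join up.

```python
def build_coordinate_batches(lngs, lats, max_points=100):
    """Split a trace into MapBox-style coordinate strings of at most max_points each.

    Consecutive batches overlap by one point, so every batch has at least two
    points and adjacent matched segments share an endpoint.
    """
    points = [f"{lng},{lat}" for lng, lat in zip(lngs, lats)]
    step = max_points - 1  # advance by one less than the batch size to create the overlap
    batches = []
    for start in range(0, len(points) - 1, step):
        batches.append(";".join(points[start:start + max_points]))
    return batches


# Hypothetical trace of 250 points heading east along a line of constant latitude
example_lngs = [-83.7 + 0.001 * k for k in range(250)]
example_lats = [42.28] * 250
example_batches = build_coordinate_batches(example_lngs, example_lats)
```

Each string in example_batches can then be substituted for {coordinates} in the request URL, one request per batch.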
    • Valhalla Example: The code below posts the trace to the public Valhalla endpoint (see: https://valhalla.github.io/valhalla/api/map-matching/api-reference/). Alternatively, Valhalla can be hosted on a server (see e.g. https://water-gis.com/en/setups/valhalla/setup-valhalla/). The output at the bottom is the total distance travelled on each road, in miles (the request sets units to 'miles'). Note that the public Valhalla endpoint seems to limit the length of the output (here, the return trip is truncated from the output of the map matching service).

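      • import geopandas as gpd
        import pandas as pd
        import requests
        
        # Build the trace: one {"lat", "lon", "type"} entry per GPS point
        req = gdf[['latitude','longitude']].rename(columns={'latitude':'lat','longitude':'lon'})
        lst = []
        for i in req.index:
            lst.append({"lat":req.loc[i,'lat'],'lon':req.loc[i,'lon'], 'type':'via'})
        
        # Mark the first and last points as 'break' locations (the trip endpoints)
        lst[0]['type'] = 'break'
        lst[-1]['type'] = 'break'
        dct = {"shape" : lst}
        dct['costing'] = 'auto'
        dct['shape_match'] = "map_snap"
        dct['units'] = 'miles'
        
        url = 'https://valhalla1.openstreetmap.de/trace_route'
        
        response = requests.post(url, json = dct)
        
        # Collect the street names and length of each maneuver in the first leg
        street_names = []
        lengths = []
        ms = response.json()['trip']['legs'][0]['maneuvers']
        
        for m in ms:
            try:
                street_names.append(m['street_names'])
                lengths.append(m['length'])
            except KeyError:
                # Skip maneuvers that carry no street name
                pass
                
        temp = pd.DataFrame()
        temp['street_name'] = street_names
        temp['length'] = lengths
        # Keep only the first name when a maneuver lists several
        temp['street_name'] = temp['street_name'].map(lambda x: x[0])
        
        # Decode the matched shape into points for plotting (decode is in the code appendix)
        shape_gdf = gpd.GeoDataFrame(decode(response.json()['trip']['legs'][0]['shape']))
        shape_gdf['geometry'] = gpd.points_from_xy(x = shape_gdf[0], y = shape_gdf[1])
        
        # Total length per street, in the units requested above
        temp.groupby('street_name')['length'].sum()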
        
        > street_name
        Broadway Street                                0.085
        Devon Circle                                   0.233
        Huron Parkway                                  0.625
        I 94 East                                     17.561
        Long-term Parking / International Arrivals     0.496
        Merriman Road                                  2.404
        Plymouth Road                                  3.141
        Traverwood Drive                               0.272
        US 23 South                                    5.290

Enabled By

Known Quirks

  • "Teleportation": Occasionally, physically impossible jumps in location occur due to some error in data collection. As a guide, take the Haversine distance between successive points and divide it by the time difference to obtain a rough lower bound on the speed the vehicle would have needed to travel between those points (the straight-line distance cannot exceed the distance actually driven). 75 meters per second is used as the sanity check for this derived_speed (see code appendix and example below).

    • This may be due to a GPS connection error, as there are many instances where the reported speed is greater than 0 but the latitude and longitude do not change.

  • Sign error: On rare occasions, the sign of the longitude has been flipped, so coordinates that truly lie in the United States appear as though they are in Asia.
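The "Teleportation" check above can be sketched as follows, assuming the distances between successive points (in meters) and the time differences (in seconds) have already been computed. The function name flag_teleportation and the handling of zero time differences are assumptions for illustration; only the 75 m/s threshold comes from the text above.

```python
MAX_PLAUSIBLE_SPEED_MPS = 75  # sanity threshold for derived_speed quoted above


def flag_teleportation(distances_m, time_deltas_s, max_speed_mps=MAX_PLAUSIBLE_SPEED_MPS):
    """Return one boolean per step: True where the implied speed is implausible."""
    flags = []
    for dist, dt in zip(distances_m, time_deltas_s):
        if dt <= 0:
            # Assumption: any movement with no elapsed time counts as a jump
            flags.append(dist > 0)
        else:
            flags.append(dist / dt > max_speed_mps)
    return flags


# Hypothetical steps: 5 m/s, 100 m/s, movement in zero time, no movement in zero time
example_flags = flag_teleportation([100, 2000, 50, 0], [20, 20, 0, 0])
```

Flagged steps are candidates for removal or manual review rather than proof of an error, since the threshold is only a rough guide.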

Cases

  • SmartCar

    • Data points are captured every 4 hours, so trips will generally not be captured

  • AutoPi

    • Data points are generally captured every 20 seconds of vehicle operation

  • Tesla

    • Data points are generally captured every 60 seconds while the vehicle is on or charging, but are sometimes more frequent.

Code Appendix

Note that the odometer reading is not available for every vehicle, nor for every record of a given vehicle, so the Haversine distance between successive points can be used as a rough lower bound of distance travelled.
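The decode function referenced in the map matching examples is not reproduced on this page. As a stand-in, here is a minimal sketch of a standard encoded-polyline decoder. It assumes six-decimal precision by default (the precision Valhalla uses for its shapes) and returns [longitude, latitude] pairs, which matches the ordering of the decoded output shown earlier. The original function may differ.

```python
def decode(encoded, precision=6):
    """Decode an encoded polyline string into a list of [longitude, latitude] pairs."""
    factor = 10 ** precision
    coords = []
    lat = lon = 0
    i = 0
    while i < len(encoded):
        deltas = []
        for _ in range(2):  # the latitude delta comes first, then the longitude delta
            shift = result = 0
            while True:
                byte = ord(encoded[i]) - 63
                i += 1
                result |= (byte & 0x1F) << shift
                shift += 5
                if byte < 0x20:  # a cleared continuation bit ends this value
                    break
            # The lowest bit stores the sign; the remaining bits store the magnitude
            deltas.append(~(result >> 1) if result & 1 else result >> 1)
        lat += deltas[0]
        lon += deltas[1]
        coords.append([lon / factor, lat / factor])
    return coords
```

For a five-decimal polyline, such as MapBox's default geometry encoding, passing precision=5 avoids the need to rescale the decoded values afterwards.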

Visualizations with Explanations

GPS coordinates on a map color coded by the distance between successive points
GPS coordinates on a map color coded by the US Census population density (C: Micropolitan, R: Rural, U: Urban)
Example SmartCar data
Histogram of time between Tesla records for an example vehicle
Histogram of time between AutoPi records for an example vehicle
Map matched coordinates (red: snapped coordinates, blue and green: original coordinates, where this trip starts in blue and ends in green)
Valhalla map matched coordinates (blue) cover the trip towards the airport, but the return trip is not captured (red: original coordinates)
Example of "Teleportation", color coded by derived_speed
Example of "Teleportation", color coded by vehicle-reported speed
