Vehicle Utilization Classification
It can be useful to automatically classify driving behavior to determine whether a vehicle is used primarily for commuting or is driven consistently throughout the day, as a vehicle for hire would be.
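The classification relies on how often vehicles are used and when they are used. The following code builds a table of driving distance and driving time by day of week and hour of day: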
import numpy as np
import pandas as pd

# gdf: GeoDataFrame of vehicle positions with a tz-aware 'timestamp' column.
# tz: GeoDataFrame of time zone polygons with the zone name in 'tz_name1st'.
# Attach the time zone name to each point, then convert timestamps to local time.
gdf_tz = gdf.sjoin(tz[['geometry','tz_name1st']])
gdf_tz['timestamp_local'] = gdf_tz.apply(lambda row: row['timestamp'].tz_convert(row['tz_name1st']), axis=1)
gdf_tz['day_of_week'] = gdf_tz['timestamp_local'].map(lambda x: x.dayofweek)  # 0 = Monday
gdf_tz['hour_of_day'] = gdf_tz['timestamp_local'].map(lambda x: x.hour)

# Sum and mean of distance and elapsed time per (day of week, hour of day),
# using only rows with a positive distance.
moving = gdf_tz.loc[gdf_tz['haversine_dist_shift'] > 0]
gb = moving.groupby(['day_of_week','hour_of_day']).agg({'haversine_dist_shift':['sum','mean'],'timestamp_diff_second':['sum','mean']})
gb = gb.reset_index(drop = False)
# Flatten the two-level columns, e.g. ('haversine_dist_shift', 'sum') -> 'haversine_dist_shift_sum'.
gb.columns = [i[0] + '_' + i[1] if i[1] != '' else i[0] for i in gb.columns]

# Give every day a row for each of the 24 hours, including hours with no driving.
temp_lst = []
for day in range(7):
    temp = gb.loc[gb['day_of_week'] == day]
    all_time = pd.DataFrame(index = np.arange(0,24,1))
    all_time['odometer_diff_fill'] = 0
    all_time['time_fill'] = 0
    temp = temp.merge(all_time, left_on = 'hour_of_day', right_index = True, how = 'right')
    temp['day_of_week'] = day
    temp['weekday'] = day < 5  # Monday to Friday
    temp_lst.append(temp)
weekly_dst = pd.concat(temp_lst)

# Hours without driving have missing aggregates; fill them with the zero columns.
weekly_dst['filled_dist_sum'] = weekly_dst['haversine_dist_shift_sum'].combine_first(weekly_dst['odometer_diff_fill'])
weekly_dst['filled_time_sum'] = weekly_dst['timestamp_diff_second_sum'].combine_first(weekly_dst['time_fill'])
weekly_dst['filled_dist_mean'] = weekly_dst['haversine_dist_shift_mean'].combine_first(weekly_dst['odometer_diff_fill'])
weekly_dst['filled_time_mean'] = weekly_dst['timestamp_diff_second_mean'].combine_first(weekly_dst['time_fill'])

With this setup, the percentage of weekday driving distance and driving time that takes place during commuting hours (7am to 9am and 4pm to 7pm) can be computed. If that fraction is sufficiently high, the vehicle is probably used primarily for commuting.
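The commuting share described above could be computed roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes a table shaped like weekly_dst (columns hour_of_day, weekday and filled_dist_sum), it interprets the commuting windows as the hour buckets 7, 8, 16, 17 and 18, and the helper name commute_fraction and the 0.6 cutoff are hypothetical choices for illustration. A small synthetic table stands in for real data.

```python
import numpy as np
import pandas as pd

# Assumption: hour h covers h:00 to h:59, so 7am-9am and 4pm-7pm map to these buckets.
COMMUTE_HOURS = [7, 8, 16, 17, 18]

def commute_fraction(weekly_dst, value_col="filled_dist_sum"):
    """Share of weekday `value_col` that falls inside COMMUTE_HOURS (hypothetical helper)."""
    weekdays = weekly_dst[weekly_dst["weekday"].to_numpy(dtype=bool)]
    total = weekdays[value_col].sum()
    if total == 0:
        return np.nan  # no weekday driving recorded
    in_commute = weekdays["hour_of_day"].isin(COMMUTE_HOURS).to_numpy()
    return weekdays.loc[in_commute, value_col].sum() / total

# Synthetic stand-in for weekly_dst: 7 days x 24 hours.
demo = pd.DataFrame(
    [(d, h) for d in range(7) for h in range(24)],
    columns=["day_of_week", "hour_of_day"],
)
demo["weekday"] = demo["day_of_week"] < 5
demo["filled_dist_sum"] = 0.0
demo.loc[demo["hour_of_day"].isin([8, 17]), "filled_dist_sum"] = 10.0  # commute trips
demo.loc[demo["hour_of_day"] == 12, "filled_dist_sum"] = 5.0           # midday errand

dist_share = commute_fraction(demo, "filled_dist_sum")

COMMUTE_THRESHOLD = 0.6  # hypothetical cutoff; tune against labeled vehicles
is_commuter = dist_share >= COMMUTE_THRESHOLD
```

The same function can be called with value_col="filled_time_sum" to get the share of driving time instead of distance. Requiring both shares to exceed the cutoff is one possible way to make the classification stricter.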
Using the above code, the averaged distribution of driving during a week can be plotted as a heatmap or as a bar graph.
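One possible way to draw such a heatmap is to pivot the table into a 7 x 24 grid and pass it to matplotlib. The sketch below is an assumption about the plotting approach, not the method used to produce the figures on this page. It uses random synthetic values in place of a real weekly_dst, and the choice of filled_dist_sum as the plotted quantity is illustrative.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for weekly_dst: one row per (day of week, hour of day).
rng = np.random.default_rng(0)
weekly_dst = pd.DataFrame(
    [(d, h) for d in range(7) for h in range(24)],
    columns=["day_of_week", "hour_of_day"],
)
weekly_dst["filled_dist_sum"] = rng.uniform(0, 50, size=len(weekly_dst))

# Rows = day of week (0 = Monday), columns = hour of day.
heat = weekly_dst.pivot_table(
    index="day_of_week",
    columns="hour_of_day",
    values="filled_dist_sum",
    aggfunc="sum",
    fill_value=0,
)

fig, ax = plt.subplots(figsize=(10, 3))
im = ax.imshow(heat.to_numpy(), aspect="auto", cmap="viridis")
ax.set_xticks(range(24))
ax.set_yticks(range(7))
ax.set_yticklabels(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"])
ax.set_xlabel("Hour of day (local time)")
ax.set_ylabel("Day of week")
fig.colorbar(im, ax=ax, label="Distance driven")
# Call plt.show() or fig.savefig(...) to render the figure.
```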
Here is an example heatmap:

Note that this type of heatmap (and the bar graph below) can easily be created for fleet-level statistics as well. If weekly_dst contains data for multiple vehicles, the data to be plotted can be obtained by grouping on day of week and hour of day and aggregating across vehicles.
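A minimal sketch of such a fleet-level groupby is shown here. It assumes the per-vehicle tables have been concatenated with an identifying column, called vehicle_id here purely for illustration, and the output column names fleet_dist_sum and fleet_dist_mean are likewise hypothetical. Synthetic data for two vehicles stands in for a real fleet.

```python
import pandas as pd

# Synthetic stand-in: weekly_dst for two vehicles, tagged with a
# hypothetical vehicle_id column.
frames = []
for vid, dist in [("veh_a", 1.0), ("veh_b", 3.0)]:
    df = pd.DataFrame(
        [(d, h) for d in range(7) for h in range(24)],
        columns=["day_of_week", "hour_of_day"],
    )
    df["vehicle_id"] = vid
    df["filled_dist_sum"] = dist
    frames.append(df)
weekly_dst = pd.concat(frames, ignore_index=True)

# Aggregate across vehicles for each (day of week, hour of day) cell.
fleet = (
    weekly_dst
    .groupby(["day_of_week", "hour_of_day"])["filled_dist_sum"]
    .agg(fleet_dist_sum="sum", fleet_dist_mean="mean")
    .reset_index()
)
```

Whether the sum or the mean across vehicles is the better quantity to plot depends on the question: the sum shows total fleet activity, while the mean is easier to compare between fleets of different sizes.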
For 100 example vehicles, the activity heatmap looks like this:

And here is an example bar graph of the distribution of driving kilometers averaged over each weekday:
