I have a traffic-flows dataset that shows lat-long points with sequences for multiple trips thru various zones within a city.
Data: Trip data with columns: 'TripID', 'Sequence', 'Zone'
pd.DataFrame({'TripID': ['Tom', 'Tom', 'Tom', 'Krish', 'Krish'], 'Sequence': [1, 2, 3, 1, 2], 'Zone': ['A', 'B', 'C', 'A', 'B']})
Output: I want to derive number of trips between each origin(O) and destination(D) zone. So the output dataframe should have trip_counts between each O-D pair.
Output steps:
- I first created OD matrix with all possible zone pairs
pairs =[(a, b) for a in OD for b in OD]
ODMatrix = pd.DataFrame(pairs, columns = ['Origin','Dest'])
I need to populate the trip_counts as a dataframe column for these pairs,
- Now this is what I need help with; The input dataframe (trip_flows) has multiple unique trips(TripID). Each unique trip goes from zone A to B to C to D to E.. I need to count each of such trip pair as a seperate trip, and this sequence is in 'Sequence' attribute. So we have to iterate for each 'unique trip(TripID) group', append trip-counts while identifying the OD zones in the pair.
Using groupby, for each unique-trip (TripID), I generated a sequence of trip zones. Maybe then set a 0 count for trip_counts in ODMatrix dataframe for each pair, and append each trip with 1. So the final Output should look like this:
pd.DataFrame({'Origin': ['A', 'B'], 'Dest': ['B', 'C'], 'trip_count': [2, 1]})
CodePudding user response:
Firstly, create an array of unique TripIDs to iterate over the original_df, to handle grouped zone pairs as per the sequence.
Array: tripIDs=df['TripID'].unique()
Loop over array items to form the zone trips array:
trips = []
for tripID in tripIDs:
data = df[df["TripID"]==tripID]
data=data.sort_values(by=["Sequence"])
arr = data[['Zone']]
df1=arr.drop(arr.tail(1).index)
df1.reset_index(drop=True, inplace=True)
df2=arr.drop(arr.head(1).index)
df2.reset_index(drop=True, inplace=True)
trip_pair = pd.DataFrame(columns=['ZPair'])
trip_pair['ZPair']= df1['Zone'] df2['Zone']
trips.append(trip_pair.to_numpy())
Append the trip_pair with every output from the loop and count unique value counts:
all_trips=np.vstack(trips)
(unique, counts) = np.unique(all_trips, return_counts=True)
all_trips=np.array(list(zip(unique, counts)))
all_trips = pd.DataFrame(all_trips, columns = ['ZPair','counts'])
Lastly, convert the value counts array to dataframe and join it with the OD Matrix dataframe.
OD=df_orig['Zone'].unique()
pairs =[(a, b) for a in OD for b in OD]
ODMatrix = pd.DataFrame(pairs, columns = ['Origin','Dest'])
ODMatrix['ZPair']=ODMatrix['Origin'] ODMatrix['Dest']
ODMatrix=ODMatrix.merge(all_trips,on='ZPair',how='left')
ODMatrix.fillna(0, inplace=True)
ODMatrix.head()