Iterate over a dataframe to populate counts in another-CodePudding

I have a traffic-flows dataset that shows lat-long points with sequences for multiple trips thru various zones within a city.

Data: Trip data with columns: 'TripID', 'Sequence', 'Zone'

pd.DataFrame({'TripID': ['Tom', 'Tom', 'Tom', 'Krish', 'Krish'], 'Sequence': [1, 2, 3, 1, 2], 'Zone': ['A', 'B', 'C', 'A', 'B']})

Output: I want to derive number of trips between each origin(O) and destination(D) zone. So the output dataframe should have trip_counts between each O-D pair.

Output steps:

I first created OD matrix with all possible zone pairs

pairs =[(a, b) for a in OD for b in OD] ODMatrix = pd.DataFrame(pairs, columns = ['Origin','Dest'])

I need to populate the trip_counts as a dataframe column for these pairs,

Now this is what I need help with; The input dataframe (trip_flows) has multiple unique trips(TripID). Each unique trip goes from zone A to B to C to D to E.. I need to count each of such trip pair as a seperate trip, and this sequence is in 'Sequence' attribute. So we have to iterate for each 'unique trip(TripID) group', append trip-counts while identifying the OD zones in the pair.

Using groupby, for each unique-trip (TripID), I generated a sequence of trip zones. Maybe then set a 0 count for trip_counts in ODMatrix dataframe for each pair, and append each trip with 1. So the final Output should look like this:

pd.DataFrame({'Origin': ['A', 'B'], 'Dest': ['B', 'C'], 'trip_count': [2, 1]})

CodePudding user response：

Firstly, create an array of unique TripIDs to iterate over the original_df, to handle grouped zone pairs as per the sequence. Array: tripIDs=df['TripID'].unique() Loop over array items to form the zone trips array:

trips = []
for tripID in tripIDs:
    data = df[df["TripID"]==tripID]
    data=data.sort_values(by=["Sequence"])
    arr = data[['Zone']]
    df1=arr.drop(arr.tail(1).index)
    df1.reset_index(drop=True, inplace=True)
    df2=arr.drop(arr.head(1).index)
    df2.reset_index(drop=True, inplace=True)
    trip_pair = pd.DataFrame(columns=['ZPair'])
    trip_pair['ZPair']= df1['Zone'] df2['Zone']
    trips.append(trip_pair.to_numpy())

Append the trip_pair with every output from the loop and count unique value counts:

all_trips=np.vstack(trips)
(unique, counts) = np.unique(all_trips, return_counts=True)
all_trips=np.array(list(zip(unique, counts)))
all_trips = pd.DataFrame(all_trips, columns = ['ZPair','counts'])

Lastly, convert the value counts array to dataframe and join it with the OD Matrix dataframe.

OD=df_orig['Zone'].unique()
pairs =[(a, b) for a in OD for b in OD]
ODMatrix = pd.DataFrame(pairs, columns = ['Origin','Dest'])
ODMatrix['ZPair']=ODMatrix['Origin'] ODMatrix['Dest']
ODMatrix=ODMatrix.merge(all_trips,on='ZPair',how='left')
ODMatrix.fillna(0, inplace=True)
ODMatrix.head()