I tried to create a for loop with if statements that would append to a new df, but it's not working out. I am very new to this.
This code is my attempt at an algorithm that decides whether a ride can be shared, subject to a number of constraints.
match = []
for all rows in d12:
    if d12.loc[d12['time_min']]-5 < d12['time_min'] < d12['time_min']+5:
        continue
    else:
        pass
    if d12.loc[df11['distance_km']]-1 < d12['distance_km'] < d12['distance_km']+1:
        continue
    else:
        pass
    if df12.loc[d12[sum['passenger_count']] <= 5:
        match.append()
    else:
        pass
So, it needs to go through all the rows and find matches based on the constraints. Every time a match is found between two rows, both of the matched rows go into a dataframe called match, with all of their columns. If one of the constraints is not met, it should move on to the next row.
Once it's done, these matches are deleted from the dataframe d12.
Constraints explained in detail:
1. The difference in time_min between the two trips must be less than or equal to 5 minutes.
2. The difference in distance_km between the two trips must be within +/-1 km.
3. The sum of passenger_count for the two trips being combined must be less than or equal to 5.
4. A match can only combine two rows of the data.
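Constraints 1 to 3 can be written as a single pairwise check. The following is only a minimal sketch: the function name can_share and its keyword defaults are illustrative and not part of the original code, and it assumes each trip is a dict or a pandas row with the columns listed above. Constraint 4 (one partner per trip) has to be handled by whatever loop calls it.

```python
def can_share(trip_a, trip_b, max_minutes=5, max_km=1, max_passengers=5):
    """Return True if two trips satisfy constraints 1 to 3.

    trip_a and trip_b can be dicts or pandas rows with the keys
    time_min, distance_km and passenger_count.
    The name can_share and the keyword defaults are illustrative.
    """
    close_in_time = abs(trip_a["time_min"] - trip_b["time_min"]) <= max_minutes
    close_in_distance = abs(trip_a["distance_km"] - trip_b["distance_km"]) <= max_km
    within_capacity = (
        trip_a["passenger_count"] + trip_b["passenger_count"] <= max_passengers
    )
    return close_in_time and close_in_distance and within_capacity


# Rows 1, 2 and 4 of the data example
trip_1 = {"time_min": 450, "distance_km": 0.3, "passenger_count": 2}
trip_2 = {"time_min": 453, "distance_km": 0.75, "passenger_count": 1}
trip_4 = {"time_min": 600, "distance_km": 1.25, "passenger_count": 1}

print(can_share(trip_1, trip_2))  # True
print(can_share(trip_1, trip_4))  # False
```

Trips 1 and 2 are 3 minutes and 0.45 km apart with 3 passengers in total, so they qualify; trips 1 and 4 are 150 minutes apart, so they do not.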
Data example:
ID | time_min | distance_km | passenger_count |
---|---|---|---|
1 | 450 | 0.3 | 2 |
2 | 453 | 0.75 | 1 |
3 | 564 | 1.35 | 4 |
4 | 600 | 1.25 | 1 |
5 | 560 | 0.80 | 1 |
CodePudding user response:
Try using a custom function to find possible matches. Then filter your DataFrame by whether a match is found or not:
def find_match(row, df):
    # Compare the current row against every other row
    other = df.drop(row.name)
    match = other[(other["time_min"].between(row["time_min"]-5, row["time_min"]+5, inclusive="both")) &
                  (other["distance_km"].between(row["distance_km"]-1, row["distance_km"]+1, inclusive="both")) &
                  (other["passenger_count"]+row["passenger_count"]<=5)]
    if match.shape[0]>0:
        # Return the ID of the first row that satisfies all constraints
        return match["ID"].iat[0]
    return None
df["Match ID"] = df.apply(lambda row: find_match(row, df), axis=1)
match = df[df["Match ID"].notnull()].drop("Match ID", axis=1)
singles = df[df["Match ID"].isnull()].drop("Match ID", axis=1)
>>> match
   ID  time_min  distance_km  passenger_count
0   1       450         0.30                2
1   2       453         0.75                1
2   3       564         1.35                4
4   5       560         0.80                1
>>> singles
   ID  time_min  distance_km  passenger_count
3   4       600         1.25                1
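Note that looking up the first possible partner for each row independently does not by itself guarantee constraint 4: with other data, one trip could be picked as the partner of several rows. One way to handle that is a greedy pass that marks rows as used once they are paired. This is only a sketch under assumptions: pair_trips is a hypothetical helper name, the sample DataFrame is rebuilt from the question's data, and "first qualifying row wins" is just one possible pairing rule.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "time_min": [450, 453, 564, 600, 560],
    "distance_km": [0.3, 0.75, 1.35, 1.25, 0.8],
    "passenger_count": [2, 1, 4, 1, 1],
})


def pair_trips(df):
    """Greedily pair rows so that each row is used in at most one match.

    Hypothetical helper: returns the list of paired index labels,
    the matched rows, and the rows left without a partner.
    """
    used = set()   # index labels already assigned to a pair
    pairs = []     # (label_a, label_b) tuples
    for label, row in df.iterrows():
        if label in used:
            continue
        candidates = df[
            ((df["time_min"] - row["time_min"]).abs() <= 5)
            & ((df["distance_km"] - row["distance_km"]).abs() <= 1)
            & (df["passenger_count"] + row["passenger_count"] <= 5)
            & ~df.index.isin(used | {label})   # skip used rows and the row itself
        ]
        if not candidates.empty:
            partner = candidates.index[0]      # first qualifying row wins
            used.update([label, partner])
            pairs.append((label, partner))
    matched_labels = [label for pair in pairs for label in pair]
    return pairs, df.loc[matched_labels], df.drop(index=matched_labels)


pairs, match, remaining = pair_trips(df)
print(match["ID"].tolist())      # [1, 2, 3, 5]
print(remaining["ID"].tolist())  # [4]
```

The third return value is the original DataFrame with the matched rows dropped, which covers the "delete the matches from d12" step in the question. A greedy pass is not guaranteed to find the largest possible number of pairs.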
CodePudding user response:
I approached this with a straightforward nested loop over all pairs of rows, so the comparison logic can be easily customized:
import pandas as pd
df = pd.DataFrame({
    'time_min': [450, 453, 564, 600, 560],
    'distance_km': [0.3, 0.75, 1.35, 1.25, 0.8],
    'passenger_count': [2, 1, 4, 1, 1]
}, index=[1, 2, 3, 4, 5])
match = pd.DataFrame()
for idx1, first in df.iterrows():
    for idx2, second in df.iterrows():
        if idx1 <= idx2:  # because a,b == b,a
            continue
        # your comparison logic goes here:
        # 1. time_min between both trips needs to be less or equal to 5 minutes.
        # 2. distance_km between both trips has to be +/-1km
        # 3. A sum of passenger_count of two trips that are being combined has to be less or equal to 5.
        # 4. A match can only be combined with two rows of the data.
        if (
            abs(first['time_min'] - second['time_min']) <= 5
            and abs(first['distance_km'] - second['distance_km']) <= 1
            and first['passenger_count'] + second['passenger_count'] <= 5
        ):
            result = pd.DataFrame.from_dict([first, second])
            match = pd.concat([match, result])
print(match)
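For larger data, the same pairwise comparison can probably be done without Python-level loops by cross-joining the DataFrame with itself. This is a sketch, not a drop-in replacement: it assumes pandas 1.2 or newer (for how="cross"), an explicit ID column instead of the index used above, and the _a/_b suffixes are arbitrary. It only lists candidate pairs for constraints 1 to 3; it does not enforce constraint 4.

```python
import pandas as pd

# Same sample data, with the trip ID as a regular column (an assumption)
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "time_min": [450, 453, 564, 600, 560],
    "distance_km": [0.3, 0.75, 1.35, 1.25, 0.8],
    "passenger_count": [2, 1, 4, 1, 1],
})

# Pair every row with every row; overlapping column names get the suffixes
pairs = df.merge(df, how="cross", suffixes=("_a", "_b"))

# Keep each unordered pair once and drop self-pairs
pairs = pairs[pairs["ID_a"] < pairs["ID_b"]]

# Apply constraints 1 to 3 to all pairs at once
candidates = pairs[
    ((pairs["time_min_a"] - pairs["time_min_b"]).abs() <= 5)
    & ((pairs["distance_km_a"] - pairs["distance_km_b"]).abs() <= 1)
    & (pairs["passenger_count_a"] + pairs["passenger_count_b"] <= 5)
]

print(candidates[["ID_a", "ID_b"]].values.tolist())  # [[1, 2], [3, 5]]
```

The cross join creates one row per pair of trips, so memory grows with the square of the number of rows; for very large data, sorting by time_min and only comparing nearby rows would be a better fit.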