How to create a for loop with multiple constraints while appending to a new df?

I tried to create a for loop with if statements that would append to a new df, but it's not working out. I am very new to this.

This code is my attempt at an algorithm that decides whether a ride can be shared, subject to a number of constraints.

match = []
for all rows in d12: 
    if d12.loc[d12['time_min']]-5 < d12['time_min'] < d12['time_min'] + 5:
        continue
    else:
            pass
            if d12.loc[df11['distance_km']]-1 < d12['distance_km'] < d12['distance_km'] + 1: 
                continue 
            else:
                    pass 
                    if df12.loc[d12[sum['passenger_count']] <= 5: 
                        match.append()
                    else:
                            pass

So, it needs to go through all the rows and find matches based on the constraints. Every time a match is found between two rows, both of those rows go into a dataframe called match, with all of their columns. If one of the constraints is not met, it should move on to the next row.

Once it's done, these matched rows should be deleted from the dataframe d12.

Constraints explained in detail:

1. The difference in time_min between the two trips must be less than or equal to 5 minutes.
2. The difference in distance_km between the two trips must be within +/-1 km.
3. The sum of passenger_count of the two trips being combined must be less than or equal to 5.
4. A match can only combine two rows of the data.
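Constraints 1 to 3 each compare one pair of trips, so they can also be checked for every pair at once instead of row by row. The following is a minimal sketch, not taken from the question or answers, assuming pandas 1.2 or later (for merge with how="cross") and an ID column as in the data example. It only lists candidate pairs; it does not enforce constraint 4.

```python
import pandas as pd

# Data example from the question
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "time_min": [450, 453, 564, 600, 560],
    "distance_km": [0.3, 0.75, 1.35, 1.25, 0.80],
    "passenger_count": [2, 1, 4, 1, 1],
})

# Pair every trip with every trip (cross join); columns get _a / _b suffixes
pairs = df.merge(df, how="cross", suffixes=("_a", "_b"))

# Keep each unordered pair once and drop self-pairs
pairs = pairs[pairs["ID_a"] < pairs["ID_b"]]

# Constraints 1-3: time, distance and combined passenger count
candidates = pairs[
    ((pairs["time_min_a"] - pairs["time_min_b"]).abs() <= 5)
    & ((pairs["distance_km_a"] - pairs["distance_km_b"]).abs() <= 1)
    & (pairs["passenger_count_a"] + pairs["passenger_count_b"] <= 5)
]
print(candidates[["ID_a", "ID_b"]])
```

With the example data this keeps the pairs (1, 2) and (3, 5).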

Data example:

ID	time_min	distance_km	passenger_count
1	450	0.3	2
2	453	0.75	1
3	564	1.35	4
4	600	1.25	1
5	560	0.80	1
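To experiment with the answers below, the data example can be loaded as a DataFrame. A minimal sketch, assuming the table is pasted as tab-separated text; the name d12 follows the question.

```python
import io

import pandas as pd

# The data example above, tab-separated as in the post
raw = """ID\ttime_min\tdistance_km\tpassenger_count
1\t450\t0.3\t2
2\t453\t0.75\t1
3\t564\t1.35\t4
4\t600\t1.25\t1
5\t560\t0.80\t1
"""

# read_csv accepts any file-like object, so StringIO works for pasted text
d12 = pd.read_csv(io.StringIO(raw), sep="\t")
print(d12)
```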

Answer 1:

Try using a custom function to find possible matches. Then filter your DataFrame by whether a match is found or not:

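import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "time_min": [450, 453, 564, 600, 560],
    "distance_km": [0.3, 0.75, 1.35, 1.25, 0.80],
    "passenger_count": [2, 1, 4, 1, 1],
})

def find_match(row, df):
    # Compare the current trip against every other trip
    other = df.drop(row.name)

    # Keep trips within 5 minutes, within 1 km, and with at most 5 passengers combined
    match = other[(other["time_min"].between(row["time_min"]-5, row["time_min"]+5, inclusive="both")) &
                  (other["distance_km"].between(row["distance_km"]-1, row["distance_km"]+1, inclusive="both")) &
                  (other["passenger_count"] + row["passenger_count"] <= 5)]

    # Return the ID of the first matching trip, or None if there is none
    if match.shape[0] > 0:
        return match["ID"].iat[0]
    return None

df["Match ID"] = df.apply(lambda row: find_match(row, df), axis=1)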

match = df[df["Match ID"].notnull()].drop("Match ID", axis=1)
singles = df[df["Match ID"].isnull()].drop("Match ID", axis=1)

>>> match
   ID  time_min  distance_km  passenger_count
0   1       450         0.30                2
1   2       453         0.75                1
2   3       564         1.35                4
4   5       560         0.80                1

>>> singles
   ID  time_min  distance_km  passenger_count
3   4       600         1.25                1
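find_match looks for a partner for each row independently, so nothing stops two rows from both choosing the same third row; constraint 4 is not guaranteed in general. The following is a minimal greedy sketch of the procedure described in the question (pair rows, put both rows in match, delete them from d12), assuming a simple first-come pairing order is acceptable. The helper is_match and its parameter names are hypothetical, introduced only for this illustration.

```python
import pandas as pd

# Data example from the question; d12 mirrors the asker's DataFrame name
d12 = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "time_min": [450, 453, 564, 600, 560],
    "distance_km": [0.3, 0.75, 1.35, 1.25, 0.80],
    "passenger_count": [2, 1, 4, 1, 1],
})

def is_match(a, b, max_minutes=5, max_km=1.0, max_passengers=5):
    """Hypothetical helper: True if two trips satisfy constraints 1-3."""
    return (
        abs(a["time_min"] - b["time_min"]) <= max_minutes
        and abs(a["distance_km"] - b["distance_km"]) <= max_km
        and a["passenger_count"] + b["passenger_count"] <= max_passengers
    )

used = set()   # index labels already placed in a pair
pairs = []     # (index_a, index_b) tuples in the order they were found
for i in d12.index:
    if i in used:
        continue
    for j in d12.index:
        # Skip the row itself and rows already paired (constraint 4)
        if j == i or j in used:
            continue
        if is_match(d12.loc[i], d12.loc[j]):
            pairs.append((i, j))
            used.update((i, j))
            break  # move on to the next unmatched row

# Both rows of every pair go into match; matched rows are removed from d12
match = d12.loc[[idx for pair in pairs for idx in pair]]
d12 = d12.drop(index=list(used))
print(match)
print(d12)
```

Greedy pairing takes the first valid partner it finds, so it does not try to maximize the number of pairs; with the example data it pairs IDs 1 and 2, then 3 and 5, leaving ID 4 in d12.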

Answer 2:

I approached this with a straightforward nested loop over all pairs of rows, so the comparison logic can easily be customized:

import pandas as pd

df = pd.DataFrame({
    'time_min': [450, 453, 564, 600, 560],
    'distance_km': [0.3, 0.75, 1.35, 1.25, 0.8],
    'passenger_count': [2, 1, 4, 1, 1]
}, index=[1, 2, 3, 4, 5])

match = pd.DataFrame()

for idx1, first in df.iterrows():
    for idx2, second in df.iterrows():
        if idx1 <= idx2:  # skip self-pairs, and (a, b) is the same pair as (b, a)
            continue

        # your comparison logic goes here:
        # 1. time_min between both trips needs to be less than or equal to 5 minutes.
        # 2. distance_km between both trips has to be within +/-1 km.
        # 3. The sum of passenger_count of the two trips being combined has to be less than or equal to 5.
        # 4. A match can only combine two rows of the data.
        #    (Not enforced here: a row can still appear in more than one pair.)
        if (
            abs(first['time_min'] - second['time_min']) <= 5
            and abs(first['distance_km'] - second['distance_km']) <= 1
            and first['passenger_count'] + second['passenger_count'] <= 5
        ):
            # Build a two-row frame from the matching pair and append it
            result = pd.DataFrame([first, second])
            match = pd.concat([match, result])
print(match)
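The idx1 <= idx2 check exists only so each unordered pair is visited once. As an alternative sketch, itertools.combinations from the standard library yields each pair of index labels exactly once, and collecting the matching pairs in a list lets pd.concat run a single time at the end instead of copying the growing frame on every match. Like the loop above, this does not enforce constraint 4.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({
    'time_min': [450, 453, 564, 600, 560],
    'distance_km': [0.3, 0.75, 1.35, 1.25, 0.8],
    'passenger_count': [2, 1, 4, 1, 1]
}, index=[1, 2, 3, 4, 5])

pairs = []
# Each unordered pair (idx1, idx2) with idx1 before idx2 appears exactly once
for idx1, idx2 in combinations(df.index, 2):
    first, second = df.loc[idx1], df.loc[idx2]
    if (
        abs(first['time_min'] - second['time_min']) <= 5
        and abs(first['distance_km'] - second['distance_km']) <= 1
        and first['passenger_count'] + second['passenger_count'] <= 5
    ):
        pairs.append((idx1, idx2))

# One concat at the end; an empty frame with the same columns if nothing matched
match = pd.concat([df.loc[[a, b]] for a, b in pairs]) if pairs else df.iloc[0:0]
print(match)
```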