Home > front end >  Merge 2 data frames with multiple conditions
Merge 2 data frames with multiple conditions

Time:05-18

I have two dataframes:

Data frame 1:

df = pd.DataFrame(np.array([
[5, 'a3789', 6000, 2.03]
[7, 'b3789', 1005, 2.05], 
[7, 'c3789', 2598, 2.05], 
[5, 'd3789', 5500, 2.05], 
[5, 'e3789', 1400, 2.03]]),
columns=['numP', 'id', 'value', 'percent']

Dataframe 2:

df_s3_data = pd.DataFrame(np.array([
[3.25,3.25,2.05,22.18,5,1000,2000],
[3.25,3.25,2.03,21.90,5,1000,2000], 
[3.25,3.25,2.01,21.62,5,1000,2000], 
[3.75,3.75,2.05,22.18,5,2000,3000], 
[3.75,3.75,2.03,21.90,5,2000,3000], 
[3.75,3.75,2.01,21.62,5,2000,3000], 
[4.25,4.25,2.05,22.11,5,3000,1000000],
[4.25,4.25,2.03,21.83,5,3000,1000000], 
[4.25,4.00,2.01,21.68,5,3000,1000000], 
[3.50,3.25,2.05,22.19,7,1000,2000],
[3.50,3.25,2.03,21.91,7,1000,2000], 
[3.50,3.25,2.01,21.63,7,1000,2000], 
[4.00,4.00,2.05,22.22,7,2000,3000],
[4.00,4.00,2.03,21.94,7,2000,3000],
[4.00,4.00,2.01,21.67,7,2000,3000],
[4.75,4.75,2.05,22.18,7,3000,1000000],
[4.75,4.75,2.03,21.90,7,3000,1000000],
[4.75,4.75,2.01,21.63,7,3000,1000000]]),
columns=['Flat', 'Difer', 'Origin', 'Efetive', 'Prazo', 'Ticket-Inf', 'Ticket-Sup'])

I need to get a new columns on df with the values from df_s3_data 'Flat' accordingly to the rules:

1 - df['numP'] = df_s3_data['Prazo']

2 - df['percent'] = df_s3_data['Origin']

3 - df['value'] >= df_s3_data['Ticket-Inf']

4 - df['value'] < df_s3_data['Ticket-Sup']

The results would be a columns on df with [4.25, 3.50, 4.00, 4.25, 3.25]

I tried the lambda function below:

df['Flat'] = df.apply(lambda x: df_s3_data.loc[df_s3_data['Prazo'] == x['numP'] & df_s3_data['Origin'] == x['percent'] & x['value'] >= df_s3_data['Ticket-Inf'] & x['value'] < df_s3_data['Ticket-Sup'], df_s3_data['Flat']])

And a few merge tries, but didn't succeed.

It should work like an excel sumif with unique references.

Can you guys help me out?

Expected result is a data frame like:

df = pd.DataFrame(np.array([
                            [5, 'a3789', 6000, 2.03, 4.25]
                            [7, 'b3789', 1005, 2.05, 3.50],
                            [7, 'c3789', 2598, 2.05, 4.00],
                            [5, 'd3789', 5500, 2.05, 4.25],
                            [5, 'e3789', 1400, 2.03, 3.25]]),
                            columns=['numP', 'id', 'value', 'percent','Flat']

CodePudding user response:

IIUC, you can first perform a left merge per your first two conditions and then filter the resulting dataframe for the last two conditions:

df = df.astype({"numP": "float", "value": "float", "percent": "float"})
merged = df.merge(df_s3_data, left_on=["numP","percent"],right_on=["Prazo","Origin"],how="left")

output = merged[merged["value"].ge(merged["Ticket-Inf"]) & merged["value"].lt(merged["Ticket-Sup"])]

>>> output
    numP     id   value  percent  ...  Efetive  Prazo  Ticket-Inf  Ticket-Sup
2    5.0  a3789  6000.0     2.03  ...    21.83    5.0      3000.0   1000000.0
3    7.0  b3789  1005.0     2.05  ...    22.19    7.0      1000.0      2000.0
7    7.0  c3789  2598.0     2.05  ...    22.22    7.0      2000.0      3000.0
11   5.0  d3789  5500.0     2.05  ...    22.11    5.0      3000.0   1000000.0
12   5.0  e3789  1400.0     2.03  ...    21.90    5.0      1000.0      2000.0

[5 rows x 11 columns]
  • Related