I have two dataframes:
Data frame 1:
df = pd.DataFrame(np.array([
[5, 'a3789', 6000, 2.03]
[7, 'b3789', 1005, 2.05],
[7, 'c3789', 2598, 2.05],
[5, 'd3789', 5500, 2.05],
[5, 'e3789', 1400, 2.03]]),
columns=['numP', 'id', 'value', 'percent']
Dataframe 2:
df_s3_data = pd.DataFrame(np.array([
[3.25,3.25,2.05,22.18,5,1000,2000],
[3.25,3.25,2.03,21.90,5,1000,2000],
[3.25,3.25,2.01,21.62,5,1000,2000],
[3.75,3.75,2.05,22.18,5,2000,3000],
[3.75,3.75,2.03,21.90,5,2000,3000],
[3.75,3.75,2.01,21.62,5,2000,3000],
[4.25,4.25,2.05,22.11,5,3000,1000000],
[4.25,4.25,2.03,21.83,5,3000,1000000],
[4.25,4.00,2.01,21.68,5,3000,1000000],
[3.50,3.25,2.05,22.19,7,1000,2000],
[3.50,3.25,2.03,21.91,7,1000,2000],
[3.50,3.25,2.01,21.63,7,1000,2000],
[4.00,4.00,2.05,22.22,7,2000,3000],
[4.00,4.00,2.03,21.94,7,2000,3000],
[4.00,4.00,2.01,21.67,7,2000,3000],
[4.75,4.75,2.05,22.18,7,3000,1000000],
[4.75,4.75,2.03,21.90,7,3000,1000000],
[4.75,4.75,2.01,21.63,7,3000,1000000]]),
columns=['Flat', 'Difer', 'Origin', 'Efetive', 'Prazo', 'Ticket-Inf', 'Ticket-Sup'])
I need to get a new columns on df with the values from df_s3_data 'Flat' accordingly to the rules:
1 - df['numP'] = df_s3_data['Prazo']
2 - df['percent'] = df_s3_data['Origin']
3 - df['value'] >= df_s3_data['Ticket-Inf']
4 - df['value'] < df_s3_data['Ticket-Sup']
The results would be a columns on df with [4.25, 3.50, 4.00, 4.25, 3.25]
I tried the lambda function below:
df['Flat'] = df.apply(lambda x: df_s3_data.loc[df_s3_data['Prazo'] == x['numP'] & df_s3_data['Origin'] == x['percent'] & x['value'] >= df_s3_data['Ticket-Inf'] & x['value'] < df_s3_data['Ticket-Sup'], df_s3_data['Flat']])
And a few merge tries, but didn't succeed.
It should work like an excel sumif with unique references.
Can you guys help me out?
Expected result is a data frame like:
df = pd.DataFrame(np.array([
[5, 'a3789', 6000, 2.03, 4.25]
[7, 'b3789', 1005, 2.05, 3.50],
[7, 'c3789', 2598, 2.05, 4.00],
[5, 'd3789', 5500, 2.05, 4.25],
[5, 'e3789', 1400, 2.03, 3.25]]),
columns=['numP', 'id', 'value', 'percent','Flat']
CodePudding user response:
IIUC, you can first perform a left merge per your first two conditions and then filter the resulting dataframe for the last two conditions:
df = df.astype({"numP": "float", "value": "float", "percent": "float"})
merged = df.merge(df_s3_data, left_on=["numP","percent"],right_on=["Prazo","Origin"],how="left")
output = merged[merged["value"].ge(merged["Ticket-Inf"]) & merged["value"].lt(merged["Ticket-Sup"])]
>>> output
numP id value percent ... Efetive Prazo Ticket-Inf Ticket-Sup
2 5.0 a3789 6000.0 2.03 ... 21.83 5.0 3000.0 1000000.0
3 7.0 b3789 1005.0 2.05 ... 22.19 7.0 1000.0 2000.0
7 7.0 c3789 2598.0 2.05 ... 22.22 7.0 2000.0 3000.0
11 5.0 d3789 5500.0 2.05 ... 22.11 5.0 3000.0 1000000.0
12 5.0 e3789 1400.0 2.03 ... 21.90 5.0 1000.0 2000.0
[5 rows x 11 columns]