I have columns in the pandas dataframe df_profit:
profit_date profit
0 01.04 70
1 02.04 80
2 03.04 80
3 04.04 100
4 05.04 120
5 06.04 120
6 07.04 120
7 08.04 130
8 09.04 140
9 10.04 140
And I have the second dataframe df_deals:
deals_date
0 03.04
1 05.04
2 06.04
I want to create a new column 'delta' in the df_profit and let it be equal to delta between current value and previous value in 'profit' column. But I want the delta to be calculated only after the first date in the 'profit_date' is equal to the date in the column 'deal_date' of df_deals dataframe and previous value in the delta calculation to be always the same and equal to the value when the first date in 'profit_date' was equal to the first date in 'deals_date'.
So, the result would look like:
profit_date profit delta
0 01.04 70
1 02.04 80
2 03.04 80
3 04.04 100 20
4 05.04 120 40
5 06.04 120 40
6 07.04 120 40
7 08.04 130 50
8 09.04 140 60
9 10.04 140 60
CodePudding user response:
For the next time you should provide better data to make it easier to help (dataframe creation so that we can copy paste your code).
I think this codes does what you want:
import pandas as pd
df_profit = pd.DataFrame(columns=["profit_date", "profit"],
data=[
["01.04", 70],
["02.04", 80],
["03.04", 80],
["04.04", 100],
["05.04", 120],
["06.04", 120],
["07.04", 120],
["08.04", 130],
["09.04", 140],
["10.04", 140]])
df_deals = pd.DataFrame(columns=["deals_date"], data=["03.04", "05.04", "06.04"])
# combine both dataframes, based on date columns
df = df_profit.merge(right=df_deals, left_on="profit_date", right_on="deals_date", how="left")
# find the first value (first row with deals date) and set it to 'base'
df["base"] = df.loc[df["deals_date"].first_valid_index()]["profit"]
# calculate delta
df["delta"] = df["profit"] - df["base"]
# Remove unused values
df.loc[:df["deals_date"].first_valid_index(), "delta"] = None
# remove temporary cols
df.drop(columns=["base", "deals_date"], inplace=True)
print(df)
output is:
profit_date profit delta
0 01.04 70 NaN
1 02.04 80 NaN
2 03.04 80 NaN
3 04.04 100 20.0
4 05.04 120 40.0
5 06.04 120 40.0
6 07.04 120 40.0
7 08.04 130 50.0
8 09.04 140 60.0
9 10.04 140 60.0
CodePudding user response:
You can try this one for don't get NaN values
start_profit = df_profit.loc[(df_profit["profit_date"] == df_deals.iloc[0][0])]
start_profit = start_profit.iloc[0][1]
for i in range(len(df_profit)):
if int(str(df_profit.iloc[i][0]).split(".")[0]) > 3 and int(str(df_profit.iloc[i][0]).split(".")[1]) >= 4:
df_profit.loc[i,"delta"] = df_profit.iloc[i][1]-start_profit
Hope it helps