Calculate new column values based on conditions in pandas

I have columns in the pandas dataframe df_profit:

    profit_date   profit
0   01.04         70    
1   02.04         80    
2   03.04         80    
3   04.04         100   
4   05.04         120   
5   06.04         120   
6   07.04         120   
7   08.04         130   
8   09.04         140   
9   10.04         140

And I have a second dataframe, df_deals:

    deals_date
0   03.04
1   05.04
2   06.04

I want to create a new column 'delta' in df_profit that holds the difference between the current 'profit' value and a fixed earlier value. The delta should be calculated only for rows after the first date in 'profit_date' that matches a date in the 'deals_date' column of df_deals. The value subtracted should always be the same: the 'profit' on that first matching date (here 03.04, where profit is 80).

So, the result would look like:

    profit_date   profit   delta
0   01.04         70       
1   02.04         80       
2   03.04         80       
3   04.04         100      20
4   05.04         120      40
5   06.04         120      40
6   07.04         120      40
7   08.04         130      50
8   09.04         140      60
9   10.04         140      60

CodePudding user response：

Next time, please include code that creates the dataframes so that we can copy and paste it. That makes it much easier to help.

I think this code does what you want:

import pandas as pd

df_profit = pd.DataFrame(columns=["profit_date", "profit"],
                         data=[
                             ["01.04", 70],
                             ["02.04", 80],
                             ["03.04", 80],
                             ["04.04", 100],
                             ["05.04", 120],
                             ["06.04", 120],
                             ["07.04", 120],
                             ["08.04", 130],
                             ["09.04", 140],
                             ["10.04", 140]])

df_deals = pd.DataFrame(columns=["deals_date"], data=["03.04", "05.04", "06.04"])

# combine both dataframes, based on date columns
df = df_profit.merge(right=df_deals, left_on="profit_date", right_on="deals_date", how="left")

# find the first value (first row with deals date) and set it to 'base'
df["base"] = df.loc[df["deals_date"].first_valid_index(), "profit"]

# calculate delta
df["delta"] = df["profit"] - df["base"]

# blank out delta for all rows up to and including the first deal date
# (label slicing with .loc is inclusive)
df.loc[:df["deals_date"].first_valid_index(), "delta"] = None

# remove temporary cols
df.drop(columns=["base", "deals_date"], inplace=True)

print(df)

output is:

  profit_date  profit  delta
0       01.04      70    NaN
1       02.04      80    NaN
2       03.04      80    NaN
3       04.04     100   20.0
4       05.04     120   40.0
5       06.04     120   40.0
6       07.04     120   40.0
7       08.04     130   50.0
8       09.04     140   60.0
9       10.04     140   60.0
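
If you would rather skip the merge and the temporary columns, something like the following should give the same result. This is only a sketch: it assumes the rows of df_profit are already sorted by date and that at least one date in df_deals appears in df_profit. The names is_deal, base and after_first are just illustrative.

```python
import pandas as pd

df_profit = pd.DataFrame({
    "profit_date": ["01.04", "02.04", "03.04", "04.04", "05.04",
                    "06.04", "07.04", "08.04", "09.04", "10.04"],
    "profit": [70, 80, 80, 100, 120, 120, 120, 130, 140, 140],
})
df_deals = pd.DataFrame({"deals_date": ["03.04", "05.04", "06.04"]})

# True for rows whose date appears in df_deals
is_deal = df_profit["profit_date"].isin(df_deals["deals_date"])

# Baseline: profit on the first matching row (assumes at least one match)
base = df_profit.loc[is_deal, "profit"].iloc[0]

# True for rows strictly after the first matching row:
# the running count of matches, shifted down by one row, is > 0 there
after_first = is_deal.cumsum().shift(fill_value=0) > 0

# Keep the difference only after the first deal; earlier rows stay NaN
df_profit["delta"] = (df_profit["profit"] - base).where(after_first)

print(df_profit)
```

The rows up to and including 03.04 stay NaN, as in the output above.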

CodePudding user response：

You can also try this one, which works on df_profit directly (rows before the first deal date still end up as NaN in the new column):

# Baseline: profit on the first deal date
first_deal = df_deals["deals_date"].iloc[0]
start_idx = df_profit.index[df_profit["profit_date"] == first_deal][0]
start_profit = df_profit.loc[start_idx, "profit"]
# Fill delta only for rows after that date (assumes rows are sorted by date)
for i in df_profit.index[df_profit.index > start_idx]:
    df_profit.loc[i, "delta"] = df_profit.loc[i, "profit"] - start_profit
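
A loop-free variant that compares parsed dates might look like this. Treat it as a sketch under a few assumptions: the strings are in "DD.MM" format, all dates fall in the same year, and the earliest date in df_deals also appears in df_profit. The names profit_dt and first_deal_dt are made up for the example.

```python
import pandas as pd

df_profit = pd.DataFrame({
    "profit_date": ["01.04", "02.04", "03.04", "04.04", "05.04",
                    "06.04", "07.04", "08.04", "09.04", "10.04"],
    "profit": [70, 80, 80, 100, 120, 120, 120, 130, 140, 140],
})
df_deals = pd.DataFrame({"deals_date": ["03.04", "05.04", "06.04"]})

# Parse "DD.MM" strings into datetimes. No year is given, so the parser
# fills in a default one; that is fine for comparisons within one year.
profit_dt = pd.to_datetime(df_profit["profit_date"], format="%d.%m")
first_deal_dt = pd.to_datetime(df_deals["deals_date"], format="%d.%m").min()

# Baseline: profit on the earliest deal date (assumed to exist in df_profit)
start_profit = df_profit.loc[profit_dt == first_deal_dt, "profit"].iloc[0]

# Difference to the baseline, kept only for dates after the first deal
df_profit["delta"] = (df_profit["profit"] - start_profit).where(profit_dt > first_deal_dt)

print(df_profit)
```

Comparing real dates also keeps working when the data crosses a month boundary, which a check on the day and month numbers alone would not.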

Hope it helps