Home > Enterprise >  Check if column name from one dataframe matches with index value of another dataframe and populate v
Check if column name from one dataframe matches with index value of another dataframe and populate v

Time:12-14

I have two dataframes, df_rates and df_profit as shown below. df_rates has a time-date value its column name with values as certain rates and the index values denotes the minutes before the time-date value of the column. (i.e. row 1 denotes 0 mins before 2012-03-31 23:45:00, row 2 denotes 5 mins before  2012-03-31 23:45:00 and so on).  Whereas df_profit has timestamps as its index and there is a  Profit column.

I want to add the Profit column from df_profit as a new column to df_rates under the following condition:

1.       If the timestamp (2012-03-31 23:45:00) at index of df_profit matches with the column name of df_rates (2012-03-31 23:45:00), then populate the new column with the corresponding profit value (354.664).

2.       The next value (125.76) in line from Profit column of df_profit, where the timestamp is ‘2012-03-31 23:30:00’ should be populated in the new column against row where ‘Mins before time’ is 15 and so on. (Since the difference in time between 2012-03-31 23:45:00 and 2012-03-31 23:30:00 is 15 mins)

I have been trying a lot on how to do it but I am at a loss. Could someone please help or guide with this?

df_rates
Mins before time     2012-03-31 23:45:00
0                        113.1
5                        112.1
10                       113.1
15                       113.17
20                       103.17
25                       133.17
30                       101.39


df_profit
                         Profit
2012-04-01 00:30:00      251.71
2012-04-01 00:15:00      652.782
2012-04-01 00:00:00      458.099
2012-03-31 23:45:00      3504.664
2012-03-31 23:30:00      1215.76
2012-03-31 23:15:00     -21.48
2012-03-31 23:00:00     -8.538
    

Expected dataframe:
Mins before time    2022-01-31 23:45:00 01:00   New_column
0                      113.1                     3504.664
5                      112.1    
10                     113.1    
15                     113.17                    1215.76
20                     103.17   
25                     133.17   
30                     101.39                    -21.48

CodePudding user response:

As an alternative to the original answer (see below), here is a very straightforward way to do what OP asks:

df_rates['Profit'] = df_profit.Profit.reindex(df_rates.columns[-1] - pd.to_timedelta(df_rates['Mins before time'], unit='min')).to_numpy()

Output:

   Mins before time  2012-03-31 23:45:00    Profit
0                 0               113.10  3504.664
1                 5               112.10       NaN
2                10               113.10       NaN
3                15               113.17  1215.760
4                20               103.17       NaN
5                25               133.17       NaN
6                30               101.39   -21.480

Explanation:

  • create a Series of Timestamps by subtracting from the Timestamp label of the final column in df_rates (assumed to be of Timestamp type, otherwise wrap it in pd.to_datetime()) a Series of timedelta values created using the Mins before time column with units of minutes
  • use reindex on df_profit to create a Series of profit figures that parallels the columns of df_rates and contains the Profit values from df_profit where the Timestamp labels in the df_profit index exactly match those in the Series of Timestamps created in the previous step, and NaN elsewhere
  • convert that Series of profit figures (possibly including NaNs) to a numpy array with to_numpy and use it to initialize a new column labels Profit (or whatever other label is desired) in df_rates.

Original answer:

Here's a way to do it:

df_rates = ( df_rates
    .set_index(df_rates.columns[-1]   pd.to_timedelta(-df_rates['Mins before time'], unit='min'))
    .join(df_profit).reset_index(drop=True) )
print(df_rates)

Output:

   Mins before time  2012-03-31 23:45:00    Profit
0                 0               113.10  3504.664
1                 5               112.10       NaN
2                10               113.10       NaN
3                15               113.17  1215.760
4                20               103.17       NaN
5                25               133.17       NaN
6                30               101.39   -21.480

Explanation

  • set the index of df_rates to be the label of its final column (assumed to be of Timestamp type, otherwise wrap it in pd.to_datetime()) plus a Series of timedelta values equal to the Mins before time column with units of minutes
  • use join with df_profit to add a Profit column with non-null values in rows with matching index values, and reset the index.

CodePudding user response:

You could define a function to get the profits at corresponding tiemstamps:

# import dateutil.parser
# from datetime import timedelta

def getProfit_atDelay(delayMins, startTime, profitDF):
    try: 
        startTime = dateutil.parser.parse(str(startTime))
        dTime = startTime - timedelta(minutes=int(str(delayMins)))
        return profitDF['Profit'].loc[dTime]
    except: return None

(I'm assuming that df_profit index contains datetime datatype; otherwise, edit code to convert dTime to the matching datatype.)


And to get all the values for the new column:

def profits_for_rates(ratesDF, profitDF):
    rStart, rMins = ratesDF.columns[1], ratesDF['Mins before time']
    return [getProfit_atDelay(t, rStart, profitDF) for t in rMins]

If you use it like

df_rates['New_column'] = profits_for_rates(df_rates, df_profit)

then df_rates will look like

   Mins before time  2012-03-31 23:45:00  New_column
0                 0               113.10    3504.664
1                 5               112.10         NaN
2                10               113.10         NaN
3                15               113.17    1215.760
4                20               103.17         NaN
5                25               133.17         NaN
6                30               101.39     -21.480
  • Related