I have two dataframes, df_rates and df_profit as shown below. df_rates has a time-date value its column name with values as certain rates and the index values denotes the minutes before the time-date value of the column. (i.e. row 1 denotes 0 mins before 2012-03-31 23:45:00, row 2 denotes 5 mins before 2012-03-31 23:45:00 and so on). Whereas df_profit has timestamps as its index and there is a Profit column.
I want to add the Profit column from df_profit as a new column to df_rates under the following condition:
1. If the timestamp (2012-03-31 23:45:00) at index of df_profit matches with the column name of df_rates (2012-03-31 23:45:00), then populate the new column with the corresponding profit value (354.664).
2. The next value (125.76) in line from Profit column of df_profit, where the timestamp is ‘2012-03-31 23:30:00’ should be populated in the new column against row where ‘Mins before time’ is 15 and so on. (Since the difference in time between 2012-03-31 23:45:00 and 2012-03-31 23:30:00 is 15 mins)
I have been trying a lot on how to do it but I am at a loss. Could someone please help or guide with this?
df_rates
Mins before time 2012-03-31 23:45:00
0 113.1
5 112.1
10 113.1
15 113.17
20 103.17
25 133.17
30 101.39
df_profit
Profit
2012-04-01 00:30:00 251.71
2012-04-01 00:15:00 652.782
2012-04-01 00:00:00 458.099
2012-03-31 23:45:00 3504.664
2012-03-31 23:30:00 1215.76
2012-03-31 23:15:00 -21.48
2012-03-31 23:00:00 -8.538
Expected dataframe:
Mins before time 2022-01-31 23:45:00 01:00 New_column
0 113.1 3504.664
5 112.1
10 113.1
15 113.17 1215.76
20 103.17
25 133.17
30 101.39 -21.48
CodePudding user response:
As an alternative to the original answer (see below), here is a very straightforward way to do what OP asks:
df_rates['Profit'] = df_profit.Profit.reindex(df_rates.columns[-1] - pd.to_timedelta(df_rates['Mins before time'], unit='min')).to_numpy()
Output:
Mins before time 2012-03-31 23:45:00 Profit
0 0 113.10 3504.664
1 5 112.10 NaN
2 10 113.10 NaN
3 15 113.17 1215.760
4 20 103.17 NaN
5 25 133.17 NaN
6 30 101.39 -21.480
Explanation:
- create a Series of Timestamps by subtracting from the Timestamp label of the final column in df_rates (assumed to be of Timestamp type, otherwise wrap it in
pd.to_datetime()
) a Series of timedelta values created using theMins before time
column with units of minutes - use
reindex
on df_profit to create a Series of profit figures that parallels the columns of df_rates and contains theProfit
values from df_profit where the Timestamp labels in the df_profit index exactly match those in the Series of Timestamps created in the previous step, and NaN elsewhere - convert that Series of profit figures (possibly including NaNs) to a numpy array with
to_numpy
and use it to initialize a new column labelsProfit
(or whatever other label is desired) in df_rates.
Original answer:
Here's a way to do it:
df_rates = ( df_rates
.set_index(df_rates.columns[-1] pd.to_timedelta(-df_rates['Mins before time'], unit='min'))
.join(df_profit).reset_index(drop=True) )
print(df_rates)
Output:
Mins before time 2012-03-31 23:45:00 Profit
0 0 113.10 3504.664
1 5 112.10 NaN
2 10 113.10 NaN
3 15 113.17 1215.760
4 20 103.17 NaN
5 25 133.17 NaN
6 30 101.39 -21.480
Explanation
- set the index of df_rates to be the label of its final column (assumed to be of Timestamp type, otherwise wrap it in
pd.to_datetime()
) plus a Series of timedelta values equal to theMins before time
column with units of minutes - use
join
with df_profit to add aProfit
column with non-null values in rows with matching index values, and reset the index.
CodePudding user response:
You could define a function to get the profits at corresponding tiemstamps:
# import dateutil.parser
# from datetime import timedelta
def getProfit_atDelay(delayMins, startTime, profitDF):
try:
startTime = dateutil.parser.parse(str(startTime))
dTime = startTime - timedelta(minutes=int(str(delayMins)))
return profitDF['Profit'].loc[dTime]
except: return None
(I'm assuming that df_profit
index contains datetime datatype; otherwise, edit code to convert dTime
to the matching datatype.)
And to get all the values for the new column:
def profits_for_rates(ratesDF, profitDF):
rStart, rMins = ratesDF.columns[1], ratesDF['Mins before time']
return [getProfit_atDelay(t, rStart, profitDF) for t in rMins]
If you use it like
df_rates['New_column'] = profits_for_rates(df_rates, df_profit)
then df_rates
will look like
Mins before time 2012-03-31 23:45:00 New_column
0 0 113.10 3504.664
1 5 112.10 NaN
2 10 113.10 NaN
3 15 113.17 1215.760
4 20 103.17 NaN
5 25 133.17 NaN
6 30 101.39 -21.480