I am stuck basically joining a Pandas series with a DataFrame.
Let's generate some dummy data using code below
test = pd.DataFrame({'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05',
'2021-01-06', '2021-01-07', '2021-01-08', '2021-01-09', '2021-01-10',
'2021-01-11', '2021-01-12', '2021-01-13', '2021-01-14'],
'New_Date': [np.nan, '2021-01-01', '2021-01-01', '2021-01-04', '2021-01-03',
'2021-01-06', '2021-01-08', '2021-01-08', '2021-01-09', '2021-01-11',
'2021-01-11', '2021-01-13', '2021-01-13', np.nan],
'Price': [1, 1, 5, 3, 4, 3, 2, 5, 6, 4, 3, 2, 1, 7]})
test['Date'] = pd.to_datetime(test['Date'])
test['New_Date'] = pd.to_datetime(test['New_Date'])
test.set_index('Date', inplace=True)
Actual Df
------------ ------------ -------
| Date | New_Date | Price |
------------ ------------ -------
| 01/01/2021 | NaT | 1 |
| 02/01/2021 | 01/01/2021 | 1 |
| 03/01/2021 | 01/01/2021 | 5 |
| 04/01/2021 | 04/01/2021 | 3 |
| 05/01/2021 | 03/01/2021 | 4 |
| 06/01/2021 | 06/01/2021 | 3 |
| 07/01/2021 | 08/01/2021 | 2 |
| 08/01/2021 | 08/01/2021 | 5 |
| 09/01/2021 | 09/01/2021 | 6 |
| 10/01/2021 | 11/01/2021 | 4 |
| 11/01/2021 | 11/01/2021 | 3 |
| 12/01/2021 | 13/01/2021 | 2 |
| 13/01/2021 | 13/01/2021 | 1 |
| 14/01/2021 | NaT | 7 |
------------ ------------ -------
Desired Output
------------ ------------ ------- -----------
| Date | New_Date | Price | New_Price |
------------ ------------ ------- -----------
| 01/01/2021 | NaT | 1 | NaN |
| 02/01/2021 | 01/01/2021 | 1 | 1 |
| 03/01/2021 | 01/01/2021 | 5 | 1 |
| 04/01/2021 | 04/01/2021 | 3 | 3 |
| 05/01/2021 | 03/01/2021 | 4 | 5 |
| 06/01/2021 | 06/01/2021 | 3 | 3 |
| 07/01/2021 | 08/01/2021 | 2 | 5 |
| 08/01/2021 | 08/01/2021 | 5 | 5 |
| 09/01/2021 | 09/01/2021 | 6 | 6 |
| 10/01/2021 | 11/01/2021 | 4 | 3 |
| 11/01/2021 | 11/01/2021 | 3 | 3 |
| 12/01/2021 | 13/01/2021 | 2 | 1 |
| 13/01/2021 | 13/01/2021 | 1 | 1 |
| 14/01/2021 | NaT | 7 | NaN |
------------ ------------ ------- -----------
I want to create a column New_Price
by using New_Date
as an index and Joining with Date
to get Price
which will be named as New_Price
.
As per this link
I've tried the solution below:
test['New_Price'] = test['Price'][test['New_Date']].values
The above solution fails due to NaT
, this is not basically an actual join so I tried an another way
test.join(test.drop('New_Date', 1), on='New_Date', rsuffix='_y')
This solves the problem as I just need to rename Price_y
to New_Price
but if there are 20 columns in test df how can I keep columns from left df and only price column named as New_Price
which would come from right df. Is there any elegant way to achieve this?
CodePudding user response:
As the dataframe test
has already set Date
as an index, you can readily use it as a mapping table for looking up New Date
from index Date
. Then, based on the matching New_Date
to Date
, we can get the corresponding Price
as New_Price
(for New_Date
).
This can be achieved by using Series.map()
, as follows:
test['New_Price'] = test['New_Date'].map(test['Price'])
Result:
print(test)
New_Date Price New_Price
Date
2021-01-01 NaT 1 NaN
2021-01-02 2021-01-01 1 1.0
2021-01-03 2021-01-01 5 1.0
2021-01-04 2021-01-04 3 3.0
2021-01-05 2021-01-03 4 5.0
2021-01-06 2021-01-06 3 3.0
2021-01-07 2021-01-08 2 5.0
2021-01-08 2021-01-08 5 5.0
2021-01-09 2021-01-09 6 6.0
2021-01-10 2021-01-11 4 3.0
2021-01-11 2021-01-11 3 3.0
2021-01-12 2021-01-13 2 1.0
2021-01-13 2021-01-13 1 1.0
2021-01-14 NaT 7 NaN
CodePudding user response:
Let's try join
only on the Price
column aligned on New_Date
:
new_df = test.join(test['Price'].rename('New_Price'), on='New_Date')
new_df
New_Date Price New_Price
Date
2021-01-01 NaT 1 NaN
2021-01-02 2021-01-01 1 1.0
2021-01-03 2021-01-01 5 1.0
2021-01-04 2021-01-04 3 3.0
2021-01-05 2021-01-03 4 5.0
2021-01-06 2021-01-06 3 3.0
2021-01-07 2021-01-08 2 5.0
2021-01-08 2021-01-08 5 5.0
2021-01-09 2021-01-09 6 6.0
2021-01-10 2021-01-11 4 3.0
2021-01-11 2021-01-11 3 3.0
2021-01-12 2021-01-13 2 1.0
2021-01-13 2021-01-13 1 1.0
2021-01-14 NaT 7 NaN