I have two dataframes and I am looking to get a column in DF1 that will have values of the "current date" column plus number of days relating the the relevant status and technology in DF2. For example in the below the first value in the "new date" column is 18/03/2022 1095 days as it is checking to see if technology = wind and status = construction.
DF 1
Current Date | Technology | Status | New Date DESIRED FROM CODE |
---|---|---|---|
18/03/2022 | Wind | Construction | 16/12/2022 |
15/02/2022 | Solar | Construction | 15/11/2022 |
24/01/2022 | Battery | Application approved | 24/10/2022 |
23/09/2020 | Wind | Application approved | 24/03/2023 |
18/11/2021 | Solar | Application submitted | 18/11/2023 |
25/06/2020 | Solar | Application approved | 25/03/2021 |
27/02/2020 | Wind | Application submitted | 25/02/2025 |
10/03/2022 | Battery | Application submitted | 09/03/2024 |
DF 2
Technology | Application submitted | Application approved | Construction |
---|---|---|---|
Battery | 730 | 273.75 | 273.75 |
Solar Photovoltaics | 730 | 273.75 | 273.75 |
Wind | 1825 | 912.5 | 1095 |
CodePudding user response:
Use DataFrame.melt
with convert values to timedeltas by to_timedelta
(if need better accuracy remove .astype(int)
):
df2 = (df2.melt('Technology', var_name='Status', value_name='New Date')
.assign(**{'New Date':
lambda x: pd.to_timedelta(x['New Date'].astype(int), unit='d')}))
print (df2)
Technology Status New Date
0 Battery Application submitted 730 days
1 Solar Photovoltaics Application submitted 730 days
2 Wind Application submitted 1825 days
3 Battery Application approved 273 days
4 Solar Photovoltaics Application approved 273 days
5 Wind Application approved 912 days
6 Battery Construction 273 days
7 Solar Photovoltaics Construction 273 days
8 Wind Construction 1095 days
And then use left join and add column Current Date
:
df = df1.merge(df2, on=['Technology','Status'], how='left')
df['New Date'] = pd.to_datetime(df['Current Date'], dayfirst=True)
print (df)
Current Date Technology Status New Date
0 18/03/2022 Wind Construction 2025-03-17
1 15/02/2022 Solar Construction NaT
2 24/01/2022 Battery Application approved 2022-10-24
3 23/09/2020 Wind Application approved 2023-03-24
4 18/11/2021 Solar Application submitted NaT
5 25/06/2020 Solar Application approved NaT
6 27/02/2020 Wind Application submitted 2025-02-25
7 10/03/2022 Battery Application submitted 2024-03-09
For match Solar Photovoltaics
values is possible use split and select first values:
df2['Technology'] = df2['Technology'].str.split().str[0]
df2 = (df2.melt('Technology', var_name='Status', value_name='New Date')
.assign(**{'New Date':
lambda x: pd.to_timedelta(x['New Date'].astype(int), unit='d')}))
print (df2)
Technology Status New Date
0 Battery Application submitted 730 days
1 Solar Application submitted 730 days
2 Wind Application submitted 1825 days
3 Battery Application approved 273 days
4 Solar Application approved 273 days
5 Wind Application approved 912 days
6 Battery Construction 273 days
7 Solar Construction 273 days
8 Wind Construction 1095 days
df = df1.merge(df2, on=['Technology','Status'], how='left')
df['New Date'] = pd.to_datetime(df['Current Date'], dayfirst=True)
print (df)
Current Date Technology Status New Date
0 18/03/2022 Wind Construction 2025-03-17
1 15/02/2022 Solar Construction 2022-11-15
2 24/01/2022 Battery Application approved 2022-10-24
3 23/09/2020 Wind Application approved 2023-03-24
4 18/11/2021 Solar Application submitted 2023-11-18
5 25/06/2020 Solar Application approved 2021-03-25
6 27/02/2020 Wind Application submitted 2025-02-25
7 10/03/2022 Battery Application submitted 2024-03-09