Home > OS >  How to get new column in dataframe that is based on multiple conditions between two dataframes?
How to get new column in dataframe that is based on multiple conditions between two dataframes?

Time:12-13

I have two dataframes and I am looking to get a column in DF1 that will have values of the "current date" column plus number of days relating the the relevant status and technology in DF2. For example in the below the first value in the "new date" column is 18/03/2022 1095 days as it is checking to see if technology = wind and status = construction.

DF 1

Current Date Technology Status New Date DESIRED FROM CODE
18/03/2022 Wind Construction 16/12/2022
15/02/2022 Solar Construction 15/11/2022
24/01/2022 Battery Application approved 24/10/2022
23/09/2020 Wind Application approved 24/03/2023
18/11/2021 Solar Application submitted 18/11/2023
25/06/2020 Solar Application approved 25/03/2021
27/02/2020 Wind Application submitted 25/02/2025
10/03/2022 Battery Application submitted 09/03/2024

DF 2

Technology Application submitted Application approved Construction
Battery 730 273.75 273.75
Solar Photovoltaics 730 273.75 273.75
Wind 1825 912.5 1095

CodePudding user response:

Use DataFrame.melt with convert values to timedeltas by to_timedelta (if need better accuracy remove .astype(int)):

df2 = (df2.melt('Technology', var_name='Status', value_name='New Date')
          .assign(**{'New Date': 
                    lambda x: pd.to_timedelta(x['New Date'].astype(int), unit='d')}))
print (df2)
            Technology                 Status  New Date
0              Battery  Application submitted  730 days
1  Solar Photovoltaics  Application submitted  730 days
2                 Wind  Application submitted 1825 days
3              Battery   Application approved  273 days
4  Solar Photovoltaics   Application approved  273 days
5                 Wind   Application approved  912 days
6              Battery           Construction  273 days
7  Solar Photovoltaics           Construction  273 days
8                 Wind           Construction 1095 days

And then use left join and add column Current Date:

df = df1.merge(df2, on=['Technology','Status'], how='left')
df['New Date']  = pd.to_datetime(df['Current Date'], dayfirst=True)
print (df)
  Current Date Technology                 Status   New Date
0   18/03/2022       Wind           Construction 2025-03-17
1   15/02/2022      Solar           Construction        NaT
2   24/01/2022    Battery   Application approved 2022-10-24
3   23/09/2020       Wind   Application approved 2023-03-24
4   18/11/2021      Solar  Application submitted        NaT
5   25/06/2020      Solar   Application approved        NaT
6   27/02/2020       Wind  Application submitted 2025-02-25
7   10/03/2022    Battery  Application submitted 2024-03-09

For match Solar Photovoltaics values is possible use split and select first values:

df2['Technology'] = df2['Technology'].str.split().str[0]

df2 = (df2.melt('Technology', var_name='Status', value_name='New Date')
          .assign(**{'New Date':  
                  lambda x: pd.to_timedelta(x['New Date'].astype(int), unit='d')}))
print (df2)
  Technology                 Status  New Date
0    Battery  Application submitted  730 days
1      Solar  Application submitted  730 days
2       Wind  Application submitted 1825 days
3    Battery   Application approved  273 days
4      Solar   Application approved  273 days
5       Wind   Application approved  912 days
6    Battery           Construction  273 days
7      Solar           Construction  273 days
8       Wind           Construction 1095 days


df = df1.merge(df2, on=['Technology','Status'], how='left')
df['New Date']  = pd.to_datetime(df['Current Date'], dayfirst=True)
print (df)
  Current Date Technology                 Status   New Date
0   18/03/2022       Wind           Construction 2025-03-17
1   15/02/2022      Solar           Construction 2022-11-15
2   24/01/2022    Battery   Application approved 2022-10-24
3   23/09/2020       Wind   Application approved 2023-03-24
4   18/11/2021      Solar  Application submitted 2023-11-18
5   25/06/2020      Solar   Application approved 2021-03-25
6   27/02/2020       Wind  Application submitted 2025-02-25
7   10/03/2022    Battery  Application submitted 2024-03-09
  • Related