Creating a function to execute it on entire Dataframe

Time:01-12

I have a dataframe whose columns contain date ranges:

col_1                     col_2               
'may 2021 - 2023'    'nov 2020 - feb 2021'
'jan 2022 - 2023'    'sep 2021- 2023'

The code below produces the required output, but I would like a function that takes a dataframe as input and produces the expected output:

import pandas as pd

s = df['col_1'].str.split(r'\s*-\s*')
df['year_1'] = (pd
   .to_datetime(s.str[1])
   .sub(pd.to_datetime(s.str[0])))


t = df['col_2'].str.split(r'\s*-\s*')
df['year_2'] = (pd
   .to_datetime(t.str[1])
   .sub(pd.to_datetime(t.str[0])))

To produce the output below, I have to rerun the code for each column, changing the variable names each time. As explained, I need a function instead. Please note that there can be more columns, so the code should work for any number of them.

Expected Output

col_1                Year_1      col_2                  Year_2
'may 2021 - 2023'    610 days    'sep 2017-dec 2017'    91 days
'jan 2022 - 2023'    365 days    'sep 2021- 2023'       487 days

CodePudding user response:

You can use:

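import pandas as pd

def compute_days(sr):
    # Strip the literal quotes, then split each "start - end" string on the hyphen
    parts = sr.str.strip("'").str.split('-', expand=True)
    start = pd.to_datetime(parts[0])
    end = pd.to_datetime(parts[1])
    return end - start

# df.apply passes each column to compute_days as a Series
days = df.apply(compute_days).rename(columns=lambda x: f"Year_{x.split('_')[1]}")
out = pd.concat([df, days], axis=1)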

Output:

                 col_1                  col_2    Year_1   Year_2
0    'may 2021 - 2023'  'nov 2020 - feb 2021'  610 days  92 days
1    'jan 2022 - 2023'       'sep 2021- 2023'  365 days 487 days
2  '03/2017 - 08/2021'          '2022 - 2023' 1614 days 365 days
3                   ''                     ''       NaT      NaT
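
If you want a single function that takes the whole dataframe, as the question asks, the same idea could be wrapped up roughly like this. This is only a sketch: the names add_duration_columns and parse_dates are made up for illustration, it assumes every column holds "start - end" strings and is named like col_<n>, and it parses each value on its own so that mixed formats such as 'may 2021' and '2023' do not depend on pandas inferring one format for the whole column.

```python
import pandas as pd


def parse_dates(values):
    # Hypothetical helper: parse each string separately, so mixed formats
    # are fine; missing or unparseable values become NaT.
    parsed = values.apply(lambda v: pd.to_datetime(v, errors="coerce"))
    return pd.to_datetime(parsed)


def add_duration_columns(df):
    """Return df plus one Year_<n> timedelta column per col_<n> column."""
    durations = {}
    for col in df.columns:
        # Drop the literal quotes, then split "start - end" on the hyphen,
        # ignoring any whitespace around it.
        parts = df[col].astype(str).str.strip("'").str.split(r"\s*-\s*")
        start = parse_dates(parts.str[0])
        end = parse_dates(parts.str[1])
        durations[f"Year_{col.split('_')[-1]}"] = end - start
    return pd.concat([df, pd.DataFrame(durations, index=df.index)], axis=1)


df = pd.DataFrame({
    "col_1": ["'may 2021 - 2023'", "'jan 2022 - 2023'"],
    "col_2": ["'nov 2020 - feb 2021'", "'sep 2021- 2023'"],
})
out = add_duration_columns(df)
print(out)
```

Because the function loops over df.columns, it should work for any number of columns without editing the code. If the dataframe also has columns that are not date ranges, pass only the relevant ones, e.g. add_duration_columns(df[['col_1', 'col_2']]).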