Home > Back-end >  How to get max day from a pandas dataframe?
How to get max day from a pandas dataframe?

Time:09-17

I have the following pandas dataframe:

       import pandas as pd

       df = pd.DataFrame({'ID': [1,2,3,4],
                          'month': [12, 2, 8, 6],
                          'year': [2021, 2022, 2022, 2020]})

       print(df)

       ID   month   year
        1   12      2021
        2   2       2022
        3   8       2022
        4   6       2020

I need to add a new column in the dataframe. This column must contain the maximum day of that month and year of the dataframe row. Searching the internet, I found the use of "days_in_month" and made an implementation as follows:

      df['Max_day'] = 0

      for i in range(0,len(df)):
          v_month = df['month'].loc[i]
          v_year = df['year'].loc[i]

          df['Max_day'].loc[i] = pd.Period(v_year.astype(str)   "-"   
                                           v_month.astype(str)).days_in_month

This implementation works perfectly, the output is as desired:

        ID  month   year    Max_day
        1   12      2021    31
        2   2       2022    28
        3   8       2022    31
        4   6       2020    30

However, this code takes a long time to run. Does anyone know a way that can optimize the runtime?

CodePudding user response:

Use to_datetime with 3 columns year, month, day for datetimes and then Series.dt.days_in_month:

df['Max_day'] = pd.to_datetime(df[['year','month']].assign(day=1)).dt.days_in_month
print(df)
   ID  month  year  Max_day
0   1     12  2021       31
1   2      2  2022       28
2   3      8  2022       31
3   4      6  2020       30
  • Related