I have the following pandas dataframe:
import pandas as pd
df = pd.DataFrame({'ID': [1,2,3,4],
'month': [12, 2, 8, 6],
'year': [2021, 2022, 2022, 2020]})
print(df)
ID month year
1 12 2021
2 2 2022
3 8 2022
4 6 2020
I need to add a new column in the dataframe. This column must contain the maximum day of that month and year of the dataframe row. Searching the internet, I found the use of "days_in_month" and made an implementation as follows:
df['Max_day'] = 0
for i in range(0,len(df)):
v_month = df['month'].loc[i]
v_year = df['year'].loc[i]
df['Max_day'].loc[i] = pd.Period(v_year.astype(str) "-"
v_month.astype(str)).days_in_month
This implementation works perfectly, the output is as desired:
ID month year Max_day
1 12 2021 31
2 2 2022 28
3 8 2022 31
4 6 2020 30
However, this code takes a long time to run. Does anyone know a way that can optimize the runtime?
CodePudding user response:
Use to_datetime
with 3 columns year, month, day
for datetimes and then Series.dt.days_in_month
:
df['Max_day'] = pd.to_datetime(df[['year','month']].assign(day=1)).dt.days_in_month
print(df)
ID month year Max_day
0 1 12 2021 31
1 2 2 2022 28
2 3 8 2022 31
3 4 6 2020 30