Apply function for specific months and days in pandas data frame

Time:09-22

Suppose I have daily data from 2010 until 2020:

Ex:

Date            col1
2010-01-01      False
2010-01-02      False
...
2020-12-31      False

I want to set col1 = True for all rows where (month is equal to 4 and day is greater than 25) or (month is equal to 5 and day is less than 5). In other words, for all dates between the 25th day of the 4th month and the 5th day of the 5th month, I want to set col1 = True.

How can I do it?

CodePudding user response:

You can just use .dt.month and .dt.day to access month and day from the date and then create the conditions from that:

df.loc[
    ((df.Date.dt.month == 4) & (df.Date.dt.day > 25))
    | ((df.Date.dt.month == 5) & (df.Date.dt.day < 5)),
    'col1',
] = True

This assumes your Date column is already of datetime type. If it is not, you can convert it with:

df.Date = pd.to_datetime(df.Date)
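As a rough end-to-end sketch of this approach: the sample frame below is made up to match the question (daily rows from 2010 through 2020, with Date starting out as strings), and the variable names month, day and mask are only illustrative.

```python
import pandas as pd

# Hypothetical sample data: one row per day, with Date stored as strings
dates = pd.date_range("2010-01-01", "2020-12-31", freq="D").strftime("%Y-%m-%d")
df = pd.DataFrame({"Date": dates, "col1": False})

# Convert to datetime so the .dt accessor is available
df["Date"] = pd.to_datetime(df["Date"])

month = df["Date"].dt.month
day = df["Date"].dt.day

# Strict inequalities as in the answer: April 26-30 or May 1-4 of any year
mask = ((month == 4) & (day > 25)) | ((month == 5) & (day < 5))
df.loc[mask, "col1"] = True

print(int(df["col1"].sum()))  # 9 days per year over 11 years: 99
```

If the 25th of April and the 5th of May should be included as well, changing the comparisons to >= and <= should be enough.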

CodePudding user response:

You can create a boolean index and assign it to a new column in your dataframe.

To find rows where the value of Date is between two other values, you can use the between method. between can take an argument inclusive that can be any of "both", "neither", "left", or "right". Use this to dial in the exact interval you want.
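To see what each inclusive option does at the boundaries, here is a small sketch on four hand-picked dates. The dates and variable names are made up for illustration, and the string values for inclusive assume a reasonably recent pandas (1.3 or later, as far as I know).

```python
import pandas as pd

# Two boundary dates and two interior dates (illustrative values)
s = pd.Series(pd.to_datetime(["2015-04-25", "2015-04-26", "2015-05-04", "2015-05-05"]))
lo, hi = pd.Timestamp("2015-04-25"), pd.Timestamp("2015-05-05")

both = s.between(lo, hi, inclusive="both").tolist()        # [True, True, True, True]
neither = s.between(lo, hi, inclusive="neither").tolist()  # [False, True, True, False]
left = s.between(lo, hi, inclusive="left").tolist()        # [True, True, True, False]
right = s.between(lo, hi, inclusive="right").tolist()      # [False, True, True, True]

print(both, neither, left, right)
```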

To ignore the year and compare only on month and day, if the column is of the dtype datetime64, you can use the dt accessor with strftime to get just the month and day as a string, and then compare using lexicographic ordering (this works because %m and %d are zero-padded to two digits):

col1 = df['Date'].dt.strftime('%m-%d').between(
    '04-25', 
    '05-05', 
    inclusive="neither"
)

To add the boolean index to your dataframe:

df['col1'] = col1

To only set the values in 'col1' for the matching rows, per @tdy in the comments:

df.loc[col1, 'col1'] = True
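Putting the pieces together, a minimal sketch on a made-up frame covering 2010 through 2020 might look like the following. The names month_day, in_window and explicit are illustrative, and the final comparison is only a sanity check that the string-based mask agrees with explicit month and day conditions.

```python
import pandas as pd

# Hypothetical sample data: one row per day from 2010 through 2020
df = pd.DataFrame({"Date": pd.date_range("2010-01-01", "2020-12-31", freq="D")})
df["col1"] = False

# Zero-padded "MM-DD" strings, so string ordering follows calendar ordering
month_day = df["Date"].dt.strftime("%m-%d")
in_window = month_day.between("04-25", "05-05", inclusive="neither")

# Only touch the matching rows; other rows keep their existing value
df.loc[in_window, "col1"] = True

# Sanity check against explicit month/day conditions
explicit = ((df["Date"].dt.month == 4) & (df["Date"].dt.day > 25)) | (
    (df["Date"].dt.month == 5) & (df["Date"].dt.day < 5)
)
same = bool((in_window == explicit).all())
print(same, int(df["col1"].sum()))
```

Note that this string comparison only works for windows that stay within one calendar year; a range that wraps around New Year (say, late December to early January) would need two between calls combined with |.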