How to populate a new column in a dataframe based on multiple conditions?

Time:10-22

I have the below code for populating my dataframe (final_df) with either a '1' or a '0' based on the year and the month columns, into a new column called 'Lockdown'.

My code is not working - is there a more efficient way to write this?


import numpy as np

conditions = [
    (final_df['Year'] == 2020) & (final_df['Month'] == 3),
    (final_df['Year'] == 2020) & (final_df['Month'] == 4),
    (final_df['Year'] == 2020) & (final_df['Month'] == 5),
    (final_df['Year'] == 2020) & (final_df['Month'] == 6),
    (final_df['Year'] == 2020) & (final_df['Month'] == 10),
    (final_df['Year'] == 2020) & (final_df['Month'] == 11),
    (final_df['Year'] == 2020) & (final_df['Month'] == 12),
    (final_df['Year'] == 2021) & (final_df['Month'] == 1),
    (final_df['Year'] == 2021) & (final_df['Month'] == 2),
    (final_df['Year'] == 2021) & (final_df['Month'] == 3)
    ]

values = ['1']

final_df['Lockdown'] = np.select(conditions, values)

final_df.head()

Thank you

CodePudding user response:

I take it that you are trying to calculate the lockdown months during COVID? You can do it in a couple of ways:

import pandas as pd

# Option 1: closer to your original code. Note that between(3, 12) also
# covers July to September 2020, which your condition list skipped.
is_lockdown = (
    (final_df["Year"].eq(2020) & final_df["Month"].between(3, 12))
    | (final_df["Year"].eq(2021) & final_df["Month"].between(1, 3))
)

# Option 2: convert to Timestamps for easier calendar arithmetic
is_lockdown = pd.to_datetime(
    final_df[["Year", "Month"]].assign(Day=1)
).between("2020-03-01", "2021-03-01")

You can keep is_lockdown as a boolean Series or convert it to 1/0 with is_lockdown.astype(int).
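To make this concrete, here is a minimal runnable sketch. The sample rows are made up for illustration (the real final_df is not shown in the question), and the `Lockdown_listed` column and the `listed` helper are hypothetical names of mine. It shows the Timestamp approach with astype(int), plus one possible way to match only the months listed in the question, in case July to September 2020 should stay at 0.

```python
import pandas as pd

# Hypothetical sample data; the real final_df is not shown in the question.
final_df = pd.DataFrame({
    "Year":  [2020, 2020, 2020, 2020, 2021, 2021],
    "Month": [2, 3, 8, 12, 3, 4],
})

# Continuous range: March 2020 through March 2021, both ends included.
month_start = pd.to_datetime(final_df[["Year", "Month"]].assign(Day=1))
is_lockdown = month_start.between("2020-03-01", "2021-03-01")
final_df["Lockdown"] = is_lockdown.astype(int)  # True/False -> 1/0

# Exact months from the question, which skip July to September 2020.
# Encode each row as YYYYMM and test membership in the listed months.
listed = [202003, 202004, 202005, 202006, 202010, 202011, 202012,
          202101, 202102, 202103]
year_month = final_df["Year"] * 100 + final_df["Month"]
final_df["Lockdown_listed"] = year_month.isin(listed).astype(int)

print(final_df)
```

With this sample, the August 2020 row gets 1 in Lockdown but 0 in Lockdown_listed, so pick whichever matches the lockdown definition you actually need.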

CodePudding user response:

Thank you everyone - I combined everyone's suggestions and this has worked!

conditions = [
    (final_df['Month_Year'].between('2020-03-01', '2021-03-01'))
    ]

values = [1] * len(conditions)

final_df['Lockdown'] = np.select(conditions, values)
final_df.head()
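For anyone landing here later: the original attempt failed because np.select needs one entry in values per entry in conditions, and a one-element list against ten conditions raises a ValueError. With a single condition, np.select is also not strictly needed. The sketch below assumes Month_Year is a datetime column (its construction is not shown above) and uses made-up rows to check that casting the boolean mask gives the same result.

```python
import numpy as np
import pandas as pd

# Hypothetical data: Month_Year is assumed to be a datetime column.
final_df = pd.DataFrame({
    "Month_Year": pd.to_datetime(
        ["2020-02-01", "2020-03-01", "2021-03-01", "2021-04-01"]
    ),
})

# True for rows between March 2020 and March 2021, both ends included.
mask = final_df["Month_Year"].between("2020-03-01", "2021-03-01")

# np.select with one condition, one value and the default of 0 ...
via_select = np.select([mask], [1])

# ... matches a direct cast of the boolean mask to integers.
final_df["Lockdown"] = mask.astype(int)

print(final_df)
```

np.select is more useful when several conditions map to different values; for a plain 1/0 flag the cast is shorter and avoids keeping the two lists in sync.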
