I have the code below, which should populate a new column called 'Lockdown' in my dataframe (final_df) with either a '1' or a '0', based on the Year and Month columns.
My code is not working. Is there a more efficient way to write this?
import numpy
conditions = [
    (final_df['Year'] == 2020) & (final_df['Month'] == 3),
    (final_df['Year'] == 2020) & (final_df['Month'] == 4),
    (final_df['Year'] == 2020) & (final_df['Month'] == 5),
    (final_df['Year'] == 2020) & (final_df['Month'] == 6),
    (final_df['Year'] == 2020) & (final_df['Month'] == 10),
    (final_df['Year'] == 2020) & (final_df['Month'] == 11),
    (final_df['Year'] == 2020) & (final_df['Month'] == 12),
    (final_df['Year'] == 2021) & (final_df['Month'] == 1),
    (final_df['Year'] == 2021) & (final_df['Month'] == 2),
    (final_df['Year'] == 2021) & (final_df['Month'] == 3)
]
values = ['1']
final_df['Lockdown'] = np.select(conditions, values)
final_df.head()
Thank you
CodePudding user response:
I take it that you are trying to calculate the lockdown months during COVID? You can do it in a couple of ways:
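import pandas as pd

# Option 1: closer to your original code.
# Note: this treats March 2020 to March 2021 as one continuous range, so it also
# flags July-September 2020, which your list of conditions leaves out.
is_lockdown = (
    (final_df["Year"].eq(2020) & final_df["Month"].between(3, 12))
    | (final_df["Year"].eq(2021) & final_df["Month"].between(1, 3))
)

# Option 2: build a Timestamp from Year and Month (day fixed to 1) for easier calendar arithmetic
is_lockdown = pd.to_datetime(final_df[["Year", "Month"]].assign(Day=1)).between("2020-03-01", "2021-03-01")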
You can keep it as a boolean or convert it to int with astype(int).
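As for why the original attempt most likely fails: np.select expects one value per condition, so ten conditions with values = ['1'] raises a ValueError, and import numpy on its own does not define the name np (import numpy as np does). Below is a minimal sketch of the full flow on a small made-up frame. The sample rows, the column names Lockdown_range and Lockdown_exact, and the year * 100 + month encoding are my own assumptions for illustration. It also shows how to keep the exact months from the question, in case the gap in July-September 2020 is intentional:

```python
import pandas as pd

# Hypothetical sample rows standing in for final_df
final_df = pd.DataFrame({
    "Year":  [2020, 2020, 2020, 2020, 2021, 2021],
    "Month": [2, 3, 8, 12, 3, 4],
})

# Continuous range: March 2020 through March 2021, inclusive on both ends
dates = pd.to_datetime(final_df[["Year", "Month"]].assign(Day=1))
final_df["Lockdown_range"] = dates.between("2020-03-01", "2021-03-01").astype(int)

# Exact months listed in the question (July-September 2020 are not among them)
lockdown_months = [202003, 202004, 202005, 202006, 202010, 202011, 202012,
                   202101, 202102, 202103]
year_month = final_df["Year"] * 100 + final_df["Month"]  # e.g. 2020, 3 -> 202003
final_df["Lockdown_exact"] = year_month.isin(lockdown_months).astype(int)

print(final_df)
```

The two columns differ only for months inside the range that the question's list skips, such as August 2020, so pick whichever matches the lockdown periods you actually mean.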
CodePudding user response:
Thank you everyone - I combined everyone's suggestions and this has worked!
conditions = [
    (final_df['Month_Year'].between('2020-03-01', '2021-03-01'))
]
values = [1] * len(conditions)
final_df['Lockdown'] = np.select(conditions, values)
final_df.head()
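With a single condition, np.select is not strictly needed, since between already returns a boolean Series that can be cast to int directly. A minimal sketch of that shortcut, assuming Month_Year holds datetime values (the sample rows here are made up for illustration):

```python
import pandas as pd

# Hypothetical data; assumes Month_Year is a datetime64 column (first day of each month)
final_df = pd.DataFrame({
    "Month_Year": pd.to_datetime(
        ["2020-02-01", "2020-03-01", "2020-08-01", "2021-03-01", "2021-04-01"]
    ),
})

# between() is inclusive on both ends by default; astype(int) maps True/False to 1/0
final_df["Lockdown"] = final_df["Month_Year"].between("2020-03-01", "2021-03-01").astype(int)

print(final_df)
```

This should give the same 1/0 column as the np.select version for a single date range.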