I have a DataFrame loaded from a CSV file with 5 columns. I want to create a new column based on conditions evaluated across each row. For example, my df looks like this:
col1 col2 col3 col4
1 1 1 1
0 0 1 1
1 1 1 1
nan nan nan nan
Here is my code:
m1 = df[['col1','col2','col3','col4']].all(axis=1)
m2 = df[['col1','col2','col3','col4']].isna().any(axis=1)
df['STATUS AUTO'] = np.select([m2, m1], ['ZD', 'FIC'],'PARTIALLY IMMUNIZED')
It does not give me "PARTIALLY IMMUNIZED", although there are many such rows. In the sample above, row1 is FIC, row2 & row3 are "PARTIALLY IMMUNIZED", while row4 is "ZD". Instead, it gives me "ZD" for the "PARTIALLY IMMUNIZED" rows.
Any help, please.
PS: The same code worked for another DataFrame a few months ago, but it does not work for this one.
Answer:
The problem seems to be that the columns contain strings instead of numbers. Convert them to float and compare against 1 explicitly:
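import numpy as np
cols = ['col1', 'col2', 'col3', 'col4']
# Convert the columns to float so the comparisons and NaN checks are numeric
df[cols] = df[cols].astype(float)
# True where every column equals 1
m1 = df[cols].eq(1).all(axis=1)
# True where at least one column is missing
m2 = df[cols].isna().any(axis=1)
# Conditions are checked in order: m2 first, then m1, otherwise the default
df['STATUS AUTO'] = np.select([m2, m1], ['ZD', 'FIC'], 'PARTIALLY IMMUNIZED')
print(df)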
   col1  col2  col3  col4          STATUS AUTO
0   1.0   1.0   1.0   1.0                  FIC
1   0.0   0.0   1.0   1.0  PARTIALLY IMMUNIZED
2   1.0   1.0   1.0   1.0                  FIC
3   NaN   NaN   NaN   NaN                   ZD
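For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the sample from the question with the values stored as text, which is only an assumption about what the real CSV produced. It uses `pd.to_numeric(..., errors="coerce")` in place of `astype(float)`, a swapped-in variant that turns unparseable text into NaN instead of raising. The names `num`, `all_one`, and `any_missing` are illustrative, not from the original code.

```python
import numpy as np
import pandas as pd

cols = ["col1", "col2", "col3", "col4"]

# Assumed reproduction of the question's sample: values stored as strings,
# with a fully missing last row.
df = pd.DataFrame(
    {
        "col1": ["1", "0", "1", np.nan],
        "col2": ["1", "0", "1", np.nan],
        "col3": ["1", "1", "1", np.nan],
        "col4": ["1", "1", "1", np.nan],
    }
)

# Inspect the dtypes first; text columns are the suspected cause.
print(df[cols].dtypes)

# Convert each column to numbers; anything unparseable becomes NaN.
num = df[cols].apply(pd.to_numeric, errors="coerce")

all_one = num.eq(1).all(axis=1)        # every column equals 1
any_missing = num.isna().any(axis=1)   # at least one column is missing

# np.select picks the first condition that is True for each row, so a row
# with any missing value is labelled "ZD" before "FIC" is considered.
df["STATUS AUTO"] = np.select(
    [any_missing, all_one],
    ["ZD", "FIC"],
    default="PARTIALLY IMMUNIZED",
)
print(df)
```

Because `np.select` honours the order of the condition list, any row with even one missing value ends up as "ZD". If the real data has partially filled rows with some NaN values, that alone could explain "ZD" appearing where "PARTIALLY IMMUNIZED" was expected; in that case `num.isna().all(axis=1)` might be the intended test, assuming "ZD" is meant only for rows with no data at all.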