I'm having trouble formulating this statement in Pandas that would be very simple in excel. I have a dataframe sample as follows:
colA colB colC
10 0 27:15 John Doe
11 0 24:33 John Doe
12 1 29:43 John Doe
13 Inactive John Doe None
14 N/A John Doe None
Obviously the dataframe is much larger than this, with 10,000 rows, so I'm trying to find an easier way to do this. I want to create a column that checks if colA is equal to 0 or 1. If so, then equals colC. If not, then equals colC. In excel, I would simply create a new column (new_col) and write
=IF(OR(A2<>0,A2<>1),B2,C2)
And then drag fill the entire sheet.
I'm sure this is fairly simple but I cannot for the life of me figure this out.
Result should look like this
colA colB colC new_col
10 0 27:15 John Doe John Doe
11 0 24:33 John Doe John Doe
12 1 29:43 John Doe John Doe
13 Inactive John Doe None John Doe
14 N/A John Doe None John Doe
CodePudding user response:
np.where
should do the trick.
df['new_col'] = np.where(df['colA'].isin([0, 1]), df['colB'], df['colC'])
CodePudding user response:
Here is a solution that adds your results to a list given your conditions, then adds the list back in the dataframe as D column.
your_results=[]
for i,data in enumerate(df["colA"]):
if data==0 or data==1:
your_results.append(df["colC"][i])
else:
your_results.append(df["colB"][i])
df["colD"]=your_results