I try to apply an IF condition in Pandas DataFrame.
DataFrame which appears as following:
a = {'Col1': [0,1,1,0],
'Col2': [0,1,0,1]}
df = pd.DataFrame(data = a)
Desired result is the following:
| col1 | col2 | result
--------------------------------------------------
| 0 | 0 | other
| 1 | 1 | col1 col2
| 1 | 0 | col1
| 0 | 1 | col2
By using np.where(), I am blocked how to get columns name suppose I have multiple columns
CodePudding user response:
In [127]: df.dot(df.columns " ").str[:-3].replace("", "other")
Out[127]:
0 other
1 Col1 Col2
2 Col1
3 Col2
dtype: object
- dot product the values of the frame with the columns but " " added as the joiner
- since values are 1/0, dot product is like a selector
- strip off the trailing " "s with [:-3]
- if empty at this point, means came from 0-0; so replace with "other"
CodePudding user response:
I would go for a custom function for this use case converting the dataframe to Boolean, then using the row values as masking for indices:
def func_(row):
cols = row.index[row]
if not cols.empty:
return " ".join(cols)
else:
return "other"
df['result']=df.astype(bool).apply(func_, axis=1)
# df
Col1 Col2 result
0 0 0 other
1 1 1 Col1 Col2
2 1 0 Col1
3 0 1 Col2
CodePudding user response:
(
df.join(
df.melt(ignore_index=False, var_name='result')
.query('value==1')
.groupby(level=0)['result']
.apply(lambda x: ' '.join(x)))
.fillna('other')
)
CodePudding user response:
Here's the np.where() method for multiple value conditions:
import pandas as pd
import numpy as np
a = {'Col1': [0,1,1,0],
'Col2': [0,1,0,1]}
df = pd.DataFrame(data = a)
df["result"]=np.where((df["Col1"]==0) & (df["Col2"]==0), "other",
np.where((df["Col1"]==1) & (df["Col2"]==1), "col1 col2",
np.where(df["Col1"]==1, "col1", "col2")))
print(df)
Result:
Col1 Col2 result
0 0 0 other
1 1 1 col1 col2
2 1 0 col1
3 0 1 col2
Using nested np.where(), however, can make your code quite ugly and convoluted...