Suppose you have the following data frame:
ID Q1 Q2 Q3
0 1 0 1
1 0 0 1
2 1 1 1
3 0 1 1
For each row, I would like to build an array of the (0-based) positions of the Q columns that contain a 1, and add it as another column like this:
ID  Array    Q1  Q2  Q3
0   [0,2]    1   0   1
1   [2]      0   0   1
2   [0,1,2]  1   1   1
3   [1,2]    0   1   1
Thanks
CodePudding user response:
I would use numpy.where:
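import numpy as np
import pandas as pd
a, b = np.where(df.filter(like='Q')==1)
# or
a, b = np.where(df.drop(columns='ID')==1)
# a holds the row positions and b the column positions of every 1
# (set_axis assumes each row contains at least one 1)
df['Array'] = pd.Series(b).groupby(a).agg(list).set_axis(df.index)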
Output:
   ID  Q1  Q2  Q3      Array
0   0   1   0   1     [0, 2]
1   1   0   0   1        [2]
2   2   1   1   1  [0, 1, 2]
3   3   0   1   1     [1, 2]
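One caveat with the groupby line above: a row that contains no 1 at all produces no group, so set_axis would fail with a length mismatch. Here is a minimal sketch of one way around that, assuming an empty list is what you want for such rows. The sample data is changed so that row 1 is all zeros; that is my assumption for illustration, not the data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 has no 1 in any Q column
df = pd.DataFrame({
    "ID": [0, 1, 2, 3],
    "Q1": [1, 0, 1, 0],
    "Q2": [0, 0, 1, 1],
    "Q3": [1, 0, 1, 1],
})

# Row and column positions of every 1 in the Q columns
rows, cols = np.where(df.drop(columns="ID").to_numpy() == 1)

# Collect the column positions per row position; rows without a 1 are absent here
grouped = pd.Series(cols).groupby(rows).agg(list)

# Look up every row position, falling back to an empty list when it is missing
df["Array"] = pd.Series(
    [grouped.get(i, []) for i in range(len(df))],
    index=df.index,
)
print(df)
```

The lookup is done by row position rather than by index label, so it should not depend on what df.index looks like.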
Pure pandas variant:
df2 = df.filter(like='Q')
df['Array'] = (df2.set_axis(range(df2.shape[1]), axis=1).stack()
                  .loc[lambda s: s==1].reset_index()
                  .groupby('level_0')['level_1'].agg(list)
              )
CodePudding user response:
Another approach using np.where together with the "apply" method:
import pandas as pd
import numpy as np
data = {'ID': {0: 0, 1: 1, 2: 2, 3: 3},
'Q1': {0: 1, 1: 0, 2: 1, 3: 0},
'Q2': {0: 0, 1: 0, 2: 1, 3: 1},
'Q3': {0: 1, 1: 1, 2: 1, 3: 1}}
df = pd.DataFrame(data)
# Return the positions of the non-zero entries in a row
def arr_func(row):
    return np.where(row)[0]

df['Array'] = df.drop(columns='ID').apply(arr_func, axis=1)
The result:
   ID  Q1  Q2  Q3      Array
0   0   1   0   1     [0, 2]
1   1   0   0   1        [2]
2   2   1   1   1  [0, 1, 2]
3   3   0   1   1     [1, 2]
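Note that arr_func returns NumPy arrays, so each cell of the new column holds an ndarray rather than a Python list, even though both print the same way. If plain lists are preferred, a small variation could call .tolist() on the result. This is only a sketch, and the helper name positions_of_ones is made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [0, 1, 2, 3],
    "Q1": [1, 0, 1, 0],
    "Q2": [0, 0, 1, 1],
    "Q3": [1, 1, 1, 1],
})

def positions_of_ones(row):
    # np.where on a 1-D boolean array returns a 1-tuple of positions;
    # .tolist() converts them to a list of plain Python ints
    return np.where(row.to_numpy() == 1)[0].tolist()

df["Array"] = df.drop(columns="ID").apply(positions_of_ones, axis=1)
print(df)
```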
CodePudding user response:
Here's a simple solution.
import pandas as pd
def my_func(row):
    # collect the position of each 1, not the value itself
    return [i for i, item in enumerate(row) if item == 1]
df['Array'] = df[['Q1', 'Q2', 'Q3']].apply(my_func, axis=1)
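The same list-comprehension idea can also be run over the underlying NumPy rows without apply, which may be worth trying on larger frames. This is a self-contained sketch using the data from the question; the variable name q_cols is my own.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [0, 1, 2, 3],
    "Q1": [1, 0, 1, 0],
    "Q2": [0, 0, 1, 1],
    "Q3": [1, 1, 1, 1],
})

q_cols = ["Q1", "Q2", "Q3"]

# For each row, keep the position i of every entry equal to 1
df["Array"] = pd.Series(
    [[i for i, item in enumerate(row) if item == 1]
     for row in df[q_cols].to_numpy()],
    index=df.index,
)
print(df)
```

Wrapping the result in a Series with index=df.index keeps the assignment aligned even if the frame does not use the default 0..n-1 index.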