How to find the column number and return it as an array when there is a value in that column in pyth

Time:01-21

Suppose you have the following data frame:

ID    Q1       Q2      Q3
0     1        0        1
1     0        0        1
2     1        1        1
3     0        1        1  

I would like to return an array with column numbers wherever there is a 1 and add it as another column like this:



ID   Array    Q1       Q2      Q3
0    [0,2]     1        0       1
1    [2]       0        0       1
2    [0,1,2]   1        1       1
3    [1,2]     0        1       1 

Thanks

CodePudding user response:

I would use numpy.where:

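import numpy as np
import pandas as pd

# row/column positions of the cells equal to 1, looking only at the Q columns
a, b = np.where(df.filter(like='Q')==1)
# or
a, b = np.where(df.drop(columns='ID')==1)

# column positions per row (assumes every row has at least one 1)
df['Array'] = pd.Series(b).groupby(a).agg(list).set_axis(df.index)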

Output:

   ID  Q1  Q2  Q3      Array
0   0   1   0   1     [0, 2]
1   1   0   0   1        [2]
2   2   1   1   1  [0, 1, 2]
3   3   0   1   1     [1, 2]
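
If some rows might contain no 1 at all, the set_axis call above would fail, because the grouped result then has fewer entries than the frame. Here is a minimal sketch of one way around that. It assumes an extra all-zero row (ID 4), added here purely for illustration, and assumes an empty list is the wanted result for such rows:

```python
import numpy as np
import pandas as pd

# Frame from the question plus one all-zero row (ID 4), added for illustration only.
df = pd.DataFrame({
    "ID": [0, 1, 2, 3, 4],
    "Q1": [1, 0, 1, 0, 0],
    "Q2": [0, 0, 1, 1, 0],
    "Q3": [1, 1, 1, 1, 0],
})

# Row and column positions of every cell equal to 1.
rows, cols = np.where(df.drop(columns="ID").to_numpy() == 1)

# Group the column positions by row position.
grouped = pd.Series(cols).groupby(rows).agg(list)

# Reindex over all row positions so rows without a 1 show up as NaN,
# then swap those missing entries for empty lists.
grouped = grouped.reindex(range(len(df)))
df["Array"] = pd.Series(
    [v if isinstance(v, list) else [] for v in grouped],
    index=df.index,
)
```

Reindexing by position keeps the result aligned with the frame even when some rows have no match.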

Pure pandas variant:

df2 = df.filter(like='Q')

df['Array'] = (df2.set_axis(range(df2.shape[1]), axis=1).stack()
                  .loc[lambda s: s==1].reset_index()
                  .groupby('level_0')['level_1'].agg(list)
              )
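
To make the chain above easier to follow, here is the same idea split into steps. This is only a sketch: the level names "row" and "col" and the variable names are made up for this example, and they replace the default 'level_0' / 'level_1' labels.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [0, 1, 2, 3],
    "Q1": [1, 0, 1, 0],
    "Q2": [0, 0, 1, 1],
    "Q3": [1, 1, 1, 1],
})

df2 = df.filter(like="Q")
# Replace the labels Q1..Q3 with their positions 0..2.
df2 = df2.set_axis(range(df2.shape[1]), axis=1)

# Long format: one value per (row, col) pair; the level names are illustrative.
stacked = df2.rename_axis(index="row", columns="col").stack()

# Keep the pairs whose value is 1 and collect the column positions per row.
hits = stacked[stacked == 1].reset_index()
df["Array"] = hits.groupby("row")["col"].agg(list)
```

The assignment aligns on the index, so a row without any 1 would simply get NaN in the new column.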

CodePudding user response:

Another approach using np.where together with the "apply" method:

import pandas as pd
import numpy as np

data = {'ID': {0: 0, 1: 1, 2: 2, 3: 3},
 'Q1': {0: 1, 1: 0, 2: 1, 3: 0},
 'Q2': {0: 0, 1: 0, 2: 1, 3: 1},
 'Q3': {0: 1, 1: 1, 2: 1, 3: 1}}
df = pd.DataFrame(data)

#####

def arr_func(row):
    return np.where(row)[0]

df['Array'] = df.drop(columns = 'ID').apply(arr_func, axis = 1)

The result:

   ID  Q1  Q2  Q3      Array
0   0   1   0   1     [0, 2]
1   1   0   0   1        [2]
2   2   1   1   1  [0, 1, 2]
3   3   0   1   1     [1, 2]
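
Note that np.where(row) flags any non-zero value, and each entry of the new column is a NumPy array rather than a list. A small variation, as a sketch only, compares with 1 explicitly and converts to a plain list. The function name positions_of_ones is made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [0, 1, 2, 3],
    "Q1": [1, 0, 1, 0],
    "Q2": [0, 0, 1, 1],
    "Q3": [1, 1, 1, 1],
})

def positions_of_ones(row):
    # Positions where the row equals 1, returned as a plain Python list.
    return np.flatnonzero(row.to_numpy() == 1).tolist()

df["Array"] = df.drop(columns="ID").apply(positions_of_ones, axis=1)
```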

CodePudding user response:

Here's a simple solution.

import pandas as pd

def my_func(row):
    # return the positions (not the values) of the entries equal to 1
    return [i for i, item in enumerate(row) if item == 1]

df['Array'] = df[['Q1', 'Q2', 'Q3']].apply(my_func, axis=1)