Select columns based on row values in pandas and use the column indexes for subsetting-CodePudding

I have the following dataframe, where the first three columns have specific names which should not be changed ('col1' - 'col3') and the numbered columns ranging from 3 - 7.

data = [[0, 0.5, 0.5, 1, 0, 1, 0, 0],
        [1, 0.5, 0.5, 1, 1, 0, 1, 1],
        [2, 0.5, 0.5, 1, 1, 0, 1, 1]]
df = pd.DataFrame(data)
df = df.rename(columns = {0: 'Col1', 1:'Col2', 2: 'Col3'})

I would like to select all the numbered columns (column indexes 3-7) that contain the value 1 in the first row.

df2 = df.loc[0, df.iloc[0, 3:] == 1]

This throws me the following error: AssertionError

Afterwards I would like to use the indexes from df2 that represent the columns that fulfill the criteria of value 1 in row 1 (e.g. column 3 and 5) to be used to select those columns from the second row and check whether these also have value 1 or not.

df3 = df.loc[1, df.iloc[1, df2.index] == 1]

This throws the following error: IndexError: .iloc requires numeric indexers, got [3 5]

The final expected output should be that only column index 3 from row 2 fulfills the criteria of value 1 based on the fact that from row 1 only column index 3 and 5 had value 1.

How can I do this?

CodePudding user response：

Use:

df1 = df.iloc[:, 3:]
fin = df1.columns[(df1.iloc[0] == 1) & (df.iloc[1, 3:] == 1)]
print (fin)
Index([3], dtype='object')

Original solution:

out = df.columns[3:][df.iloc[0, 3:] == 1]
s = df.loc[1, out]

fin = s.index[s == 1]
print (fin)
Index([3], dtype='object')

CodePudding user response：

One option:

# first row of columns to test (could be a fixed list)
cols = df.loc[0,3:7]
# if not 1, then drop
df2 = df.drop(cols[cols.ne(1)].index, axis=1)

output:

   Col1  Col2  Col3  3  5
0     0   0.5   0.5  1  1
1     1   0.5   0.5  1  0
2     2   0.5   0.5  1  0

alternative

Just getting the name of columns containing 1:

cols = df.loc[0,3:7] # first row, columns 3 to 7
# or with iloc
# cols = df.iloc[0,3:]

cols[cols.eq(1)].index
# Index([3, 5], dtype='object')