Home > Enterprise >  Select columns based on row values in pandas and use the column indexes for subsetting
Select columns based on row values in pandas and use the column indexes for subsetting

Time:02-19

I have the following dataframe, where the first three columns have specific names which should not be changed ('col1' - 'col3') and the numbered columns ranging from 3 - 7.

data = [[0, 0.5, 0.5, 1, 0, 1, 0, 0],
        [1, 0.5, 0.5, 1, 1, 0, 1, 1],
        [2, 0.5, 0.5, 1, 1, 0, 1, 1]]
df = pd.DataFrame(data)
df = df.rename(columns = {0: 'Col1', 1:'Col2', 2: 'Col3'})

I would like to select all the numbered columns (column indexes 3-7) that contain the value 1 in the first row.

df2 = df.loc[0, df.iloc[0, 3:] == 1]

This throws me the following error: AssertionError

Afterwards I would like to use the indexes from df2 that represent the columns that fulfill the criteria of value 1 in row 1 (e.g. column 3 and 5) to be used to select those columns from the second row and check whether these also have value 1 or not.

df3 = df.loc[1, df.iloc[1, df2.index] == 1]

This throws the following error: IndexError: .iloc requires numeric indexers, got [3 5]

The final expected output should be that only column index 3 from row 2 fulfills the criteria of value 1 based on the fact that from row 1 only column index 3 and 5 had value 1.

How can I do this?

CodePudding user response:

Use:

df1 = df.iloc[:, 3:]
fin = df1.columns[(df1.iloc[0] == 1) & (df.iloc[1, 3:] == 1)]
print (fin)
Index([3], dtype='object')

Original solution:

out = df.columns[3:][df.iloc[0, 3:] == 1]
s = df.loc[1, out]

fin = s.index[s == 1]
print (fin)
Index([3], dtype='object')

CodePudding user response:

One option:

# first row of columns to test (could be a fixed list)
cols = df.loc[0,3:7]
# if not 1, then drop
df2 = df.drop(cols[cols.ne(1)].index, axis=1)

output:

   Col1  Col2  Col3  3  5
0     0   0.5   0.5  1  1
1     1   0.5   0.5  1  0
2     2   0.5   0.5  1  0
alternative

Just getting the name of columns containing 1:

cols = df.loc[0,3:7] # first row, columns 3 to 7
# or with iloc
# cols = df.iloc[0,3:]

cols[cols.eq(1)].index
# Index([3, 5], dtype='object')
  • Related