I have the following dataframe, where the first three columns have specific names which should not be changed ('col1' - 'col3') and the numbered columns ranging from 3 - 7.
data = [[0, 0.5, 0.5, 1, 0, 1, 0, 0],
[1, 0.5, 0.5, 1, 1, 0, 1, 1],
[2, 0.5, 0.5, 1, 1, 0, 1, 1]]
df = pd.DataFrame(data)
df = df.rename(columns = {0: 'Col1', 1:'Col2', 2: 'Col3'})
I would like to select all the numbered columns (column indexes 3-7) that contain the value 1 in the first row.
df2 = df.loc[0, df.iloc[0, 3:] == 1]
This throws me the following error: AssertionError
Afterwards I would like to use the indexes from df2 that represent the columns that fulfill the criteria of value 1 in row 1 (e.g. column 3 and 5) to be used to select those columns from the second row and check whether these also have value 1 or not.
df3 = df.loc[1, df.iloc[1, df2.index] == 1]
This throws the following error: IndexError: .iloc requires numeric indexers, got [3 5]
The final expected output should be that only column index 3 from row 2 fulfills the criteria of value 1 based on the fact that from row 1 only column index 3 and 5 had value 1.
How can I do this?
CodePudding user response:
Use:
df1 = df.iloc[:, 3:]
fin = df1.columns[(df1.iloc[0] == 1) & (df.iloc[1, 3:] == 1)]
print (fin)
Index([3], dtype='object')
Original solution:
out = df.columns[3:][df.iloc[0, 3:] == 1]
s = df.loc[1, out]
fin = s.index[s == 1]
print (fin)
Index([3], dtype='object')
CodePudding user response:
One option:
# first row of columns to test (could be a fixed list)
cols = df.loc[0,3:7]
# if not 1, then drop
df2 = df.drop(cols[cols.ne(1)].index, axis=1)
output:
Col1 Col2 Col3 3 5
0 0 0.5 0.5 1 1
1 1 0.5 0.5 1 0
2 2 0.5 0.5 1 0
alternative
Just getting the name of columns containing 1:
cols = df.loc[0,3:7] # first row, columns 3 to 7
# or with iloc
# cols = df.iloc[0,3:]
cols[cols.eq(1)].index
# Index([3, 5], dtype='object')