I have a data frame with many columns and three rows. I want to select columns based on conditions on different rows.
For example, in the following I would like to get all the columns for which row 'AAAA' has a value < -1 and for which rows 'BBBB' and 'CCCC' have a value > -1:
import pandas as pd
data = {"Name": ["AAAA", "BBBB", "CCCC"],
"C1": [-2, -0.5, -0.5],
"C2": [-2, -0.5, -0.5],
"C3": [-0.5, -2, -2]}
df = pd.DataFrame(data)
df.set_index("Name")
C1 C2 C3
Name
AAAA -2.0 -2.0 -0.5
BBBB -0.5 -0.5 -2.0
CCCC -0.5 -0.5 -2.0
I think I need to use loc, but I don't know how in this case.
My output would ideally be:
C1 C2
Name
AAAA -2.0 -2.0
BBBB -0.5 -0.5
CCCC -0.5 -0.5
CodePudding user response:
Since you say you have many columns and few rows, it might be easier to transpose your df and work "normally" from there on.
Consider:
# set_index returns a new frame, so set the index explicitly before transposing
dft = df.set_index("Name").transpose()
print(dft)
#Name AAAA BBBB CCCC
#C1 -2.0 -0.5 -0.5
#C2 -2.0 -0.5 -0.5
#C3 -0.5 -2.0 -2.0
dft[(dft.AAAA < -1) & (dft.BBBB > -1) & (dft.CCCC > -1)]
#Name AAAA BBBB CCCC
#C1 -2.0 -0.5 -0.5
#C2 -2.0 -0.5 -0.5
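To get back to the layout asked for in the question, the filtered frame can be transposed again. This is only a minimal sketch using the sample data from the question; the names `kept` and `result` are just for illustration.

```python
import pandas as pd

data = {"Name": ["AAAA", "BBBB", "CCCC"],
        "C1": [-2, -0.5, -0.5],
        "C2": [-2, -0.5, -0.5],
        "C3": [-0.5, -2, -2]}
df = pd.DataFrame(data).set_index("Name")

# Transpose so that the original columns become rows
dft = df.T

# Keep only the rows (original columns) that satisfy all three conditions
kept = dft[(dft["AAAA"] < -1) & (dft["BBBB"] > -1) & (dft["CCCC"] > -1)]

# Transpose back so that AAAA/BBBB/CCCC are rows again
result = kept.T
print(result)
```

Bracket access (`dft["AAAA"]`) is used here instead of attribute access because it also works when a row label is not a valid Python identifier.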
CodePudding user response:
Normally this sort of operation is performed on a row basis, so if your dataset allows it I would transpose the rows/columns.
Additionally, to set the 'Name' column as the index you need to either use the 'inplace' option or reassign the result with df = df.set_index("Name").
Here is one way to get the result you are after. I have broken it down into each logical step so that you can scale it up to as many criteria as required.
df = df.set_index("Name")
# create a mask of columns based on criteria
mask1 = df.loc['AAAA'] < -1
mask2 = df.loc['BBBB'] > -1
mask3 = df.loc['CCCC'] > -1
# combine to single mask
mask = mask1 & mask2 & mask3
# set dataframe to only required columns
df_out = df.loc[:, mask]
# alternative one liner but less clear
df_out2 = df.loc[:, (df.loc['AAAA'] < -1) &
(df.loc['BBBB'] > -1) &
(df.loc['CCCC'] > -1)]
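If the number of criteria grows, one possible way to scale this up is to keep the conditions in a dict and combine the masks in a loop. This is a sketch rather than the only way to do it; the `criteria` dict and its layout are my own assumption, not something pandas provides.

```python
import operator
from functools import reduce

import pandas as pd

data = {"Name": ["AAAA", "BBBB", "CCCC"],
        "C1": [-2, -0.5, -0.5],
        "C2": [-2, -0.5, -0.5],
        "C3": [-0.5, -2, -2]}
df = pd.DataFrame(data).set_index("Name")

# Hypothetical criteria table: row label -> (comparison function, threshold)
criteria = {
    "AAAA": (operator.lt, -1),
    "BBBB": (operator.gt, -1),
    "CCCC": (operator.gt, -1),
}

# One boolean Series per criterion, indexed by column name
masks = [compare(df.loc[row], threshold)
         for row, (compare, threshold) in criteria.items()]

# Combine all masks with element-wise AND
mask = reduce(operator.and_, masks)

# Keep only the columns where every criterion holds
df_out = df.loc[:, mask]
print(df_out)
```

Adding another condition then only means adding one more entry to `criteria`.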
CodePudding user response:
You need to assign the result of set_index back to df. There are two solutions:
1/inplace
df.set_index("Name", inplace=True)
df[["C1", "C2"]]
2/set df value
df = df.set_index("Name")
df[["C1", "C2"]]
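To see why the reassignment matters, here is a small sketch showing that set_index leaves the original frame untouched unless the result is kept. The variable names `indexed`, `name_still_a_column` and `index_name` are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["AAAA", "BBBB", "CCCC"],
                   "C1": [-2, -0.5, -0.5],
                   "C3": [-0.5, -2, -2]})

# set_index returns a new DataFrame; df itself is not modified
indexed = df.set_index("Name")

# "Name" is still an ordinary column in the original frame
name_still_a_column = "Name" in df.columns

# In the returned frame, "Name" has become the index
index_name = indexed.index.name

print(name_still_a_column, index_name)
```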