How do I select columns from a pandas data frame using conditions in multiple rows-CodePudding

I have a data frame with many columns and three rows. I want to filter for the columns based on conditions for different rows

e.g. In the following example I would like to get all the columns for which row 'AAA' has a value of < -1 and for which rows 'BBBB' and 'CCCC' have a value of > -1

import pandas as pd

data = {"Name": ["AAAA", "BBBB", "CCCC"],
        "C1": [-2, -0.5, -0.5],
        "C2": [-2, -0.5, -0.5],
        "C3": [-0.5, -2, -2]}

df = pd.DataFrame(data)
df.set_index("Name")

         C1      C2      C3
Name            
AAAA    -2.0    -2.0    -0.5
BBBB    -0.5    -0.5    -2.0
CCCC    -0.5    -0.5    -2.0

I think I need to use loc but I don't know how in this case.

My output would ideally be:

         C1      C2     
Name            
AAAA    -2.0    -2.0    
BBBB    -0.5    -0.5    
CCCC    -0.5    -0.5

CodePudding user response：

Since you say you have many columns and few rows, it might be easier to transpose your df and work "normally" from there on.

Consider:


dft = df.transpose()

print(dft)
#Name  AAAA  BBBB  CCCC
#C1    -2.0  -0.5  -0.5
#C2    -2.0  -0.5  -0.5
#C3    -0.5  -2.0  -2.0

dft[(dft.AAAA < -1) & (dft.CCCC > -1)]
#Name  AAAA  BBBB  CCCC
#C1    -2.0  -0.5  -0.5
#C2    -2.0  -0.5  -0.5

CodePudding user response：

Normally this sort of operation is performed on a row basis, so if your dataset allows it I would transpose the rows/columns.

Additionally, to set the 'Name' column to the index you need to use the 'inplace' option or set df = df.set_index("Name").

Here is one way to get the result you are after, I have broken it down to each logical step so that you could scale up with as many criteria as required.

df = df.set_index("Name")

# create a mask of columns based on criteria
mask1 = df.loc['AAAA'] < -1

mask2 = df.loc['BBBB'] > -1 

mask3 = df.loc['CCCC'] > -1

# combine to single mask
mask = mask1*mask2*mask3

# set dataframe to only required columns
df_out = df.loc[:, mask]

# alternative one liner but less clear

df_out2 = df.loc[:, (df.loc['AAAA'] < -1) &
                   (df.loc['BBBB'] > -1) &
                   (df.loc['CCCC'] > -1)]

CodePudding user response：

You need to set the value for df. There is two solutions:

1/inplace

df.set_index("Name", inplace=True)
df[["C1", "C2"]]

2/set df value

df = df.set_index("Name")
df[["C1", "C2"]]