Home > OS >  python pandas: using pd.IndexSlice for both rows and columns in a double multiindex dataframe
python pandas: using pd.IndexSlice for both rows and columns in a double multiindex dataframe

Time:12-16

I have a double Multiindex dataframe as follows. I slice the rows with idx = pd.IndexSlice but I dont know how to do the same with the columns so provided this data:

df = pd.DataFrame(data=pd.DataFrame(data=np.random.randint(0, 10, size=(9, 5))))
# rows
list1 = ['2021-01-01','2022-02-01','2022-03-01']
list2 = ['PHOTO', 'QUE','TXR']
combinations = [(x, y) for x in list1 for y in list2]     
df.index = pd.MultiIndex.from_tuples(combinations, names = ["DATE","DB"])
df.index.set_names(["DATE","DB"], inplace=True)
#columns
list1c = [('AB30','ACTIVE','A2'),('CD55','ACTIVE','A1'),('ZT52','UNACTIVE','A2'),('MIKE','PENSIONER','A2'),('ZZ00001','ACTIVE','A1')]
df.columns = pd.MultiIndex.from_tuples(list1c, names = ["UserID","KIND","DEPARTMENT"])

I slice the rows as follows:

# filtering in rows
idx = pd.IndexSlice

###### ROWS #######
# slicing dates
date_start = '2021-01-01'
date_end   = '2021-02-01'
# slicing databases
databases = ['PHOTO','QUE']

# creating the index sclice for rows
i_s = idx[date_start:date_end, databases]

###### COLUMNS ######
# ??? here mask for the columns i_c = ???

df.loc[i_s, ] 

My goal is to use the same method to slice the columns So how I generate the IndexSlice for columns that give me for example:

pseudocode: KIND= ACTIVE DEPARTMENT = A2

I would like to use the same approach, for each multilevel defining a mask

CodePudding user response:

With this code you should be able to Select columns with KIND "ACTIVE" and DEPARTMENT "A2"

columns = (slice(None), 'ACTIVE', 'A2')
df_sliced = df.loc[:, columns]

You can also use the .xs method to slice the columns by their level names.

For example:

df_sliced = df.xs('ACTIVE', level='KIND', axis=1)

To slice the DataFrame to include only the columns with KIND "ACTIVE".

CodePudding user response:

You can use:

i_c = idx[: , 'ACTIVE', 'A2']

df.loc[i_s, i_c] 

Output:

UserID             AB30
KIND             ACTIVE
DEPARTMENT           A2
DATE       DB          
2021-01-01 PHOTO      5
           QUE        9

ACTIVE or PENSIONER

i_c = idx[: , ['ACTIVE', 'PENSIONER'], 'A2']

df.loc[i_s, i_c] 

Output:

UserID             AB30      MIKE
KIND             ACTIVE PENSIONER
DEPARTMENT           A2        A2
DATE       DB                    
2021-01-01 PHOTO      5         3
           QUE        9         2
  • Related