How to use slice to exclude rows and columns from dataframe

Time:08-13

I have a DataFrame

import pandas as pd
import numpy as np

index = pd.MultiIndex.from_product([["A", "B"], ["AA", "BB"]])
columns = pd.MultiIndex.from_product([["X", "Y"], ["XX", "YY"]])

df = pd.DataFrame([[1,2,3,4],
                   [5,6,7,8],
                   [9,10,11,12],
                   [13,14,15,16]], index = index, columns = columns)

and slice

toSkip = ((slice(None), slice(None)), (["X"], slice(None)))

I know that I can write df.loc[toSkip] to get the subset of the DataFrame that corresponds to this slice. But how can I do the opposite, i.e. get everything in the original df except the part selected by that slice?

CodePudding user response:

How to invert slicing

To illustrate the idea, let's make the example a bit larger.

import pandas as pd
import numpy as np

index = pd.MultiIndex.from_product([["A", "B", "C"], ["AA", "BB", "CC"]])
columns = pd.MultiIndex.from_product([["X", "Y", "Z"], ["XX", "YY", "ZZ"]])
data = (
    np
    .arange(len(index) * len(columns))
    .reshape(len(index), len(columns))
)

df = pd.DataFrame(data, index, columns)

Let's say I want to process all the data except the inner square (rows B, columns Y).

I can get the square by slicing. To get everything else, I'll use a boolean mask that is True everywhere except on the sliced square:

mask = pd.DataFrame(True, index, columns)
toSkip = ((['B'], slice(None)), (['Y'], slice(None)))
mask.loc[toSkip] = False

Now I can transform the remaining values by indexing with the mask:

# just for illustration purposes
# let's invert the sign of numbers
df[mask] *= -1   

After this, every value outside the (B, Y) square has its sign flipped, while the square itself is left unchanged.
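If the original values should stay untouched, the same mask can also be used without mutating df. The following is a minimal sketch, assuming the 9x9 example above; the names to_skip and complement are introduced here for illustration only. It uses DataFrame.where, which keeps values where the mask is True and puts NaN elsewhere.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([["A", "B", "C"], ["AA", "BB", "CC"]])
columns = pd.MultiIndex.from_product([["X", "Y", "Z"], ["XX", "YY", "ZZ"]])
df = pd.DataFrame(
    np.arange(len(index) * len(columns)).reshape(len(index), len(columns)),
    index=index,
    columns=columns,
)

# Same slice as in the answer: rows under "B", columns under "Y".
to_skip = ((["B"], slice(None)), (["Y"], slice(None)))

# Start from an all-True mask and switch off the cells selected by the slice.
mask = pd.DataFrame(True, index=index, columns=columns)
mask.loc[to_skip] = False

# Keep values where the mask is True; the skipped square becomes NaN.
complement = df.where(mask)
```

Because the complement of a rectangular block is generally not rectangular, the result keeps the full shape and marks the excluded cells as missing rather than dropping them.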

CodePudding user response:

If slice is a boolean Series, the logical negation operator ~ inverts the condition. So,

df[~slice]

returns the rows that don't satisfy the condition slice. Note that this only works for boolean masks; a tuple of slice objects such as toSkip cannot be negated with ~.
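As a small sketch of the negation approach on the question's DataFrame: the condition below (outer row level equal to "A") and the names cond, selected and rest are hypothetical, chosen only to illustrate the idea.

```python
import pandas as pd

index = pd.MultiIndex.from_product([["A", "B"], ["AA", "BB"]])
columns = pd.MultiIndex.from_product([["X", "Y"], ["XX", "YY"]])
df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]],
    index=index,
    columns=columns,
)

# Hypothetical condition: rows whose outer index level is "A".
cond = pd.Series(df.index.get_level_values(0) == "A", index=df.index)

selected = df[cond]   # rows where the condition holds
rest = df[~cond]      # rows where it does not
```

Here selected and rest together cover every row of df exactly once.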

CodePudding user response:

Not sure if this is what you want, but you can drop the index and columns of the DataFrame selected by toSkip:

toSkip = ((slice(None), slice(None)), (["X"], slice(None)))

tmp = df.loc[toSkip]
out = df.drop(index=tmp.index, columns=tmp.columns)
print(out)

Empty DataFrame
Columns: [(Y, XX), (Y, YY)]
Index: []
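The result is empty because toSkip selects every row, so dropping tmp.index removes all of them. If the intent is to keep whatever lies outside the selected columns, one possible variation (an assumption about what the asker wants, not something stated in the question) is to drop only the columns:

```python
import pandas as pd

index = pd.MultiIndex.from_product([["A", "B"], ["AA", "BB"]])
columns = pd.MultiIndex.from_product([["X", "Y"], ["XX", "YY"]])
df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]],
    index=index,
    columns=columns,
)

to_skip = ((slice(None), slice(None)), (["X"], slice(None)))
tmp = df.loc[to_skip]

# Drop only the selected columns; all rows are kept.
out = df.drop(columns=tmp.columns)
print(out)
```

This leaves the Y columns for all four rows. It only works when the part to keep is itself rectangular; otherwise a boolean mask is the more general tool.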