Pandas select rows with smaller index

Time:07-26

Here is what I think is a simple question, but I can't find an answer.

I'm trying to set a few rows of a DataFrame using a mask of a mask, but I get an "Unalignable boolean Series provided as indexer" error.

Here is a small example:

import pandas as pd

data = {'c1':[1,2,3,4], 
        'c2':[2,4,6,8]}

df = pd.DataFrame(data)


mask = df['c1'] >= 3
mask2 = df.loc[mask, 'c2'] <= 6

df[mask2]                  # raises IndexingError
df.loc[mask2, 'c2'] = -1   # the assignment I actually want; fails the same way

Here mask is:

0    False
1    False
2     True
3     True
Name: c1, dtype: bool

And mask2 is:

2     True
3    False
Name: c2, dtype: bool

But now df[mask2] yields:

IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).

What I expected it to return is the row with index 2:

   c1  c2
2   3   6

I realize that for this example df[(df['c1'] >= 3) & (df['c2'] <= 6)] would give me the expected result, but my program flow requires a mask of a mask instead of the intersection of two masks.

CodePudding user response:

One solution is to combine both predicates with a logical AND (&):

mask = (df['c1'] >= 3) & (df['c2'] <= 6)
res = df[mask]
print(res)

Output

   c1  c2
2   3   6

Use the mask to change the values as follows:

df.loc[mask, 'c2'] = -1
print(df)

Output (after changing df)

   c1  c2
0   1   2
1   2   4
2   3  -1
3   4   8

CodePudding user response:

Your approach failed because boolean indexing needs a mask that covers every row of df, and mask2 only has entries for index labels 2 and 3, so it is no longer aligned with df.

If you want to avoid computing mask2 on all rows (say you selected only a fraction of the input and the operation is costly), you can reindex the mask to the full index:
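To make the misalignment concrete, here is a small sketch that rebuilds the question's df, mask and mask2 and compares their index labels. The variable name missing is only introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'c1': [1, 2, 3, 4], 'c2': [2, 4, 6, 8]})
mask = df['c1'] >= 3
mask2 = df.loc[mask, 'c2'] <= 6

# mask2 only carries the labels that survived the first mask.
print(mask2.index.tolist())   # [2, 3]
print(df.index.tolist())      # [0, 1, 2, 3]

# Labels that exist in df but not in mask2 have no True/False value,
# so pandas cannot align mask2 with df for boolean indexing.
missing = df.index.difference(mask2.index)
print(missing.tolist())       # [0, 1]
```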

df.loc[mask2.reindex_like(mask).fillna(False), 'c2_modified'] = -1

Output (written to a new column for clarity):

   c1  c2  c2_modified
0   1   2          NaN
1   2   4          NaN
2   3   6         -1.0
3   4   8          NaN
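As a hedged sketch of two variations on the same idea: reindex accepts a fill_value, which avoids the intermediate NaN step, and the labels where mask2 is True can also be passed to .loc directly. The names full_mask2 and labels are made up for this example, and here the result is written back into c2 rather than a new column.

```python
import pandas as pd

df = pd.DataFrame({'c1': [1, 2, 3, 4], 'c2': [2, 4, 6, 8]})
mask = df['c1'] >= 3
mask2 = df.loc[mask, 'c2'] <= 6

# Option 1: expand mask2 to the full index, treating missing labels as False.
full_mask2 = mask2.reindex(df.index, fill_value=False)

# Option 2: skip the boolean mask and keep only the labels where mask2 is True.
labels = mask2[mask2].index

# Both options select the same row; label-based assignment is used here.
df.loc[labels, 'c2'] = -1
print(df)
```

Either form keeps the "mask of a mask" flow: mask2 is still computed only on the rows selected by mask.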