I think this is a simple question, but I can't find an answer.
I'm trying to set a few rows of a dataframe using a mask of a mask, but I get an "Unalignable boolean Series provided as indexer" error.
Here is a small example:
import pandas as pd
data = {'c1':[1,2,3,4],
'c2':[2,4,6,8]}
df = pd.DataFrame(data)
mask = df['c1'] >= 3
mask2 = df.loc[mask, 'c2'] <= 6
df[mask2]  # raises IndexingError
df.loc[mask2, 'c2'] = -1  # the assignment I actually want; fails for the same reason
Here mask is:
0 False
1 False
2 True
3 True
Name: c1, dtype: bool
And mask2 is:
2 True
3 False
Name: c2, dtype: bool
But now df[mask2] yields:
IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
What I expect it to return is the row with index 2:
c1 c2
2 3 6
I realize that for this example df[(df['c1'] >= 3) & (df['c2'] <= 6)] would give me the expected result, but my program flow requires a mask of a mask rather than the intersection of two masks.
CodePudding user response:
One solution is to take the logical and (&) of both predicates:
mask = (df['c1'] >= 3) & (df['c2'] <= 6)
res = df[mask]
print(res)
Output
c1 c2
2 3 6
Use the mask to change the values as follows:
df.loc[mask, 'c2'] = -1
print(df)
Output (after changing df)
c1 c2
0 1 2
1 2 4
2 3 -1
3 4 8
CodePudding user response:
Your approach failed because boolean indexing needs a mask whose index matches that of df, and mask2 is no longer aligned with df: it only has entries for the rows selected by mask (index 2 and 3).
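To see the misalignment concretely, a short check along these lines compares the two indexes. It reuses df, mask and mask2 exactly as defined in the question and assumes nothing else:

```python
import pandas as pd

df = pd.DataFrame({'c1': [1, 2, 3, 4], 'c2': [2, 4, 6, 8]})
mask = df['c1'] >= 3
mask2 = df.loc[mask, 'c2'] <= 6

# mask has one entry per row of df; mask2 only covers the rows selected by mask
print(list(mask.index))              # [0, 1, 2, 3]
print(list(mask2.index))             # [2, 3]
print(mask2.index.equals(df.index))  # False
```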
If you want to avoid computing mask2 on every row (say you have subselected only a fraction of the input and the operation is costly), you can reindex the mask to the full index and fill the missing entries with False:
df.loc[mask2.reindex_like(mask).fillna(False), 'c2_modified'] = -1
output (as new column for clarity):
c1 c2 c2_modified
0 1 2 NaN
1 2 4 NaN
2 3 6 -1.0
3 4 8 NaN
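Two variants that may be worth considering, sketched with the same df, mask and mask2 as in the question. The first passes fill_value=False to reindex, which should keep the mask as a plain boolean Series without a separate fillna step. The second skips the full-length mask and selects by the index labels where mask2 is True. The names full_mask, labels, out1 and out2 are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'c1': [1, 2, 3, 4], 'c2': [2, 4, 6, 8]})
mask = df['c1'] >= 3
mask2 = df.loc[mask, 'c2'] <= 6

# Variant 1: expand mask2 to df's index; rows it does not cover become False
full_mask = mask2.reindex(df.index, fill_value=False)
out1 = df.copy()
out1.loc[full_mask, 'c2'] = -1

# Variant 2: collect the index labels where mask2 is True and select by label
labels = mask2[mask2].index
out2 = df.copy()
out2.loc[labels, 'c2'] = -1

print(full_mask.tolist())   # [False, False, True, False]
print(out1['c2'].tolist())  # [2, 4, -1, 8]
print(out2['c2'].tolist())  # [2, 4, -1, 8]
```

Both variants write into the existing c2 column on a copy, so the original df is left unchanged. The label-based variant relies on the index labels of df being unique.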