I tried using itertools.groupby with a pandas Series, but I got:
TypeError: boolean value of NA is ambiguous
Indeed, some of my values are NA.
This is a minimal reproducible example:
import pandas as pd
import itertools
g = itertools.groupby([pd.NA,0])
next(g)
next(g)
Comparing NA with another value always results in NA, so the key comparison inside g.__next__ ends up taking the truth value of NA and fails.
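The behavior described above can be checked in isolation. This is a minimal sketch, not part of the original example; the names result and raised are only illustrative.

```python
import pandas as pd

# Equality with NA propagates NA instead of returning True or False.
result = (pd.NA == 0)
print(result is pd.NA)

# Asking for the truth value of NA is what raises the TypeError.
try:
    bool(result)
    raised = False
except TypeError as exc:
    raised = True
    print(exc)
```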
Is there a way to solve this so that itertools.groupby works with NA values? Or should I just accept it and take a different route to my goal?
CodePudding user response:
How about using a key function in itertools.groupby to convert pd.NA to None? Since == doesn't produce the desired output with pd.NA, we can use the is operator to perform identity comparison instead.
import pandas as pd
import itertools
arr = [pd.NA, pd.NA, 0, 1, 1]
keyfunc = lambda x: None if (x is pd.NA) else x
for key, group in itertools.groupby(arr, key=keyfunc):
    print(key, list(group))
Output:
None [<NA>, <NA>]
0 [0]
1 [1, 1]
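Since the question mentions a pandas Series, here is a sketch of the same idea applied to one. The data, the nullable "Int64" dtype, and the helper name na_to_none are assumptions made for illustration, not taken from the question.

```python
import itertools
import pandas as pd

# Hypothetical data; the nullable "Int64" dtype stores missing values as pd.NA.
s = pd.Series([pd.NA, pd.NA, 0, 1, 1], dtype="Int64")

def na_to_none(x):
    # Identity check avoids ==, which would return NA for a missing value.
    return None if x is pd.NA else x

# Materialize each group right away, because groupby shares one underlying iterator.
groups = [(key, list(group)) for key, group in itertools.groupby(s, key=na_to_none)]

for key, group in groups:
    print(key, group)
```

Note that real None values in the data would end up with the same key as NA. If the data may also contain None or float NaN, pd.isna(x) could be swapped in for the identity check, at the cost of treating all of them as one key.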