Can itertools.groupby use pd.NA?

Time:01-26

I tried using itertools.groupby with a pandas Series. But I got:

TypeError: boolean value of NA is ambiguous

Indeed some of my values are NA.

This is a minimal reproducible example:

import pandas as pd
import itertools

g = itertools.groupby([pd.NA,0])
next(g)
next(g)

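Comparing pd.NA with another value returns pd.NA rather than True or False. groupby compares each new key with the current one and needs the truth value of the result, so as soon as it compares NA with 0 it evaluates bool(pd.NA) and fails.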
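
The failing comparison can be reproduced without groupby. This is a minimal sketch using only pd.NA itself; the variable name result is just for illustration:

```python
import pandas as pd

# Comparing pd.NA with any value propagates NA instead of returning a bool.
result = (pd.NA == 0)
print(result is pd.NA)

# groupby needs a plain True/False from that comparison,
# and NA refuses to provide one.
try:
    bool(result)
    raised = False
except TypeError as exc:
    raised = True
    print(exc)
```

The TypeError raised here is the same one that surfaces from next(g) in the example above.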

Is there a way to make itertools.groupby work with NA values? Or should I just accept this and take a different route to my goal?

CodePudding user response:

How about passing a key function to itertools.groupby that converts pd.NA to None? Since == does not return a usable boolean for pd.NA, the key function uses the is operator instead. pd.NA is a singleton, so the identity check detects it reliably.

import pandas as pd
import itertools

arr = [pd.NA, pd.NA, 0, 1, 1]
keyfunc = lambda x: None if (x is pd.NA) else x
for key, group in itertools.groupby(arr, key=keyfunc):
    print(key, list(group))

Output:

None [<NA>, <NA>]
0 [0]
1 [1, 1]
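
If the data may also contain real None values, mapping pd.NA to None would merge the two into one group. One possible variation, sketched here under that assumption, is to map pd.NA to a private sentinel object instead. The names _MISSING, na_safe_key and group_runs are hypothetical helpers, not part of pandas or itertools:

```python
import itertools
import pandas as pd

# Hypothetical sentinel: a unique object that cannot collide with real data,
# unlike None, which might itself appear in the input.
_MISSING = object()

def na_safe_key(value):
    """Map pd.NA to the sentinel; pass every other value through unchanged."""
    return _MISSING if value is pd.NA else value

def group_runs(values):
    """Group consecutive equal values, treating pd.NA as equal to itself."""
    return [
        # Translate the sentinel back to pd.NA so callers never see it.
        (pd.NA if key is _MISSING else key, list(group))
        for key, group in itertools.groupby(values, key=na_safe_key)
    ]

runs = group_runs([pd.NA, pd.NA, None, 0, 1, 1])
for key, group in runs:
    print(key, group)
```

With this key function, a plain None in the input forms its own group instead of being merged into the NA run.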