Pandas, filter dataframe based on unique values in one column and groupby in another

Time:12-06

I have a dataframe like this:

    ID  Packet  Type
     1       1     A
     2       1     B
     3       2     A
     4       2     C
     5       2     B
     6       3     A
     7       3     C
     8       4     C
     9       4     B
    10       5     B
    11       6     C
    12       6     B
    13       6     A
    14       7     A

I want to filter the dataframe so that it keeps only the rows belonging to a packet of size n whose types are all different. There are only n types. For this example, let's use n=3 and the types A, B, C.

In the end I want this:

    ID  Packet  Type
     3       2     A
     4       2     C
     5       2     B
    11       6     C
    12       6     B
    13       6     A

How do I do this with pandas?
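To try the answers below, the sample data can be rebuilt as follows. This is a minimal sketch: the column names and values are copied from the first table above, and the default 0-based index is what produces the index values (2, 3, 4, 10, 11, 12) shown in the answers' output.

```python
import pandas as pd

# Sample data from the question: each row has an ID, the packet it
# belongs to, and its type. The default RangeIndex runs from 0 to 13.
df = pd.DataFrame({
    "ID": range(1, 15),
    "Packet": [1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 6, 6, 6, 7],
    "Type": ["A", "B", "A", "C", "B", "A", "C", "C", "B", "B", "C", "B", "A", "A"],
})
print(df)
```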

Answer:

Another solution, using .groupby() followed by .filter():

df = df.groupby("Packet").filter(lambda x: len(x) == x["Type"].nunique() == 3)

print(df)

Prints:

    ID  Packet Type
2    3       2    A
3    4       2    C
4    5       2    B
10  11       6    C
11  12       6    B
12  13       6    A
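The same .filter() approach can be wrapped so that n is a parameter instead of a hard-coded 3. A minimal sketch, assuming the column names "Packet" and "Type" from the question; complete_packets is a hypothetical helper name. The lambda spells out the chained comparison len(x) == x["Type"].nunique() == 3 as two explicit checks.

```python
import pandas as pd


def complete_packets(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Keep rows whose packet has exactly n rows and n distinct types.

    complete_packets is a hypothetical helper, not a pandas function.
    """
    # DataFrameGroupBy.filter calls the lambda once per packet and keeps
    # every row of the packets for which it returns True, in original order.
    return df.groupby("Packet").filter(
        lambda g: len(g) == n and g["Type"].nunique() == n
    )


# Sample data from the question
df = pd.DataFrame({
    "ID": range(1, 15),
    "Packet": [1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 6, 6, 6, 7],
    "Type": ["A", "B", "A", "C", "B", "A", "C", "C", "B", "B", "C", "B", "A", "A"],
})

result = complete_packets(df, 3)
print(result)
```

Because .filter() runs a Python function per group, it is easy to read but can be slower than a vectorized mask on frames with many packets.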

Answer:

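You can use transform with nunique. Since nunique alone does not check the packet size, compare the group size as well, so that a packet with a repeated type (more than 3 rows) is excluded:

g = df.groupby('Packet')['Type']
out = df[(g.transform('nunique') == 3) & (g.transform('size') == 3)]
print(out)

Prints:
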
    ID  Packet Type
2    3       2    A
3    4       2    C
4    5       2    B
10  11       6    C
11  12       6    B
12  13       6    A
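Checking only nunique keeps any packet that contains all three types, even if it has extra rows. A minimal sketch on hypothetical data (packet 8, with types A, B, C, A, is invented for illustration) comparing a mask built from nunique alone with one that also compares the group size:

```python
import pandas as pd

# Hypothetical data: packet 2 is valid; packet 8 has 3 distinct types but 4 rows
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7],
    "Packet": [2, 2, 2, 8, 8, 8, 8],
    "Type": ["A", "C", "B", "A", "B", "C", "A"],
})

types = df.groupby("Packet")["Type"]
# transform broadcasts each per-packet statistic back onto every row,
# so the results align with df and can be used as boolean masks
distinct = types.transform("nunique")
size = types.transform("size")

only_nunique = df[distinct == 3]          # also keeps packet 8
both = df[(distinct == 3) & (size == 3)]  # keeps packet 2 only
print(only_nunique)
print(both)
```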

Answer:

I'd loop over the groupby object, keep the packets that qualify, and concatenate them:

>>> pd.concat(frame for _, frame in df.groupby("Packet") if len(frame) == 3 and frame.Type.is_unique)
    ID  Packet Type
2    3       2    A
3    4       2    C
4    5       2    B
10  11       6    C
11  12       6    B
12  13       6    A
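One caveat with the concat approach: pd.concat raises a ValueError ("No objects to concatenate") when no packet qualifies. A minimal sketch that parameterizes n and returns an empty frame with the same columns in that case; complete_packets_concat is a hypothetical helper name, and the column names are assumed from the question.

```python
import pandas as pd


def complete_packets_concat(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Concatenate packets with exactly n rows and no repeated type.

    complete_packets_concat is a hypothetical helper, not a pandas function.
    """
    # Groups come out sorted by packet key; rows keep their original order
    keep = [frame for _, frame in df.groupby("Packet")
            if len(frame) == n and frame["Type"].is_unique]
    # pd.concat on an empty list raises ValueError, so fall back to an
    # empty slice of df that still has the original columns
    if not keep:
        return df.iloc[0:0]
    return pd.concat(keep)


# Sample data from the question
df = pd.DataFrame({
    "ID": range(1, 15),
    "Packet": [1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 6, 6, 6, 7],
    "Type": ["A", "B", "A", "C", "B", "A", "C", "C", "B", "B", "C", "B", "A", "A"],
})

print(complete_packets_concat(df, 3))
print(complete_packets_concat(df, 4))  # no packet has 4 rows: empty frame
```

With n rows, is_unique implies exactly n distinct types, so this matches the nunique-based answers.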