Find and group values that match in a specific column, based on a predicate-CodePudding

I would like to group the value in column m for all rows that have the same first value in the list in column z.

Sample Input:

m     p   z
x001  Foo [z42, z56, z31]
x321  Bar [z589, z78, z42]
x5432 Paz [z67]
x003  Foo [z589, z78, z32, z31]
x5478 Bah [z987, z345, z52]
x098  Fin [z42, z31]
x783  Lon [z42, z210, z458, z192]

Desired Output:

[x001, x098, x783]
[x321, x003]

Note that values in column m are unique, while values in other columns may repeat and/or match exactly.

CodePudding user response：

you want to group by the first ([0]th) element of the "z" column, so we do that:

In [171]: df.groupby(df.z.str[0])["m"].agg(list)
Out[171]:
z
z42     [x001, x098, x783]
z589          [x321, x003]
z67                [x5432]
z987               [x5478]
Name: m, dtype: object

There exist only 1 record-having first values, so we can filter them out:

In [172]: res = _

In [173]: res[res.str.len() > 1]
Out[173]:
z
z42     [x001, x098, x783]
z589          [x321, x003]
Name: m, dtype: object

where res is the result of the first operation.

Even though the .str accessor by name seems to be for string operations, due to duck typing, we can use them (e.g., indexing or length) on lists as well, as seen above.