I would like to group the value in column m
for all rows that have the same first value in the list in column z
.
Sample Input:
m p z
x001 Foo [z42, z56, z31]
x321 Bar [z589, z78, z42]
x5432 Paz [z67]
x003 Foo [z589, z78, z32, z31]
x5478 Bah [z987, z345, z52]
x098 Fin [z42, z31]
x783 Lon [z42, z210, z458, z192]
Desired Output:
[x001, x098, x783]
[x321, x003]
Note that values in column m
are unique, while values in other columns may repeat and/or match exactly.
CodePudding user response:
you want to group by the first ([0]th) element of the "z" column, so we do that:
In [171]: df.groupby(df.z.str[0])["m"].agg(list)
Out[171]:
z
z42 [x001, x098, x783]
z589 [x321, x003]
z67 [x5432]
z987 [x5478]
Name: m, dtype: object
There exist only 1 record-having first values, so we can filter them out:
In [172]: res = _
In [173]: res[res.str.len() > 1]
Out[173]:
z
z42 [x001, x098, x783]
z589 [x321, x003]
Name: m, dtype: object
where res
is the result of the first operation.
Even though the .str
accessor by name seems to be for string operations, due to duck typing, we can use them (e.g., indexing or length) on lists as well, as seen above.