How can I get groups for which all rows contain a certain pattern with pandas

Time:01-02

Example csv file:

    myId    tags
0   id_1    \N
1   id_1    \N
2   id_1    \N
3   id_1    \N
4   id_2    "[""tag1""]"
5   id_2    "[""tag1""]"
6   id_2    "[""tag0"",""tag1""]"
7   id_3    \N
8   id_3    \N
9   id_3    "[""tag1""]"
10  id_3    \N

From this, I want to return only id_2 with pandas, because it is the only id that has "tag1" in all of its members. That is the part I can't figure out how to query: I want the ids for which every member has tag1. I don't want id_3, for example, because only 1 of its 4 members has the tag1 tag, and I don't want id_1 either, because none of its members has tag1. By contrast, every member of id_2 has tag1 in its tags list.

Can someone help me query this with pandas? This is just a small example; I want to know how to do something like this in general.

Thanks in advance.

CodePudding user response:

You can compute a mask to check if all the entries per group contain the pattern, then slice:

# na=False counts missing tags as "no match" in case \N was read in as NaN
mask = df['tags'].str.contains('tag1', na=False).groupby(df['myId']).transform('all')

df[mask]

output:

   myId                   tags
4  id_2           "[""tag1""]"
5  id_2           "[""tag1""]"
6  id_2  "[""tag0"",""tag1""]"
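
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the missing tags are stored as the literal string \N and rebuilds the sample frame by hand rather than reading the CSV; the names has_tag, per_group and matching_ids are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: "\N" is a plain string, not NaN).
df = pd.DataFrame({
    "myId": ["id_1"] * 4 + ["id_2"] * 3 + ["id_3"] * 4,
    "tags": [r"\N"] * 4
            + ['["tag1"]', '["tag1"]', '["tag0","tag1"]']
            + [r"\N", r"\N", '["tag1"]', r"\N"],
})

# True where the tag string mentions tag1; na=False also covers real NaN values.
has_tag = df["tags"].str.contains("tag1", regex=False, na=False)

# Broadcast "every row of this group matches" back onto each row, then slice.
mask = has_tag.groupby(df["myId"]).transform("all")
result = df[mask]

# If only the ids are needed, aggregate once per group instead of per row.
per_group = has_tag.groupby(df["myId"]).all()
matching_ids = per_group[per_group].index.tolist()

print(result)
print(matching_ids)
```

Note that str.contains does a substring match, so 'tag1' would also match a tag such as 'tag10'. If that matters for your data, you would probably need a stricter pattern or to parse the tag lists before comparing.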