Example CSV data (as loaded into a DataFrame):
myId tags
0 id_1 \N
1 id_1 \N
2 id_1 \N
3 id_1 \N
4 id_2 "[""tag1""]"
5 id_2 "[""tag1""]"
6 id_2 "[""tag0"",""tag1""]"
7 id_3 \N
8 id_3 \N
9 id_3 "[""tag1""]"
10 id_3 \N
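The listing above looks like the printed DataFrame rather than the raw file. A minimal sketch of loading such a file, assuming it is comma-separated with a `myId,tags` header and uses `\N` as its NULL marker (the convention of database exports such as MySQL's `SELECT ... INTO OUTFILE`). The inline `raw` text is a hypothetical stand-in for the real file; `na_values` tells `read_csv` to turn `\N` into NaN, and the doubled quotes are standard CSV escaping, so each tag cell becomes a string like `["tag1"]`.

```python
import io

import pandas as pd

# Hypothetical raw file contents: comma-separated, \N for NULL, CSV-escaped quotes.
raw = io.StringIO(
    "myId,tags\n"
    "id_1,\\N\n"
    "id_1,\\N\n"
    "id_1,\\N\n"
    "id_1,\\N\n"
    'id_2,"[""tag1""]"\n'
    'id_2,"[""tag1""]"\n'
    'id_2,"[""tag0"",""tag1""]"\n'
    "id_3,\\N\n"
    "id_3,\\N\n"
    'id_3,"[""tag1""]"\n'
    "id_3,\\N\n"
)

# Treat the literal \N marker as a missing value (NaN).
df = pd.read_csv(raw, na_values=["\\N"])
print(df)
```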
From this data, I want pandas to return only id_2. Why? Because id_2 is the only id where every member has "tag1" in its tags, and I can't figure out how to express that as a query. I want to return the ids whose members all have tag1. I don't want id_3, for example, because only 1 of its 4 members has tag1, and I don't want id_1 either, because none of its members has tag1. All members of id_2, on the other hand, have tag1 in their tags list.
Can someone help me write this query in pandas? This is just a small example; I want to know the general approach.
Thanks in advance.
Answer:
You can compute a boolean mask that checks whether every entry in each group contains the pattern, then use it to slice the DataFrame:
# na=False treats missing tags (NaN) as "does not contain tag1"
mask = df['tags'].str.contains('tag1', na=False).groupby(df['myId']).transform('all')
df[mask]
output:
myId tags
4 id_2 "[""tag1""]"
5 id_2 "[""tag1""]"
6 id_2 "[""tag0"",""tag1""]"
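The question asks for the ids themselves rather than the matching rows. A minimal sketch, assuming the `\N` cells were read as NaN and every non-missing `tags` value is a valid JSON array string: parse each tag list, test for the exact element `"tag1"` (a substring test would also match a tag such as `"tag10"`), then aggregate with `groupby(...).all()` so there is one result per id. `tag_list` is a hypothetical helper, not part of pandas.

```python
import json

import numpy as np
import pandas as pd

# Example data from the question; \N cells are assumed to have been read as NaN.
df = pd.DataFrame({
    "myId": ["id_1"] * 4 + ["id_2"] * 3 + ["id_3"] * 4,
    "tags": [np.nan] * 4
    + ['["tag1"]', '["tag1"]', '["tag0","tag1"]']
    + [np.nan, np.nan, '["tag1"]', np.nan],
})


def tag_list(value):
    """Hypothetical helper: parse a JSON array string; missing cells become []."""
    if isinstance(value, str):
        return json.loads(value)
    return []


# Exact membership test, so a tag such as "tag10" would not count as "tag1".
has_tag = df["tags"].map(lambda v: "tag1" in tag_list(v))

# One boolean per id: True only if every row of that id has the tag.
per_id = has_tag.groupby(df["myId"]).all()
ids = per_id[per_id].index.tolist()
print(ids)  # ['id_2']

# The matching rows, if those are wanted instead of just the ids.
rows = df[df["myId"].isin(ids)]
```

The difference from the answer is the aggregation: `transform('all')` broadcasts each group's result back to every row (useful for slicing), while `.all()` collapses it to one value per id (useful for listing the ids).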