I have this DataFrame:
import pandas as pd
df = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6, 7, 8], "B": [1, 1, 2, 2, 3, 3, 4, 4], "C": [1, 1, 1, 1, 2, 3, 2, 2]})
   A  B  C
0  1  1  1
1  2  1  1
2  3  2  1
3  4  2  1
4  5  3  2
5  6  3  3
6  7  4  2
7  8  4  2
For each value b in column B, I would like to see the set of values of column C that appear in rows where B == b.
So I'd like a Series (or dict) such as {1: [1], 2: [1], 3: [2, 3], 4: [2]},
meaning that, for example, when B == 3, the values of C are 2 and 3.
How do I do this? Thanks
Answer:
You can group by B and aggregate each group's C values into a set:
df.groupby('B')['C'].agg(set).to_dict()
# or, as sorted lists (list(set(x)) alone has no guaranteed order)
# df.groupby('B')['C'].agg(lambda x: sorted(set(x))).to_dict()
Output:
{1: {1}, 2: {1}, 3: {2, 3}, 4: {2}}
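For intuition, `groupby('B')['C'].agg(set)` produces the same mapping as the plain-Python loop below, which adds each C value to a set keyed by the B value in the same row. This is only an illustrative sketch using `collections.defaultdict`, not how pandas implements grouping internally; the helper name `group_values` is hypothetical.

```python
from collections import defaultdict


def group_values(keys, values):
    """Hypothetical helper: map each key to the set of values seen in the same rows."""
    groups = defaultdict(set)
    for k, v in zip(keys, values):
        groups[k].add(v)  # duplicates collapse automatically because it's a set
    return dict(groups)


# Columns B and C from the question's DataFrame, as plain lists
B = [1, 1, 2, 2, 3, 3, 4, 4]
C = [1, 1, 1, 1, 2, 3, 2, 2]
print(group_values(B, C))  # {1: {1}, 2: {1}, 3: {2, 3}, 4: {2}}
```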
To get lists that keep the values in their order of first appearance:
df.groupby('B')['C'].agg(lambda x: list(dict.fromkeys(x))).to_dict()
Output:
{1: [1], 2: [1], 3: [2, 3], 4: [2]}
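pandas also provides `SeriesGroupBy.unique()`, which returns each group's distinct values as a NumPy array in order of first appearance. The sketch below uses it as an alternative to the `dict.fromkeys` lambda, converting each array to a plain list with `.tolist()`. If you want a Series indexed by B (closer to what the question asked for), you can stop at `unique_per_b`.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6, 7, 8],
    "B": [1, 1, 2, 2, 3, 3, 4, 4],
    "C": [1, 1, 1, 1, 2, 3, 2, 2],
})

# Series indexed by B; each entry is a NumPy array of the distinct C values,
# in the order they first appear within that group
unique_per_b = df.groupby("B")["C"].unique()

# Convert each array to a plain Python list and build a dict keyed by B
result = {b: values.tolist() for b, values in unique_per_b.items()}
print(result)  # {1: [1], 2: [1], 3: [2, 3], 4: [2]}
```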