Let's say I have the following DataFrame:
import pandas as pd
df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two", "two"],
                   "C": ["small", "large", "large", "small",
                         "small", "large", "small", "small",
                         "large", "large"],
                   "D": [1, 2, 3, 4, 5, 6, 7, 8, 9, 99999]})
I'd like to join (concatenate? merge?) the values in column "D" for all rows that share the same combination of values in "A", "B" and "C". In other words, I want to get this DataFrame:
A B C D
0 foo one small 1
1 foo one large 2,3
2 foo two small 4,5
3 bar one large 6
4 bar one small 7
5 bar two small 8
6 bar two large 9,99999
There are aggregation functions like min, max, sum, etc., but I couldn't come up with a solution for combining the values as strings.
Answer:
Convert column D to strings first, so that the values can be aggregated with ','.join in GroupBy.agg:
df1 = (df.assign(D=df.D.astype(str))
         .groupby(['A', 'B', 'C'], sort=False)['D']
         .agg(','.join)
         .reset_index())
print(df1)
A B C D
0 foo one small 1
1 foo one large 2,3
2 foo two small 4,5
3 bar one large 6
4 bar one small 7
5 bar two small 8
6 bar two large 9,99999
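Converting D first is necessary because Python's str.join accepts only strings; handing it integers raises a TypeError. A minimal sketch of both cases, using a small hypothetical subset of the question's data (the names small and joined are illustrative only):

```python
import pandas as pd

# str.join rejects non-string items
try:
    ','.join([2, 3])
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)

# Hypothetical subset of the question's DataFrame
small = pd.DataFrame({"A": ["foo", "foo", "bar"],
                      "B": ["one", "one", "two"],
                      "C": ["large", "large", "large"],
                      "D": [2, 3, 9]})

# With D converted to str first, the same join works per group
joined = (small.assign(D=small.D.astype(str))
               .groupby(['A', 'B', 'C'], sort=False)['D']
               .agg(','.join)
               .reset_index())
print(joined)
```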
Alternatively, do the conversion inside a lambda function:
df1 = (df.groupby(['A', 'B', 'C'], sort=False)['D']
         .agg(lambda x: ','.join(x.astype(str)))
         .reset_index())
print(df1)
A B C D
0 foo one small 1
1 foo one large 2,3
2 foo two small 4,5
3 bar one large 6
4 bar one small 7
5 bar two small 8
6 bar two large 9,99999
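Both versions pass sort=False, so the groups come out in the order in which each key combination first appears, which matches the desired output. With the default sort=True, groupby sorts the keys instead ('bar' before 'foo', 'large' before 'small'). A sketch comparing the two on the question's data; the helper join_d is hypothetical and only wraps the lambda version above:

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo"] * 5 + ["bar"] * 5,
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two", "two"],
                   "C": ["small", "large", "large", "small", "small",
                         "large", "small", "small", "large", "large"],
                   "D": [1, 2, 3, 4, 5, 6, 7, 8, 9, 99999]})

def join_d(sort):
    # Hypothetical helper: same aggregation as above, only the sort flag varies
    return (df.groupby(['A', 'B', 'C'], sort=sort)['D']
              .agg(lambda x: ','.join(x.astype(str)))
              .reset_index())

in_order = join_d(sort=False)    # order of first appearance
sorted_keys = join_d(sort=True)  # keys sorted lexicographically
print(sorted_keys)
```

Use sort=False when the original row order matters; otherwise the default sorting is usually fine.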
If column D can contain duplicated values within a group and you need only the unique values, add DataFrame.drop_duplicates or Series.unique:
# Option 1: drop duplicate rows before grouping
df2 = (df.assign(D=df.D.astype(str))
         .drop_duplicates(['A', 'B', 'C', 'D'])
         .groupby(['A', 'B', 'C'], sort=False)['D']
         .agg(','.join)
         .reset_index())

# Option 2: keep only unique values inside each group
df2 = (df.groupby(['A', 'B', 'C'], sort=False)['D']
         .agg(lambda x: ','.join(x.astype(str).unique()))
         .reset_index())
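The question's data has no repeated values within a group, so both variants give the same result there. To see the difference, here is a small hypothetical DataFrame in which one group contains the value 1 twice. Without deduplication it appears twice; both deduplicating variants keep it once, in order of first appearance:

```python
import pandas as pd

# Hypothetical data: the (foo, one, small) group repeats D == 1
dup = pd.DataFrame({"A": ["foo", "foo", "foo", "bar"],
                    "B": ["one", "one", "one", "one"],
                    "C": ["small", "small", "small", "small"],
                    "D": [1, 1, 2, 3]})

keys = ['A', 'B', 'C']

# No deduplication: repeated values are kept
plain = (dup.groupby(keys, sort=False)['D']
            .agg(lambda x: ','.join(x.astype(str)))
            .reset_index())

# Deduplicate rows before grouping
via_drop = (dup.assign(D=dup.D.astype(str))
               .drop_duplicates(keys + ['D'])
               .groupby(keys, sort=False)['D']
               .agg(','.join)
               .reset_index())

# Deduplicate inside each group
via_unique = (dup.groupby(keys, sort=False)['D']
                 .agg(lambda x: ','.join(x.astype(str).unique()))
                 .reset_index())

print(plain)
print(via_unique)
```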