Concatenate pandas column values based on common index

Time:11-25

Input dataframe

A      B
n1     "joe,jack"
n2     "kelly,john"
n3     "adam,sam"
n1     "jack,frank"
n3     "rita"
n4     "steve, buck"
n2     "john, kelly, peter"
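
To make the question reproducible, the sample input can be built roughly like this. It is a minimal sketch that assumes A and B are ordinary columns (not the index) and that B holds plain strings without the surrounding quotes:

```python
import pandas as pd

# Sample input from the table above; A is the grouping key, B holds comma-separated names.
df = pd.DataFrame({
    "A": ["n1", "n2", "n3", "n1", "n3", "n4", "n2"],
    "B": ["joe,jack", "kelly,john", "adam,sam", "jack,frank",
          "rita", "steve, buck", "john, kelly, peter"],
})
print(df)
```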

Using column A as the key, I want to concatenate the text in column B, separated by commas (,). Any repeated name should appear only once, so the expected output would look like this:

A       B
n1      joe,jack,frank
n2      kelly,john,peter
n3      adam,sam,rita
n4      steve, buck

CodePudding user response:

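Use GroupBy.agg with a custom function that combines split, set and join if order is not important (a set does not keep insertion order, so the names in each row may come out in a different order than shown):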

# Flatten every comma-separated name in the group, then drop duplicates with a set
f = lambda x: ','.join(set([z for y in x for z in y.replace(', ',',').split(',')]))
df = df.groupby('A')['B'].agg(f).reset_index()
print(df)
    A                 B
0  n1    jack,joe,frank
1  n2  john,kelly,peter
2  n3     adam,rita,sam
3  n4        steve,buck
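
If a reproducible result matters more than the original order, one option is to sort the deduplicated names. The following is only a sketch: the helper name join_unique_sorted is made up for illustration, and the sample frame is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["n1", "n2", "n3", "n1", "n3", "n4", "n2"],
    "B": ["joe,jack", "kelly,john", "adam,sam", "jack,frank",
          "rita", "steve, buck", "john, kelly, peter"],
})

def join_unique_sorted(values):
    # Split each string on commas, trim surrounding spaces, and deduplicate with a set.
    names = {name.strip() for value in values for name in value.split(",")}
    # Sorting makes the output independent of the set's iteration order.
    return ",".join(sorted(names))

result = df.groupby("A")["B"].agg(join_unique_sorted).reset_index()
print(result)
```

With the sample data this gives frank,jack,joe for n1 and buck,steve for n4, i.e. alphabetical rather than first-seen order.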

If the order matters when removing duplicates, use the dict.fromkeys trick, which keeps the first occurrence of each name:

# dict.fromkeys keeps only the first occurrence of each name, in insertion order
f = lambda x: ','.join(dict.fromkeys([z for y in x for z in y.replace(', ',',').split(',')]))
df = df.groupby('A')['B'].agg(f).reset_index()
print(df)
    A                 B
0  n1    joe,jack,frank
1  n2  kelly,john,peter
2  n3     adam,sam,rita
3  n4        steve,buck
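
The replace(', ', ',') step above only handles exactly one space after a comma. A slightly more defensive variant, sketched here under the assumption that any whitespace around a name should be ignored, strips each piece instead. The function name join_unique_ordered is hypothetical, and the sample frame is rebuilt so the snippet is self-contained.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["n1", "n2", "n3", "n1", "n3", "n4", "n2"],
    "B": ["joe,jack", "kelly,john", "adam,sam", "jack,frank",
          "rita", "steve, buck", "john, kelly, peter"],
})

def join_unique_ordered(values):
    # Split on commas and trim whitespace around every name in the group.
    names = [name.strip() for value in values for name in value.split(",")]
    # dict.fromkeys drops later duplicates while keeping first-seen order;
    # empty pieces (e.g. from a trailing comma) are skipped.
    return ",".join(dict.fromkeys(name for name in names if name))

result = df.groupby("A")["B"].agg(join_unique_ordered).reset_index()
print(result)
```

On the sample data this reproduces the rows shown above: joe,jack,frank / kelly,john,peter / adam,sam,rita / steve,buck.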