I want to groupby my dataframe and concatenate the values/strings from the other columns together.
Year Letter Number Note Text
0 2022 a 1 8 hi
1 2022 b 1 7 hello
2 2022 a 1 6 bye
3 2022 b 3 5 joe
To this:
Column
Year Letter
2022 a 1|8|hi; 1|6|bye
b 1|7|hello; 3|5|joe
I tried some things with groupby, apply() and agg() but I can't get it work:
df.groupby(['Year', 'Letter']).agg(lambda x: '|'.join(x))
Output:
Text
Year Letter
2022 a hi|bye
b hello|joe
CodePudding user response:
You can first join values per rows converted to strings by DataFrame.astype
and DataFrame.agg
and then aggregate join
in GroupBy.agg
:
df1 = (df.assign(Text= df[['Number','Note','Text']].astype(str).agg('|'.join, axis=1))
.groupby(['Year', 'Letter'])['Text']
.agg('; '.join)
.to_frame())
print (df1)
Text
Year Letter
2022 a 1|8|hi; 1|6|bye
b 1|7|hello; 3|5|joe
Or create custom lambda function in GroupBy.apply
:
f = lambda x: '; '.join('|'.join(y) for y in x.astype(str).to_numpy())
df1 = (df.groupby(['Year', 'Letter'])[['Number','Note','Text']].apply(f)
.to_frame(name='Text')
)
print (df1)
Text
Year Letter
2022 a 1|8|hi; 1|6|bye
b 1|7|hello; 3|5|joe
If need join all columns without grouping columns:
grouped = ['Year','Letter']
df1 = (df.assign(Text= df[df.columns.difference(grouped, sort=False)]
.astype(str).agg('|'.join, axis=1))
.groupby(['Year', 'Letter'])['Text']
.agg('; '.join)
.to_frame())
grouped = ['Year','Letter']
f = lambda x: '; '.join('|'.join(y) for y in x.astype(str).to_numpy())
df1 = (df.groupby(grouped)[df.columns.difference(grouped, sort=False)].apply(f)
.to_frame(name='Text')
)
CodePudding user response:
Using groupby.apply
:
cols = ['Year', 'Letter']
(df.groupby(cols)
.apply(lambda d: '; '.join(d.drop(columns=cols) # or slice the columns here
.astype(str)
.agg('|'.join, axis=1)))
.to_frame(name='Column')
)
Output:
Column
Year Letter
2022 a 1|8|hi; 1|6|bye
b 1|7|hello; 3|5|joe