I want to combine multiple rows into a single row. The original DataFrame is shown below:
   Item  Date       Invoice No       Center  Address
0  44    24/2/2022  AF6026321237160  Japan   106-0041 Tokyo-to,
1                                            Minato-ku, Azabudai,
2                                            1 no 9 no 12.
3  45    24/2/2022  AF6026321237179  Korea   Bldg. 102 Unit 304
4                                            Sajik-ro-3-gil23
5                                            Jongno-gu, Seoul 30174
6  46    24/2/2022  AF6026321237188  HK      Flat 25, 12/F, Acacia Building
7                                            150 Kennedy Road
8                                            WAN CHAI
After combining the rows
   Item  Date       Invoice No       Center  Address
0  44    24/2/2022  AF6026321237160  Japan   106-0041 Tokyo-to,Minato-ku, Azabudai,1 no 9 no 12.
1  45    24/2/2022  AF6026321237179  Korea   Bldg. 102 Unit 304Sajik-ro-3-gil23Jongno-gu,Seoul 30174
2  46    24/2/2022  AF6026321237188  HK      Flat 25, 12/F, Acacia Building150 Kennedy Road,WAN CHAI
Is there a way to do this? I want to concatenate the Address values from several rows into one row.
I tried this code, but the result is not what I expected:
df = df.groupby(['Item'])['Address'].transform(lambda x : ''.join(x))
CodePudding user response:
You can use the non-empty values in a reliable column (here Item, which is filled only on the first row of each record) to define groups, then aggregate:
# group rows that follow a row with non-empty value in Item
group = df['Item'].fillna('').ne('').cumsum()
# create a dictionary of aggregation functions
# by default get first row of group
d = {c: 'first' for c in df}
# for Address, join the rows
d['Address'] = ' '.join
df2 = df.groupby(group).agg(d)
Output:
      Item  Date       Invoice No       Center  Address
Item
1     44    24/2/2022  AF6026321237160  Japan   106-0041 Tokyo-to, Minato-ku, Azabudai, 1 no 9 no 12.
2     45    24/2/2022  AF6026321237179  Korea   Bldg. 102 Unit 304 Sajik-ro-3-gil23 Jongno-gu, Seoul 30174
3     46    24/2/2022  AF6026321237188  HK      Flat 25, 12/F, Acacia Building 150 Kennedy Road WAN CHAI
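For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. The sample frame is rebuilt by hand from the question, and it assumes the blank cells are NaN/None (which is why it uses `notna()` for the group key rather than the `fillna('').ne('')` test above); if your blanks are empty strings, keep the original test. The final `reset_index` and `astype(int)` steps are optional tidying, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question. Assumption: on continuation rows,
# every column except Address is missing (NaN/None).
df = pd.DataFrame({
    "Item": [44, np.nan, np.nan, 45, np.nan, np.nan, 46, np.nan, np.nan],
    "Date": ["24/2/2022", None, None] * 3,
    "Invoice No": ["AF6026321237160", None, None,
                   "AF6026321237179", None, None,
                   "AF6026321237188", None, None],
    "Center": ["Japan", None, None, "Korea", None, None, "HK", None, None],
    "Address": ["106-0041 Tokyo-to,", "Minato-ku, Azabudai,", "1 no 9 no 12.",
                "Bldg. 102 Unit 304", "Sajik-ro-3-gil23", "Jongno-gu, Seoul 30174",
                "Flat 25, 12/F, Acacia Building", "150 Kennedy Road", "WAN CHAI"],
})

# Each row with an Item value starts a new group; the cumulative sum gives
# that row and its continuation rows the same group number.
group = df["Item"].notna().cumsum()

# Take the first non-missing value of every column, except Address,
# whose pieces are joined with a space.
agg = {c: "first" for c in df.columns}
agg["Address"] = " ".join

df2 = df.groupby(group).agg(agg).reset_index(drop=True)

# Item became float because of the NaN rows; cast it back to int.
df2["Item"] = df2["Item"].astype(int)

print(df2.to_string())
```

Grouping on Item directly, as in the question, most likely cannot work because the continuation rows carry no Item value to group on; the cumulative-sum key is what ties them to the row above.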