- I am using groupby to merge rows with the same TransactionId.
- Code
ldf_object_page_data.groupby('TransactionId')[columns].agg(
' '.join).reset_index()
- Error
cannot reindex from a duplicate axis
- Sample DF
Transaction_Date Particulars Others Others Cheque Number Debit Credit Balance IsTransactionStart TransactionId
Date Remarks Tran Id UTR Number Instr. ID Withdrawals Deposits Balance False 11
01/04/2020 AA1746128 S71737774 - 57000 -4,84,31,253.20 False 11
03/04/2020 TO MADHAV LAAD AA213003 - 33215031 7000 -4,84,38,253.20 False 11
03/04/2020 TO PANDRINATH GANGRADE AA214967 - 33215032 13000 -4,84,51,253.20 False 11
03/04/2020 TO NITIN DHANGAR AA216517 - 33215034 30000 -4,84,81,253.20 False 11
03/04/2020 RTGSO- ELECTRICITY EXP MPPKVVCL UBINH20094172099 S80318780 - 33215033 5,68,499.00 -4,90,49,752.20 True 12
03/04/2020 RTGSO-BHARAT COTTON GINNERS UBINH20094172392 S80321244 - 33215035 3,44,708.00 -4,93,94,460.20 True 13
06/04/2020 OIC153500 DO KHANDWA S89963710 - 33211781 63407 -4,94,57,867.20 False 13
07/04/2020 RTGS:DHARA AGRO INDUSTRIES ICIC409700372928 S93671963 - 8,93,238.00 -4,85,64,629.20 False 13
08/04/2020 TRF TO JITENDRA SINGH UBEJA AA205798 - 33215036 7,00,000.00 -4,92,64,629.20 True 14
- DF in CSV
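A quick diagnostic sketch that confirms repeated column labels are present before aggregating. It assumes a small hypothetical frame that copies a few of the sample's columns (the values are shortened). `Index.is_unique` and `Index.duplicated` show which labels repeat:

```python
import pandas as pd

# Hypothetical miniature of the sample frame: note the repeated "Others" label.
df = pd.DataFrame(
    [["01/04/2020", "AA1746128", "S71737774", "-", 11],
     ["03/04/2020", "TO MADHAV LAAD", "AA213003", "-", 11]],
    columns=["Transaction_Date", "Particulars", "Others", "Others", "TransactionId"],
)

# is_unique is False when any column label appears more than once.
print(df.columns.is_unique)

# duplicated() flags every repeat after the first occurrence of a label.
repeated = df.columns[df.columns.duplicated()].tolist()
print(repeated)
```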
Answer:
The problem is the duplicated column names (Others appears twice). First deduplicate them, then convert the values to strings and join them:
df.columns = pd.io.parsers.ParserBase({'names':df.columns})._maybe_dedup_names(df.columns)
df = (df.set_index('TransactionId')
.astype(str)
.groupby('TransactionId')
.agg(' '.join)
.reset_index())
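The first line above relies on `ParserBase._maybe_dedup_names`, a private pandas helper that may not be available in newer pandas releases. A minimal sketch of doing the deduplication by hand follows. It mirrors the `X`, `X.1` naming that `read_csv` applies to repeated headers. `dedup_columns` is a hypothetical helper name, and the three-row frame is an assumed miniature of the question's data:

```python
import pandas as pd


def dedup_columns(columns):
    """Hypothetical helper: rename repeats as Others, Others.1, Others.2, ..."""
    seen = {}
    result = []
    for col in columns:
        if col in seen:
            seen[col] += 1
            result.append(f"{col}.{seen[col]}")
        else:
            seen[col] = 0
            result.append(col)
    return result


# Hypothetical miniature of the question's frame (repeated "Others" label).
df = pd.DataFrame(
    [["01/04/2020", "AA1746128", "S71737774", "-", 11],
     ["03/04/2020", "TO MADHAV LAAD", "AA213003", "-", 11],
     ["03/04/2020", "RTGSO- ELECTRICITY EXP MPPKVVCL", "S80318780", "-", 12]],
    columns=["Transaction_Date", "Particulars", "Others", "Others", "TransactionId"],
)

df.columns = dedup_columns(df.columns)

# Same pipeline as the answer: group on the TransactionId index level
# and space-join every (stringified) column within each group.
merged = (df.set_index("TransactionId")
            .astype(str)
            .groupby("TransactionId")
            .agg(" ".join)
            .reset_index())
print(merged)
```

Renaming is used instead of dropping a column because both Others columns carry different data (Tran Id and UTR Number in the sample).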
If you also need to remove duplicate values within each group (keeping the order of first occurrence):
df = (df.set_index('TransactionId')
.astype(str)
.groupby('TransactionId')
.agg(lambda x: ' '.join(dict.fromkeys(x)))
.reset_index())
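To compare the two variants, here is a small sketch on the three sample rows with TransactionId 13, reduced to two columns. Because these labels are already unique, no column deduplication is needed. `dict.fromkeys` keeps only the first occurrence of each value, and since dicts preserve insertion order the joined string stays in row order:

```python
import pandas as pd

# Rows with TransactionId 13 from the sample, reduced to two columns.
df = pd.DataFrame({
    "TransactionId": [13, 13, 13],
    "Transaction_Date": ["03/04/2020", "06/04/2020", "07/04/2020"],
    "IsTransactionStart": [True, False, False],
})

grouped = df.set_index("TransactionId").astype(str).groupby("TransactionId")

# Plain join keeps every value, repeats included.
joined = grouped.agg(" ".join).reset_index()

# dict.fromkeys drops repeated values but keeps first-seen order.
unique_joined = grouped.agg(lambda x: " ".join(dict.fromkeys(x))).reset_index()

print(joined)
print(unique_joined)
```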