Home > Blockchain >  Join 2 pandas dataframes by adding rows with the same index
Join 2 pandas dataframes by adding rows with the same index

Time:03-20

I am struggling to join two dataframes by index (I've made column FileName an index for both tables) which look like this:

Table 1

FileName Transcriber Transcription
612_000002.wav 100% (80/80) Are we starting off?
612_000002.wav 100% (50/50) shall we starting on
612_000002.wav 100% (2/2) fast mode
612_000002.wav 100% (258/259) Go and start it up
612_000002.wav 100% (20/20) Are we starting off?
612_000003.wav 100% (258/259) here we go, hey well woah woah woah
612_000003.wav 100% (23/23) evening gulf air
612_000003.wav 100% (32/32) And as the 1st group reached the bottom of the...
612_000003.wav 100% (80/80) Happy to go off here, woah woah woah
612_000003.wav 100% (10/10) Go boom we'll just

Table 2 is similar and looks like this:

FileName Transcriber Transcription
612_000002.wav Quartznet there was not inl
612_000002.wav Transducer_M don't start again
612_000002.wav Transducer_L do we start again
612_000003.wav Transducer_L anything off yeah i'm willing we'll just
612_000003.wav Transducer_S having gone here on will and wolf is

So I've looked into concat, merge, and join. But they don't seem to yield the output I am looking for. What I would like to have is all values from both tables for filename1, all values for filename2 and etc. Basically, adding rows from table2 to table1. Is there any way around it? Thank you <3

CodePudding user response:

Use:

pd.concat([df1,df2]).drop_duplicates()

CodePudding user response:

If I understand it correctly, then what you need is to firstly concatenate 2 dataFrames and afterwards use the function groupby (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html).

This can as well be implemented as in this question: Pandas Groupby of 2 dataframes.

Hope this could help.

  • Related