I am struggling to join two dataframes by index (I've made column FileName an index for both tables) which look like this:
Table 1
FileName | Transcriber | Transcription |
---|---|---|
612_000002.wav | 100% (80/80) | Are we starting off? |
612_000002.wav | 100% (50/50) | shall we starting on |
612_000002.wav | 100% (2/2) | fast mode |
612_000002.wav | 100% (258/259) | Go and start it up |
612_000002.wav | 100% (20/20) | Are we starting off? |
612_000003.wav | 100% (258/259) | here we go, hey well woah woah woah |
612_000003.wav | 100% (23/23) | evening gulf air |
612_000003.wav | 100% (32/32) | And as the 1st group reached the bottom of the... |
612_000003.wav | 100% (80/80) | Happy to go off here, woah woah woah |
612_000003.wav | 100% (10/10) | Go boom we'll just |
Table 2 is similar and looks like this:
FileName | Transcriber | Transcription |
---|---|---|
612_000002.wav | Quartznet | there was not inl |
612_000002.wav | Transducer_M | don't start again |
612_000002.wav | Transducer_L | do we start again |
612_000003.wav | Transducer_L | anything off yeah i'm willing we'll just |
612_000003.wav | Transducer_S | having gone here on will and wolf is |
So I've looked into concat, merge, and join. But they don't seem to yield the output I am looking for. What I would like to have is all values from both tables for filename1, all values for filename2 and etc. Basically, adding rows from table2 to table1. Is there any way around it? Thank you <3
CodePudding user response:
Use:
pd.concat([df1,df2]).drop_duplicates()
CodePudding user response:
If I understand it correctly, then what you need is to firstly concatenate 2 dataFrames and afterwards use the function groupby (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html).
This can as well be implemented as in this question: Pandas Groupby of 2 dataframes.
Hope this could help.