Column Id contains ID of sample
Column "before" contains what was before modification
Column "after" contains what happens after the modification
I want to merge tables by ID, but to have the same values in the "before" and "after" columns opposite each other. Sometimes table 1 contains nothing in the "before" column. Sometimes table 2 contains nothing in the "after" column. Sometimes "before" and "after" have a partial overlap or do not overlap sometimes a full overlap.
table_one
Id | before |
---|---|
id1 | a |
id1 | b |
id1 | c |
id3 | d |
id4 | a |
id4 | b |
table_two
Id | after |
---|---|
id1 | b |
id1 | d |
id2 | c |
id4 | a |
id4 | b |
concatinated_table
Id | before | after |
---|---|---|
id1 | a | none |
id1 | b | b |
id1 | c | none |
id1 | none | d |
id2 | none | c |
id3 | d | none |
id4 | a | a |
id4 | b | b |
Code to reproduce problem
before = pd.read_html('https://stackoverflow.com/questions/73403484/Сoncatenation-of-tables-with-matching-of-contents-in-columns')[0]
after = pd.read_html('https://stackoverflow.com/questions/73403484/Сoncatenation-of-tables-with-matching-of-contents-in-columns')[1]
output = pd.read_html('https://stackoverflow.com/questions/73403484/Сoncatenation-of-tables-with-matching-of-contents-in-columns')[2]
CodePudding user response:
Are you looking for output=pd.concat([before, after], axis=1)
?
CodePudding user response:
To arrive at your output
, you could use pd.merge
as follows:
import pandas as pd
before_data = {'Id': ['id1', 'id1', 'id1', 'id3', 'id4', 'id4'],
'before': ['a', 'b', 'c', 'd', 'a', 'b']}
before = pd.DataFrame(before_data)
after_data = {'Id': ['id1', 'id1', 'id2', 'id4', 'id4'],
'after': ['b', 'd', 'c', 'a', 'b']}
after = pd.DataFrame(after_data)
res = before.merge(after, left_on=['Id','before'], right_on=['Id','after'],
how='outer').sort_values('Id').reset_index(drop=True)
print(res)
Id before after
0 id1 a NaN
1 id1 b b
2 id1 c NaN
3 id1 NaN d
4 id2 NaN c
5 id3 d NaN
6 id4 a a
7 id4 b b
# same result as your `output`, at least after changing "none" to `np.nan`:
output.replace(to_replace='none', value=np.nan, inplace=True)
output.equals(res) # True