Сoncatenation of tables with matching of contents in columns-CodePudding

Column Id contains ID of sample
Column "before" contains what was before modification
Column "after" contains what happens after the modification

I want to merge tables by ID, but to have the same values in the "before" and "after" columns opposite each other. Sometimes table 1 contains nothing in the "before" column. Sometimes table 2 contains nothing in the "after" column. Sometimes "before" and "after" have a partial overlap or do not overlap sometimes a full overlap.

table_one

Id	before
id1	a
id1	b
id1	c
id3	d
id4	a
id4	b

table_two

Id	after
id1	b
id1	d
id2	c
id4	a
id4	b

concatinated_table

Id	before	after
id1	a	none
id1	b	b
id1	c	none
id1	none	d
id2	none	c
id3	d	none
id4	a	a
id4	b	b

Code to reproduce problem

before = pd.read_html('https://stackoverflow.com/questions/73403484/Сoncatenation-of-tables-with-matching-of-contents-in-columns')[0]
after = pd.read_html('https://stackoverflow.com/questions/73403484/Сoncatenation-of-tables-with-matching-of-contents-in-columns')[1]

output = pd.read_html('https://stackoverflow.com/questions/73403484/Сoncatenation-of-tables-with-matching-of-contents-in-columns')[2]

CodePudding user response：

Are you looking for output=pd.concat([before, after], axis=1)?

CodePudding user response：

To arrive at your output, you could use pd.merge as follows:

import pandas as pd

before_data = {'Id': ['id1', 'id1', 'id1', 'id3', 'id4', 'id4'],
               'before': ['a', 'b', 'c', 'd', 'a', 'b']}
before = pd.DataFrame(before_data)

after_data = {'Id': ['id1', 'id1', 'id2', 'id4', 'id4'], 
              'after': ['b', 'd', 'c', 'a', 'b']}
after = pd.DataFrame(after_data)

res = before.merge(after, left_on=['Id','before'], right_on=['Id','after'], 
                   how='outer').sort_values('Id').reset_index(drop=True)
print(res)

    Id before after
0  id1      a   NaN
1  id1      b     b
2  id1      c   NaN
3  id1    NaN     d
4  id2    NaN     c
5  id3      d   NaN
6  id4      a     a
7  id4      b     b

# same result as your `output`, at least after changing "none" to `np.nan`:
output.replace(to_replace='none', value=np.nan, inplace=True)
output.equals(res) # True