Home > Enterprise >  Сoncatenation of tables with matching of contents in columns
Сoncatenation of tables with matching of contents in columns

Time:08-19

Column Id contains ID of sample
Column "before" contains what was before modification
Column "after" contains what happens after the modification

I want to merge tables by ID, but to have the same values in the "before" and "after" columns opposite each other. Sometimes table 1 contains nothing in the "before" column. Sometimes table 2 contains nothing in the "after" column. Sometimes "before" and "after" have a partial overlap or do not overlap sometimes a full overlap.

table_one

Id before
id1 a
id1 b
id1 c
id3 d
id4 a
id4 b

table_two

Id after
id1 b
id1 d
id2 c
id4 a
id4 b

concatinated_table

Id before after
id1 a none
id1 b b
id1 c none
id1 none d
id2 none c
id3 d none
id4 a a
id4 b b

Code to reproduce problem

before = pd.read_html('https://stackoverflow.com/questions/73403484/Сoncatenation-of-tables-with-matching-of-contents-in-columns')[0]
after = pd.read_html('https://stackoverflow.com/questions/73403484/Сoncatenation-of-tables-with-matching-of-contents-in-columns')[1]

output = pd.read_html('https://stackoverflow.com/questions/73403484/Сoncatenation-of-tables-with-matching-of-contents-in-columns')[2]

screen

CodePudding user response:

Are you looking for output=pd.concat([before, after], axis=1)?

CodePudding user response:

To arrive at your output, you could use pd.merge as follows:

import pandas as pd

before_data = {'Id': ['id1', 'id1', 'id1', 'id3', 'id4', 'id4'],
               'before': ['a', 'b', 'c', 'd', 'a', 'b']}
before = pd.DataFrame(before_data)

after_data = {'Id': ['id1', 'id1', 'id2', 'id4', 'id4'], 
              'after': ['b', 'd', 'c', 'a', 'b']}
after = pd.DataFrame(after_data)

res = before.merge(after, left_on=['Id','before'], right_on=['Id','after'], 
                   how='outer').sort_values('Id').reset_index(drop=True)
print(res)

    Id before after
0  id1      a   NaN
1  id1      b     b
2  id1      c   NaN
3  id1    NaN     d
4  id2    NaN     c
5  id3      d   NaN
6  id4      a     a
7  id4      b     b

# same result as your `output`, at least after changing "none" to `np.nan`:
output.replace(to_replace='none', value=np.nan, inplace=True)
output.equals(res) # True
  • Related