Home > OS >  Replace two column values in another df for corresponding column values in pandas
Replace two column values in another df for corresponding column values in pandas

Time:12-11

I have two dataframe such as :

Tab1

COL1   COL2     COL3 
Sp1    False    A
Sp2    False    B 
Sp3    True     X 
Sp4    True     X 
Sp5    False    A 
Sp6    False    C 

Tab2

COL1 COL2_bis COL3_bis 
Sp1  True     X
Sp4  True     F
Sp6  True     X

And I would like to correct the Tab1 COL2 and COL3 columns compared to Tab2 COL2_bis and COL3_bis columns for corresponding COL1 values between the two tabs.

I should then get:

Tab1

COL1   COL2     COL3 
Sp1    True     X
Sp2    False    B 
Sp3    True     X 
Sp4    True     F 
Sp5    False    A 
Sp6    True     X 

Does somene have an idea please ?

Here are the table in dict format if it cna helps:

tab1:

{'COL1': {0: 'Sp1', 1: 'Sp2', 2: 'Sp3', 3: 'Sp4', 4: 'Sp5', 5: 'Sp6'}, 'COL2': {0: False, 1: False, 2: True, 3: True, 4: False, 5: False}, 'COL3 ': {0: 'A', 1: 'B ', 2: 'X ', 3: 'X ', 4: 'A ', 5: 'C '}}

tab2:

{'COL1': {0: 'Sp1', 1: 'Sp4', 2: 'Sp6'}, 'COL2_bis': {0: True, 1: True, 2: True}, 'COL3_bis ': {0: 'X', 1: 'F', 2: 'X'}}

CodePudding user response:

Use DataFrame.update by COL1 with set to indices and remove _bis substrings from columns names:

df1 = df1.set_index('COL1')
df2 = df2.set_index('COL1').rename(columns=lambda x: x.replace('_bis', ''))

df1.update(df2)
print (df1)
       COL2 COL3
COL1            
Sp1    True    X
Sp2   False    B
Sp3    True    X
Sp4    True    F
Sp5   False    A
Sp6    True    X

df = df1.reset_index()

CodePudding user response:

Rename columns of df2, concatenate the two dataframes and drop duplicates based on COL1:

df2.columns = df2.columns.str.replace('_bis', '')
out = pd.concat([df1, df2]).drop_duplicates('COL1', keep='last') \
        .sort_values('COL1').reset_index(drop=True)
print(out)

# Output:
  COL1   COL2 COL3
0  Sp1   True    X
1  Sp2  False    B
2  Sp3   True    X
3  Sp4   True    F
4  Sp5  False    A
5  Sp6   True    X

CodePudding user response:

It sounds like you would like to compute a "right outer join" on the two tables/dataframes. Is that correct?

For example:

corrected_tab_1 = pd.merge(Tab1, Tab2, right_on='COL2_bis')

Official documentation for pd.merge(): https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html

Examples of pd.merge() using different join strategies: https://www.analyticsvidhya.com/blog/2020/02/joins-in-pandas-master-the-different-types-of-joins-in-python/#:~:text=Right Join in Pandas,-Similarly, if we&text=Right join, also known as,that are common in both.

  • Related