Home > Software design >  Obtain corresponding column based on another column that matches another dataframe
Obtain corresponding column based on another column that matches another dataframe

Time:01-18

I want to find matching values from two data frames and return a third value. For example, if cpg_symbol["Gene_Symbol"] corresponds with diff_meth_kirp_symbol.index, I want to assign cpg_symbol.loc["Composite_Element_REF"] as index.

My code returned an empty dataframe.

diff_meth_kirp.index = diff_meth_kirp.merge(cpg_symbol, left_on=diff_meth_kirp.index, right_on="Gene_Symbol")[["Composite_Element_REF"]]

Example:

diff_meth_kirp

Hello My name is
First 0 1 2 3
Second 4 5 6 7
Third 8 9 10 11
Fourth 12 13 14 15
Fifth 16 17 18 19
Sixth 20 21 22 23

cpg_symbol

Composite_Element_REF Gene_Symbol
cg1 First
cg2 Third
cg3 Fifth
cg4 Seventh
cg5 Ninth
cg6 First

Expected output:

Hello My name is
cg1 0 1 2 3
cg2 8 9 10 11
cg3 16 17 18 19
cg6 0 1 2 3

CodePudding user response:

Your code works well for me but you can try this version:

out = (diff_meth_kirp.merge(cpg_symbol.set_index('Gene_Symbol'), 
                            left_index=True, right_index=True)
                     .set_index('Composite_Element_REF')
                     .rename_axis(None).sort_index())
print(out)

# Output
     Hello  My  name  is
cg1      0   1     2   3
cg2      8   9    10  11
cg3     16  17    18  19
cg6      0   1     2   3

Input dataframes:

data1 = {'Hello': {'First': 0, 'Second': 4, 'Third': 8, 'Fourth': 12, 'Fifth': 16, 'Sixth': 20},
         'My': {'First': 1, 'Second': 5, 'Third': 9, 'Fourth': 13, 'Fifth': 17, 'Sixth': 21}, 
         'name': {'First': 2, 'Second': 6, 'Third': 10, 'Fourth': 14, 'Fifth': 18, 'Sixth': 22},
         'is': {'First': 3, 'Second': 7, 'Third': 11, 'Fourth': 15, 'Fifth': 19, 'Sixth': 23}}
diff_meth_kirp = pd.DataFrame(data1)

data2 = {'Composite_Element_REF': {0: 'cg1', 1: 'cg2', 2: 'cg3', 3: 'cg4', 4: 'cg5', 5: 'cg6'},
         'Gene_Symbol': {0: 'First', 1: 'Third', 2: 'Fifth', 3: 'Seventh', 4: 'Ninth', 5: 'First'}}
cpg_symbol = pd.DataFrame(data2)
  • Related