Home > Software design >  compare 2 csv files uing the merge and compare row by row
compare 2 csv files uing the merge and compare row by row

Time:07-13

so I have 2 CSV files in file1 I have list of research groups names. in file2 I have list of the Research full name with location as wall. I want to join these 2 csv file if the have the words matches in them.

in file1.cvs

research_groups_names
Chinese Academy of Sciences (CAS)
CAS
U-M
UQ

in file2.cvs

research_groups_names Location
Chinese Academy of Sciences (CAS) China
University of Michigan (U-M) United States of America (USA)
The University of Queensland (UQ) Australia

the Output.csv

f1_research_groups_names f2_research_groups_names Location
Chinese Academy of Sciences Chinese Academy ofSciences(CAS) China
CAS Chinese Academy of Sciences (CAS) China
U-M University of Michigan (U-M) United States of America(USA)
UQ The University of Queensland (UQ) Australia
     import pandas as pd
    df1 = pd.read_csv('file1.csv')
    
    df2 = pd.read_csv('file1.csv')

    df1 = df1.add_prefix('f1_')
    df2 = df2.add_prefix('f2_')

    
    
    def compare_nae(df):
        if  df1['f1_research_groups_names'] == df2['f2_research_groups_names']:
            return 1
        else:
            return 0
    
      
result = pd.merge(df1, df2, left_on=['f1_research_groups_names'],right_on=['f2_research_groups_names'], how="left")

    result.to_csv('output.csv')

CodePudding user response:

You can try:

def fn(row):
    for _, n in df2.iterrows():
        if (
            n["research_groups_names"] == row["research_groups_names"]
            or row["research_groups_names"] in n["research_groups_names"]
        ):
            return n


df1[["f2_research_groups_names", "f2_location"]] = df1.apply(fn, axis=1)
df1 = df1.rename(columns={"research_groups_names": "f1_research_groups_names"})
print(df1)

Prints:

            f1_research_groups_names           f2_research_groups_names                     f2_location
0  Chinese Academy of Sciences (CAS)  Chinese Academy of Sciences (CAS)                           China
1                                CAS  Chinese Academy of Sciences (CAS)                           China
2                                U-M       University of Michigan (U-M)  United States of America (USA)
3                                 UQ  The University of Queensland (UQ)                       Australia

Note: If in df1 is name not found in df2 there will be None, None in columns "f2_research_groups_names" and "f2_location"

  • Related