so I have 2 CSV files in file1 I have list of research groups names. in file2 I have list of the Research full name with location as wall. I want to join these 2 csv file if the have the words matches in them.
in file1.cvs
research_groups_names |
---|
Chinese Academy of Sciences (CAS) |
CAS |
U-M |
UQ |
in file2.cvs
research_groups_names | Location |
---|---|
Chinese Academy of Sciences (CAS) | China |
University of Michigan (U-M) | United States of America (USA) |
The University of Queensland (UQ) | Australia |
the Output.csv
f1_research_groups_names | f2_research_groups_names | Location |
---|---|---|
Chinese Academy of Sciences | Chinese Academy ofSciences(CAS) | China |
CAS | Chinese Academy of Sciences (CAS) | China |
U-M | University of Michigan (U-M) | United States of America(USA) |
UQ | The University of Queensland (UQ) | Australia |
import pandas as pd
df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file1.csv')
df1 = df1.add_prefix('f1_')
df2 = df2.add_prefix('f2_')
def compare_nae(df):
if df1['f1_research_groups_names'] == df2['f2_research_groups_names']:
return 1
else:
return 0
result = pd.merge(df1, df2, left_on=['f1_research_groups_names'],right_on=['f2_research_groups_names'], how="left")
result.to_csv('output.csv')
CodePudding user response:
You can try:
def fn(row):
for _, n in df2.iterrows():
if (
n["research_groups_names"] == row["research_groups_names"]
or row["research_groups_names"] in n["research_groups_names"]
):
return n
df1[["f2_research_groups_names", "f2_location"]] = df1.apply(fn, axis=1)
df1 = df1.rename(columns={"research_groups_names": "f1_research_groups_names"})
print(df1)
Prints:
f1_research_groups_names f2_research_groups_names f2_location
0 Chinese Academy of Sciences (CAS) Chinese Academy of Sciences (CAS) China
1 CAS Chinese Academy of Sciences (CAS) China
2 U-M University of Michigan (U-M) United States of America (USA)
3 UQ The University of Queensland (UQ) Australia
Note: If in df1
is name not found in df2
there will be None
, None
in columns "f2_research_groups_names"
and "f2_location"