I have two excel spreadsheets loaded into 2 different dataframes. One column(System) in spreadsheet 1 has a system code which I need to match with another column(Description) in spreadsheet 2. The second column has a description which may or may not have system in it along with other string. If match is found I want to append spreadsheet 2 with a new column that has the system code.
df1 = pd.DataFrame(
{
"System": ["HFW", "SYS", "ABC"],
"Description": ["HFW Description", "Sys Description", "ABC Description"],
}
)
df2 = pd.DataFrame(
{
"Description": [
"Amount spent for HFW",
"Spending amt on XYZ",
"INV20563BG",
"ABC Cost",
"ABC Cost 2",
],
"Amount": ["150", "175", "160", "180", "100"],
}
)
So basically need to metch 'System' column from DF1 to 'Description' in DF2. DF1 and DF2 could have more columns and different # of rows.
Tried these options:
df1["MatchingSystem"] = df1.System.apply(
lambda x: 1 if df2["Description"].str.contains(x) else 0
)
Tried a few other things as well. Any help is appreciated
CodePudding user response:
You can compare 2 list of strings and write the first match:
sys_values = df1.System.values
df2["MatchingSystem"] = df2.Description.apply(
lambda x: next((sys for sys in sys_values if sys in x.split()), None)
)
The resulting dataframe df2
is:
Description Amount MatchingSystem
0 Amount spent for HFW 150 HFW
1 Spending amt on XYZ 175 None
2 INV20563BG 160 None
3 ABC Cost 180 ABC
4 ABC Cost 2 100 ABC
I create the new column in the df2 dataframe because it seems like one df1.System
can be in multiple df2.Description
.
It's a bit dirty but I think it does the work. Let me know if you have any problem or question.