Im in a project using pyhton where i have two dataframes
Product Name | Cost |
---|---|
Car with batteries | 2 |
Headphones Sony | 3 |
Product Name | Cost |
---|---|
Car batteries | 2 |
Headphones Sony | 3 |
What i have to do is compare this dataframes and put a column that said 'Matches' But like you see it have to compare Car with batteries and Car batteries and put matches because they are the same product. It is possible?.
CodePudding user response:
Assuming the first dataframe is df1
and the second one is df2
, you can:
- Create a new column in the dataframe of your choice (either
df1
,df2
or even a new one) - Call this new column
"Matched"
- Populate the new column values with the results from the condition: populate with
True
if the condition is true, otherwise populate withFalse
:
df1["Matches"] = df1["Cost"] == df1["Cost"]
This will create a new column of True/False values based on whether the battery costs of the same rows in the 2 dataframes match or not (assumes that df1
and df2
have the same number of rows).
CodePudding user response:
You could use fuzzywuzzy
to find the similarity of the values. Using for example fuzz.ratio(str1, str2)
will return you an indicator how similar the strings are. There are other methods as well. You would have to find out yourself what suits best for your use case.
The example below uses fuzz.ratio(str1, str2)
and considers a ratio of 80 to be equals:
# pip install fuzzywuzzy python-levenshtein
# or: conda install -c conda-forge fuzzywuzzy python-levenshtein
import io
import pandas as pd
from fuzzywuzzy import fuzz
df1 = pd.read_csv(io.StringIO("""
Product Name, Cost
Car with batteries, 2
Headphones Sony, 3
"""))
df2 = pd.read_csv(io.StringIO("""
Product Name, Cost
Car batteries, 2
Headphones Sony, 3
"""))
COLUMN_NAME = "Product Name"
ACCEPTED_RATIO = 80
def match(right, left):
return fuzz.ratio(right, left) > ACCEPTED_RATIO
rsuffix = "_r"
compared = df1.join(df2, rsuffix=rsuffix)
compared["Matches"] = compared.apply(
lambda x: match(x[COLUMN_NAME], x[f"{COLUMN_NAME}{rsuffix}"]),
axis=1,
)
compared = compared.drop(
[c for c in compared.columns if c.endswith(rsuffix)],
axis=1
)
And the output of print(compared)
would be:
Product Name Cost Matches
0 Car with batteries 2 True
1 Headphones Sony 3 True