How can I use the apply() function to compare two dataframes?

I have two pandas dataframes that I'd like to compare. One of them, df_inventory, is a large inventory list. I'd like to take a row from df, compare it to every row in df_inventory, and repeat the process for every row in df.

df.head()

   item  
0  paintbrush  
1  mop #2  
2  red bucket  
3  o-light flashlight  

df_inventory.head()

   item_desc  
0  broom  
1  mop  
2  bucket  
3  flashlight

I'm trying a single apply() call, which results in a ValueError. Should I be using a nested apply() so that both dataframes are iterated row by row?

test = pd.DataFrame({'item':['example']})
test['similarity'] = test['item'].apply(lambda x: fuzz.ratio(x, df_inventory['item_desc']))

CodePudding user response:

It looks like you are trying to compare the rows in df with the rows in df_inventory using fuzz.ratio(). That function compares two strings, but your lambda passes the whole df_inventory['item_desc'] Series as the second argument, which is the likely source of the ValueError. apply() calls a function once per row (or per element), so to score every row in df against every row in df_inventory, one option is a nested apply(): the outer call walks over df and the inner call walks over df_inventory.

Here is an example of how you can use a nested apply() method to compare the rows in the two dataframes:

import pandas as pd
from fuzzywuzzy import fuzz

# Load the data into pandas dataframes
df = pd.DataFrame({'item': ['paintbrush', 'mop #2', 'red bucket', 'o-light flashlight']})
df_inventory = pd.DataFrame({'item_desc': ['broom', 'mop', 'bucket', 'flashlight']})

# Add a new column to the df dataframe called 'similarity'.
# The inner apply() scores one item against every inventory description, and
# .tolist() stores those scores as a plain list so each cell holds one list.
# Without .tolist(), the outer apply() returns a DataFrame with one column per
# inventory row, which cannot be assigned to a single column.
df['similarity'] = df['item'].apply(
    lambda item: df_inventory['item_desc'].apply(lambda desc: fuzz.ratio(item, desc)).tolist()
)

# Print the resulting dataframe
print(df)

This code compares each row in df with each row in df_inventory and saves the similarity scores in the similarity column of df. Each cell of that column holds a list of four scores between 0 and 100, one per row of df_inventory and in the same order; a score of 100 means the two strings are identical.

You can then use the similarity column to find, for each item in df, which inventory rows match it most closely.
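One way to reduce the pairwise scores to a single best match per item is sketched below. This is only an illustration under a few assumptions: the `score` helper is a hypothetical stand-in for fuzz.ratio() built on the standard library's difflib, so the sketch runs with only pandas installed, and the names `scores`, `best_match`, and `best_score` are made up for this example. Passing fuzz.ratio in place of `score` should work the same way, although the exact numbers may differ.

```python
from difflib import SequenceMatcher

import pandas as pd


def score(a: str, b: str) -> int:
    """Hypothetical stand-in for fuzz.ratio(): 0-100 similarity via difflib."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


df = pd.DataFrame({'item': ['paintbrush', 'mop #2', 'red bucket', 'o-light flashlight']})
df_inventory = pd.DataFrame({'item_desc': ['broom', 'mop', 'bucket', 'flashlight']})

# Score matrix: one row per item in df, one column per inventory description.
scores = pd.DataFrame(
    [[score(item, desc) for desc in df_inventory['item_desc']] for item in df['item']],
    index=df.index,
    columns=df_inventory['item_desc'],
)

# For each item, keep the best-scoring inventory description and its score.
df['best_match'] = scores.idxmax(axis=1)
df['best_score'] = scores.max(axis=1)

print(df)
```

Keeping the scores in their own dataframe, rather than as lists inside one column, makes row-wise operations such as idxmax() and max() available directly. Note that the matrix has len(df) * len(df_inventory) entries, so it can grow quickly with a large inventory.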