I am trying to compare two columns from two different dataframes, and return all possible matches using python: (Kinda of an xlookup in excel but with multiple possible matches)
Please see the details below for sample dataframes and work I attempted.
An explanation of the datasets below: Mark does not own any cars, however, there are several listed under his name, which we know that none belong to him. I am attempting to look at dataframe 1 (Marks) and compare it against the larger dataset that has all other owners and their cars: dataframe 2 (claimed) and return possible owners for Mark's cars as shown below.
Dataframe 1 : Marks
Marks = pd.DataFrame({'Car Brand': ['Jeep','Jeep','BMW','Volvo'],'Owner Name': ['Mark',
'Mark', 'Mark', 'Mark']})
Car Brand Owner Name
0 Jeep Mark
1 Jeep Mark
2 BMW Mark
3 Volvo Mark
Dataframe 2: claimed
Dataframe 2: claimed
claimed = pd.DataFrame({'Car Brand': ['Dodge', 'Jeep', 'BMW', 'Merc', 'Volvo', 'Jeep',
'Volvo'], 'Owner Name': ['Chris', 'Frank','Rob','Kelly','John','Chris','Kelly']})
Car Brand Owner Name
0 Dodge Chris
1 Jeep Frank
2 BMW Rob
3 Merc Kelly
4 Volvo John
5 Jeep Chris
6 Volvo Kelly
The data does have duplicate car brand names HOWEVER, the Owner Names are unique - meaning that Kelly even though she is mentioned twice IS THE SAME PERSON. same for Chris..etc
I want my Mark's dataframe to have a new column that looks like this:
Car Brand Owner Name Possible Owners
0 Jeep Mark [Frank, Chris]
1 Jeep Mark [Frank, Chris]
2 BMW Mark Rob
3 Volvo Mark [John, Kelly]
I have tried the below codes:
possible_owners = list()
for cars in Marks['Car Brand']:
for car_brands in claimed['Car Brand']:
if Marks.loc[Marks['Car Brand'].isin(claimed['Car Brand'])]:
sub = list()
sub.append()
possible_owners.append(sub)
else:
not_found = 'No possible Owners Identified'
possible_owners.append(not_found)
#Then I will add possible_owners as a new column to Marks
error code:ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(),
a.item(), a.any() or a.all().
I have also tried to do a merge, excel xlookup but (that has many limitations), and I am stuck trying to understand how to return possible matches even if there are multiple and line them up in one row.
Question: how can I compare the two frames, return possible values from the Owner Name column and put these values in a new column in Marks' table?
Excuse my code, I am fairly new to Python.
CodePudding user response:
You could pre-process the claimed dataframe then merge:
lookup = claimed.groupby('Car Brand').apply(lambda x: x['Owner Name'].to_list()).to_frame()
df_m = Marks.merge(lookup, on='Car Brand', how='left').rename(columns={0:'Possible Owners'})
print(df_m)
Result
Car Brand Owner Name Possible Owners
0 Jeep Mark [Frank, Chris]
1 Jeep Mark [Frank, Chris]
2 BMW Mark [Rob]
3 Volvo Mark [John, Kelly]
CodePudding user response:
You can always use a list comprehension with the df.Series.isin
to do the work.
result = [claimed[claimed['Car Brand'].isin([i])]['Owner Name'].to_numpy() for i in Marks['Car Brand']]
Marks['Possible Owners'] = result
Car Brand Owner Name Possible Owners
0 Jeep Mark [Frank, Chris]
1 Jeep Mark [Frank, Chris]
2 BMW Mark [Rob]
3 Volvo Mark [John, Kelly]