I am trying to solve this problem
I have two data tables for example
names age salary vehicle
jeff 20 100 missing
shinji 24 120 missing
rodger 18 150 missing
eric 25 160 missing
romeo 30 170 missing
and this other data table
names age salary vehicle industry
jeff 20 100 car video games
jeff 20 100 car cell phone
jeff 20 100 motorcycle soft drink
jeff 20 100 boat pharmaceuticals
shinji 24 120 car robots
shinji 24 120 car animation
rodger 18 150 car cars
rodger 18 150 motorcycle glasses
eric 25 160 boat video games
eric 25 160 car arms
romeo 30 70 boat vaccines
so for my first row I want vehicle instead of missing I want "CMB" for car,boat and motorcycle because jeff has all 3. For Shinji I would only want C because he has a car. For Rodger I want CM because he only has a boat.For eric I want CB because he CB because he has a car and boat. For romeo B because he only has a boat.
So for I want to go down the vehicle column of my second table and find all the vehicle the person.
But I am not sure the logic on how to to this. I know I can match them by age name and salary.
CodePudding user response:
Try this:
tmp = (
# Find the unique vehicloes for each person
df2[['names', 'vehicle']].drop_duplicates()
# Get the first letter of each vehicle in capital form
.assign(acronym=lambda x: x['vehicle'].str[0].str.upper())
# For each person, join the acronyms of all vehicles
.groupby('names')['acronym'].apply(''.join)
)
result = df1.merge(tmp, left_on='names', right_index=True)