Home > Blockchain >  Data Matching using pandas cumulative columns
Data Matching using pandas cumulative columns

Time:11-18

I am trying to solve this problem

I have two data tables for example

             names     age  salary  vehicle
             jeff      20    100     missing
             shinji    24     120      missing
             rodger    18      150     missing
             eric       25     160     missing
             romeo     30       170     missing

and this other data table

            names     age     salary  vehicle         industry
           jeff       20         100   car             video games 
            jeff      20         100    car            cell phone 
            jeff      20         100    motorcycle      soft drink
            jeff      20          100    boat           pharmaceuticals
            shinji    24         120    car            robots
            shinji     24         120    car            animation 
            rodger     18        150     car           cars
            rodger      18       150     motorcycle      glasses
            eric       25        160      boat           video games
            eric      25         160      car            arms
            romeo     30         70       boat           vaccines 

so for my first row I want vehicle instead of missing I want "CMB" for car,boat and motorcycle because jeff has all 3. For Shinji I would only want C because he has a car. For Rodger I want CM because he only has a boat.For eric I want CB because he CB because he has a car and boat. For romeo B because he only has a boat.

So for I want to go down the vehicle column of my second table and find all the vehicle the person.

But I am not sure the logic on how to to this. I know I can match them by age name and salary.

CodePudding user response:

Try this:

tmp = (
    # Find the unique vehicloes for each person
    df2[['names', 'vehicle']].drop_duplicates()
        # Get the first letter of each vehicle in capital form
        .assign(acronym=lambda x: x['vehicle'].str[0].str.upper())
        # For each person, join the acronyms of all vehicles
        .groupby('names')['acronym'].apply(''.join)
)

result = df1.merge(tmp, left_on='names', right_index=True)
  • Related