Home > Mobile >  Grabbing data using an id in one dataframe in another separate dataframe that do not posses the same
Grabbing data using an id in one dataframe in another separate dataframe that do not posses the same

Time:09-22

I have two Data Frames

jedis = {jedi_id': ["2", "4", "6", "1"],
'name':["Kylo", "Bastila", "Revan", "Steve from Minecraft"],
'Looted Items':}

inventory = {jedi_number': ["9", "4" , "6", "1", "1", "0", "2", "6", "1" , "55", "4", 
"4", "0", "9"], 'Loot':["Holocron", "Bantha Fodder", "Blaster", "Bantha Fodder", "Credits", "Bantha Fodder", "Blaster", "Bantha Fodder", "Holocron", "Blaster", "Holocron", "bread loaf", "Credits", "Holocron"]}

jedis_df = pd.DataFrame(jedis)
inventory_df = pd.DataFrame(inventory)
  • So in this problem, I need take all the jedis from the jedis_df via jedi_df and confirm that they're they exist within the inventory_df. (ex.jedi_id 2 exists in inventory)

  • Once The ID is identified and is in the jedis_df, then the associated looted item will attach add itself to the jedis_df with the associated ID

    • Example: Revan will have 'blaster, holocron' within the 'Looted Items' column and same cell next to the corresponding jedi ID
    x = jedis_df.merge(
    inventory_df.groupby("jedi_number").sum(),
    left_on="jedi_id",
    right_index=True,
    how="left",
    )
    
  • I tried merging the two tables but couldnt figure out a way to figure out the solution.

  • I only really know how to add one specific corresponding matching id to one cell so this is new to me.

If anyone could help with this problem that would be much appreciated

CodePudding user response:

groupby.sum automatically excludes non-numeric columns, so it doesn't concatenate strings within groups the way you might expect.

The solution is to run ', ',join on the Loot values of each group.

Option 1: groupby.agg

inventory_df.groupby("jedi_number")['Loot'].agg(', '.join)

Option 2: groupby.apply

inventory_df.groupby("jedi_number")['Loot'].apply(lambda x: ', '.join(x))

Both options produce the same output:

jedi_number
0                  Bantha Fodder, Credits
1        Bantha Fodder, Credits, Holocron
2                                 Blaster
4     Bantha Fodder, Holocron, bread loaf
55                                Blaster
6                  Blaster, Bantha Fodder
9                      Holocron, Holocron
Name: Loot, dtype: object

Replacing inventory_df.groupby("jedi_number").sum() with either of these options in your expression for x should produce the desired result.

  • Related