Join series with repeated index on dataframe where column values are equal to the index in the series

Time:07-22

Say I have a series and a dataframe like this:

import pandas as pd
s = pd.Series([10,20,11,12,30,34],
    index=["red","red","blue","blue","green","green"])
s.index.name="numbers"

df = pd.DataFrame({
    "color":["red","green","blue","blue","red","green"],
    "id":[1,2,3,4,5,6]})

I want to add the values of s as a new column of df, matching df["color"] against the index of s and taking the values for each color in the order they appear, i.e.:

pd.some_function(df,s,left_on="color",right_index=True)

color   id    numbers
red      1      10
green    2      30
blue     3      11
blue     4      12
red      5      20
green    6      34

I have tried pd.merge, DataFrame.join, etc., but I cannot make it work without looping over df: filtering by color, adding the data from s, and concatenating the pieces at the end.
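As a reference for the matching rule (the k-th row of df with a given color receives the k-th value of s with that label), here is a minimal plain-Python baseline of the loop approach. It uses one iterator per color instead of filtering and concatenating, and the helper name `fill_by_occurrence` is hypothetical. It is a sketch for checking results, not a recommended solution.

```python
import pandas as pd

s = pd.Series([10, 20, 11, 12, 30, 34],
              index=["red", "red", "blue", "blue", "green", "green"])
df = pd.DataFrame({
    "color": ["red", "green", "blue", "blue", "red", "green"],
    "id": [1, 2, 3, 4, 5, 6]})


def fill_by_occurrence(df, s, col):
    """Hypothetical helper: the k-th row of df whose `col` equals a label
    receives the k-th value of s carrying that index label."""
    # One iterator per label; groupby keeps the original order within a group.
    iters = {label: iter(group.tolist()) for label, group in s.groupby(level=0)}
    # Raises KeyError for a label missing from s, and StopIteration if df
    # contains a label more often than s does.
    return [next(iters[label]) for label in df[col]]


out = df.assign(numbers=fill_by_occurrence(df, s, "color"))
print(out)
```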

Answer:

You can use groupby.cumcount to number the repeated occurrences of each color on both sides. The pair (color, occurrence number) is then a unique key for the merge:

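# Occurrence number of each label within s and within df['color']
idx1 = s.groupby(level=0).cumcount()
# [0, 1, 0, 1, 0, 1]
idx2 = df.groupby('color').cumcount()
# [0, 0, 0, 1, 1, 1]

# Name the index so that reset_index() yields a 'color' column to merge on
s.index.name = "color"
out = (df
       .merge(s.reset_index(name='number'),
              left_on=['color', idx2], right_on=['color', idx1])
       # pandas adds the array key as a column named 'key_1'; drop it
       .drop(columns='key_1')
      )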
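A sketch of the same composite-key idea with the counter stored in a named column (`occurrence` is a hypothetical name). This lets pandas check the key: `validate="one_to_one"` raises `pandas.errors.MergeError` if (color, occurrence) is duplicated on either side. The sketch also uses `how="left"` instead of the default inner join, assuming you would rather keep every row of df (with NaN where s has no matching occurrence) than silently drop it.

```python
import pandas as pd

s = pd.Series([10, 20, 11, 12, 30, 34],
              index=pd.Index(["red", "red", "blue", "blue", "green", "green"],
                             name="color"))
df = pd.DataFrame({
    "color": ["red", "green", "blue", "blue", "red", "green"],
    "id": [1, 2, 3, 4, 5, 6]})

# Number the repeated occurrences of each color on both sides.
left = df.assign(occurrence=df.groupby("color").cumcount())
right = s.reset_index(name="number")
right["occurrence"] = right.groupby("color").cumcount()

# (color, occurrence) is unique on both sides, so the one_to_one check passes;
# a left merge keeps df's rows in their original order.
out = (left
       .merge(right, on=["color", "occurrence"], how="left",
              validate="one_to_one")
       .drop(columns="occurrence"))
print(out)
```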

A variant that stores the counter in a temporary column:

s.index.name = "color"
out = (df
       .assign(idx=df.groupby('color').cumcount())
       # .values: the cumcount is indexed by color, the reset frame by position
       .merge(s.reset_index(name='number')
               .assign(idx=s.groupby(level=0).cumcount().values),
              on=['color', 'idx'])
       .drop(columns='idx')
      )
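The same key can also drive a lookup instead of a merge. The sketch below swaps in a different technique: re-index s by a two-level (label, occurrence) MultiIndex and `reindex` it with the matching key built from df. It assumes that writing the result straight into df as a new column `number` is acceptable. Any (color, occurrence) pair missing from s would come back as NaN.

```python
import pandas as pd

s = pd.Series([10, 20, 11, 12, 30, 34],
              index=["red", "red", "blue", "blue", "green", "green"])
df = pd.DataFrame({
    "color": ["red", "green", "blue", "blue", "red", "green"],
    "id": [1, 2, 3, 4, 5, 6]})

# Re-index s by (label, occurrence number); this two-level key is unique.
keyed = s.set_axis(pd.MultiIndex.from_arrays(
    [s.index.to_numpy(), s.groupby(level=0).cumcount().to_numpy()]))

# Build the same kind of key for each row of df and look the values up by it.
wanted = pd.MultiIndex.from_arrays(
    [df["color"].to_numpy(), df.groupby("color").cumcount().to_numpy()])
df["number"] = keyed.reindex(wanted).to_numpy()
print(df)
```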

Output (the same for both versions):

   color  id  number
0    red   1      10
1  green   2      30
2   blue   3      11
3   blue   4      12
4    red   5      20
5  green   6      34