How to efficiently fill a column of a dataframe based on a dictionary

Time:07-24

I have a dataframe and a dictionary like this:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, 1, 1, 2, 2, 3, 3, 3, 3],
    'ignore_me': range(9),
    'fill_me': [np.nan] * 9
})

di = {
    1: ['a', 'b'],
    2: ['c', 'd'],
    3: ['e', 'f', 'g']
}

   A  ignore_me  fill_me
0  1          0      NaN
1  1          1      NaN
2  1          2      NaN
3  2          3      NaN
4  2          4      NaN
5  3          5      NaN
6  3          6      NaN
7  3          7      NaN
8  3          8      NaN

The entries in A of df correspond to the keys in di. I would now like to fill the column fill_me using the values of di, so my desired outcome looks like this:

   A  ignore_me fill_me
0  1          0       a
1  1          1       b
2  1          2     NaN
3  2          3       c
4  2          4       d
5  3          5       e
6  3          6       f
7  3          7       g
8  3          8     NaN

One way of achieving this is as follows:

df_list = []
for key, values in di.items():
    temp_df = df[df['A'] == key].reset_index(drop=True)
    fill_df = pd.DataFrame({'A': [key]* len(values), 'fill_me': values})
    df_list.append(temp_df.combine_first(fill_df))

final_df = pd.concat(df_list, ignore_index=True)

which gives me the desired outcome. However, it requires looping, a concat and also creates a new dataframe. Does anyone see a more straightforward way of implementing this? Ideally I could "just" fill df using a smart way of using fillna or map.

CodePudding user response:

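You can use cumcount to create a positional key in both the dataframe and the exploded dictionary, then left-merge on the shared columns (A and key):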

s = pd.Series(di).explode().reset_index()

s.columns = ['A','fill']
df['key'] = df.groupby('A').cumcount()
s['key'] = s.groupby('A').cumcount()

df.merge(s,how='left')
Out[463]: 
   A  ignore_me  fill_me  key fill
0  1          0      NaN    0    a
1  1          1      NaN    1    b
2  1          2      NaN    2  NaN
3  2          3      NaN    0    c
4  2          4      NaN    1    d
5  3          5      NaN    0    e
6  3          6      NaN    1    f
7  3          7      NaN    2    g
8  3          8      NaN    3  NaN
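The merge above leaves the result in a new frame with the helper columns key and fill still attached. A minimal sketch of one way to write the values back into fill_me, assuming df keeps its default index and the row order produced by the left merge (the names pos and merged are only for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 1, 2, 2, 3, 3, 3, 3],
    'ignore_me': range(9),
    'fill_me': [np.nan] * 9,
})
di = {1: ['a', 'b'], 2: ['c', 'd'], 3: ['e', 'f', 'g']}

# Long-format lookup table: one row per (A, position within group) pair
s = pd.Series(di).explode().reset_index()
s.columns = ['A', 'fill']
s['key'] = s.groupby('A').cumcount()

# Position of each row of df within its group of A
pos = df.groupby('A').cumcount()

# A left merge keeps the row order of the left frame, so the merged
# 'fill' column lines up with df row by row
merged = df.assign(key=pos).merge(s, on=['A', 'key'], how='left')
df['fill_me'] = merged['fill'].to_numpy()
```

This way df keeps its original three columns and the helper key never gets stored on it.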

CodePudding user response:

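Use this:

def f(x):
    # Take the next unused value for this key (this mutates di);
    # return NaN once the list is exhausted or the key is missing.
    values = di.get(x)
    return values.pop(0) if values else np.nan

# axis is a DataFrame.apply argument; for a Series, pass the function directly
df['fill_me'] = df['A'].apply(f)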

CodePudding user response:


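You can also use map with a helper that pops the next value from the matching list in di:

def fill(x):
    # di is only looked up and mutated in place, so no global statement is needed
    try:
        # Take the next unused value for this key (this empties di's lists)
        res = di[x].pop(0)
    except (KeyError, IndexError):
        # Key missing from di, or its list is already exhausted
        res = np.nan

    return res

df['fill_me'] = df['A'].map(fill)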
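Note that the pop-based approaches consume the lists in di, so the dictionary is empty afterwards and the fill cannot be repeated. A possible variant that leaves di untouched is to look values up by position instead of popping them. This is only a sketch; the helper name nth_value and the variable pos are made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 1, 2, 2, 3, 3, 3, 3],
    'ignore_me': range(9),
    'fill_me': [np.nan] * 9,
})
di = {1: ['a', 'b'], 2: ['c', 'd'], 3: ['e', 'f', 'g']}


def nth_value(key, pos):
    # Hypothetical helper: return the pos-th value stored for key,
    # or NaN if the key is missing or its list is too short
    values = di.get(key, [])
    return values[pos] if pos < len(values) else np.nan


# Position of each row within its group of A (0, 1, 2, ...)
pos = df.groupby('A').cumcount()

# Read-only lookup, so di is left unchanged
df['fill_me'] = [nth_value(k, p) for k, p in zip(df['A'], pos)]
```

Because nothing is popped, the result does not depend on how often the code is run.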
