Remap values in pandas in long dataframe based on dictionary lookup

Time:11-26

Given data:

import numpy as np
import pandas as pd

df = pd.DataFrame({"var": ["a", "a", "b", "c"], "value": [1, 2, 1, 1]})
value_map = {
    "a": {
        1: "hello",
        2: "something",
    },
    "c": {1: "another"},
}

I'd like to create the following:

| var   |   value | labelled   |
|:------|--------:|:-----------|
| a     |       1 | hello      |
| a     |       2 | something  |
| b     |       1 | 1          |
| c     |       1 | another    |

This can be done using:

for value in value_map:
    df.loc[df["var"].eq(value), "labelled"] = df.loc[
        df["var"].eq(value), "value"
    ].replace(value_map[value])

df["labelled"] = np.where(df["labelled"].isna(), df["value"], df["labelled"])

This doesn't feel very idiomatic to me though, so I'm wondering what a better approach would be using typical pandas / numpy tools.

CodePudding user response:

You can convert your dictionary to a DataFrame, unstack it into a Series indexed by (var, value), and merge:

s = pd.DataFrame(value_map).unstack()

df.merge(s.rename('labelled'),
         left_on=['var', 'value'], right_index=True,
         how='left')

output:

  var  value   labelled
0   a      1      hello
1   a      2  something
2   b      1        NaN
3   c      1    another

Assuming you want to replace the NaNs with the values from the previous column, you can also chain .ffill(axis=1) onto the merge, which gives:

  var value   labelled
0   a     1      hello
1   a     2  something
2   b     1          1
3   c     1    another
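
The forward-fill step can be sketched as follows. This is a minimal example that assumes "labelled" ends up directly to the right of "value" (as it does in the merge above), since ffill on axis=1 copies from the neighbouring column on the left; the name result is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"var": ["a", "a", "b", "c"], "value": [1, 2, 1, 1]})
value_map = {"a": {1: "hello", 2: "something"}, "c": {1: "another"}}

# Series indexed by (var, value); pairs missing from value_map hold NaN.
s = pd.DataFrame(value_map).unstack()

result = (
    df.merge(s.rename("labelled"), left_on=["var", "value"], right_index=True, how="left")
    # Fill each NaN from the column to its left, which here is "value".
    .ffill(axis=1)
)
```

Because the fill runs across columns of different types, the columns may come back with object dtype, and the result depends on the column order.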

CodePudding user response:

Create a Series with a MultiIndex, so that you can use DataFrame.join:

s = pd.Series({(k1, k2): v2 for k1, v1 in value_map.items() for k2, v2 in v1.items()})
print(s)
a  1        hello
   2    something
c  1      another
dtype: object

df = df.join(s.rename('labelled'), on=['var', 'value'])
print(df)
  var  value   labelled
0   a      1      hello
1   a      2  something
2   b      1        NaN
3   c      1    another

Or, as a modification of @mozway's solution:

df = df.join(pd.DataFrame(value_map).unstack().rename('labelled'), on=['var', 'value'])

Finally, replace the NaNs with the original values:

df["labelled"] = df["labelled"].fillna(df["value"])
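
Putting the steps of this answer together, a minimal sketch might look like the one below. The function name label_values and its parameters are hypothetical, not part of pandas, and the cast to object before fillna is an assumption made so that strings and the original integers can share one column.

```python
import pandas as pd


def label_values(df, value_map, key_col="var", value_col="value", out_col="labelled"):
    """Hypothetical helper: label (key, value) pairs using a nested dict."""
    # Flatten {key: {value: label}} into a Series indexed by (key, value).
    lookup = pd.Series(
        {(k, v): label for k, inner in value_map.items() for v, label in inner.items()},
        name=out_col,
    )
    # Left join: the two columns of df are matched against the MultiIndex.
    out = df.join(lookup, on=[key_col, value_col])
    # Where no label was found, fall back to the original value.
    out[out_col] = out[out_col].astype(object).fillna(out[value_col])
    return out


df = pd.DataFrame({"var": ["a", "a", "b", "c"], "value": [1, 2, 1, 1]})
value_map = {"a": {1: "hello", 2: "something"}, "c": {1: "another"}}
out = label_values(df, value_map)
```

The helper returns a new frame and leaves df unchanged, which avoids the column-overlap error you would get from running the join a second time on an already labelled frame.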