Given data:
import numpy as np
import pandas as pd

df = pd.DataFrame({"var": ["a", "a", "b", "c"], "value": [1, 2, 1, 1]})
value_map = {
    "a": {
        1: "hello",
        2: "something",
    },
    "c": {1: "another"},
}
I'd like to create the following:
| var | value | labelled |
|:------|--------:|:-----------|
| a | 1 | hello |
| a | 2 | something |
| b | 1 | 1 |
| c | 1 | another |
This can be done using:
for value in value_map:
    df.loc[df["var"].eq(value), "labelled"] = df.loc[
        df["var"].eq(value), "value"
    ].replace(value_map[value])
df["labelled"] = np.where(df["labelled"].isna(), df["value"], df["labelled"])
This doesn't feel very idiomatic to me though, so I'm wondering what would be a better approach using typical pandas / numpy.
CodePudding user response:
You can transform your dictionary into a DataFrame and merge:
s = pd.DataFrame(value_map).unstack()
df.merge(s.rename('labelled'),
left_on=['var', 'value'], right_index=True,
how='left')
output:
var value labelled
0 a 1 hello
1 a 2 something
2 b 1 NaN
3 c 1 another
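To see why the merge lines up, it may help to look at the intermediate Series. This is a small sketch using the question's value_map; the name wide is only illustrative.

```python
import pandas as pd

value_map = {"a": {1: "hello", 2: "something"}, "c": {1: "another"}}

# Outer keys become columns, inner keys become the row index.
wide = pd.DataFrame(value_map)

# unstack() pivots the columns into the index, giving a Series keyed by
# (var, value) pairs. Combinations missing from the dict, such as
# ("c", 2), appear as NaN.
s = wide.unstack()
print(s)
```

Because the index of s is a (var, value) MultiIndex, merging with left_on=['var', 'value'] and right_index=True matches each row of df to its label.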
Assuming you want to replace the NaNs with the values of the previous column, you can also ffill on axis=1:
var value labelled
0 a 1 hello
1 a 2 something
2 b 1 1
3 c 1 another
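The call that produces this output is not shown above. A minimal sketch, assuming it is simply ffill(axis=1) chained onto the merge result:

```python
import pandas as pd

df = pd.DataFrame({"var": ["a", "a", "b", "c"], "value": [1, 2, 1, 1]})
value_map = {"a": {1: "hello", 2: "something"}, "c": {1: "another"}}

s = pd.DataFrame(value_map).unstack()
out = df.merge(
    s.rename("labelled"),
    left_on=["var", "value"],
    right_index=True,
    how="left",
).ffill(axis=1)  # fill each NaN from the column to its left, row by row
print(out)
```

Note that a row-wise fill across columns of different types will likely leave the columns with object dtype, so this is best suited to a final, display-oriented step.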
CodePudding user response:
Create a Series with a MultiIndex, so that DataFrame.join can be used:
s = pd.Series({(k1, k2):v2 for k1, v1 in value_map.items() for k2, v2 in v1.items()})
print (s)
a 1 hello
2 something
c 1 another
dtype: object
df = df.join(s.rename('labelled'), on=['var','value'])
print (df)
var value labelled
0 a 1 hello
1 a 2 something
2 b 1 NaN
3 c 1 another
Or a modification of @mozway's solution:
df = df.join(pd.DataFrame(value_map).unstack().rename('labelled'), on=['var','value'])
Finally, replace the NaNs:
df["labelled"] = df["labelled"].fillna(df["value"])
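Putting the steps together, here is a self-contained sketch. The helper name add_labels and its parameters are hypothetical, and dtype=object is an added assumption so the new column can hold both labels and raw values.

```python
import pandas as pd


def add_labels(df, value_map, key="var", value="value", out="labelled"):
    """Hypothetical helper: label `value` per `key`, keeping unmapped values."""
    # Flatten the nested dict into {(key, value): label}; tuple keys
    # become a two-level MultiIndex. dtype=object lets the result column
    # hold both labels and raw values.
    lookup = pd.Series(
        {(k, v): label for k, inner in value_map.items() for v, label in inner.items()},
        dtype=object,
    )
    result = df.join(lookup.rename(out), on=[key, value])
    # Rows without a mapping keep their original value.
    result[out] = result[out].fillna(result[value])
    return result


df = pd.DataFrame({"var": ["a", "a", "b", "c"], "value": [1, 2, 1, 1]})
value_map = {"a": {1: "hello", 2: "something"}, "c": {1: "another"}}
labelled = add_labels(df, value_map)
print(labelled)
```

The helper returns a new DataFrame and leaves df unchanged, which avoids the in-place column creation used in the loop from the question.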