I am trying to map a column of my df with a dictionary. My dictionary contains tuple as key and I want to update an existing column value based on the key. How can I achieve that ?
sample df
column1 column2 column3 column4 column5
None 123 test 999 42
sample dict
{(123, "test", 999):1}
final df
column1 column2 column3 column4 column5
1 123 test 999 42
CodePudding user response:
Create Series with MultiIndex
by keys, convert columns in same order like keys by DataFrame.set_index
and then use Series.update
:
d = {(123, "test", 999):1}
s = pd.Series(d)
s1 = df.set_index(['column2','column3','column4'])['column1']
s1.update(s)
df = s1.reset_index().reindex(df.columns, axis=1)
print (df)
column1 column2 column3 column4
0 1 123 test 999
EDIT: If there is multple columns:
print (df)
column1 column2 column3 column4 column5
0 None 123 test 999 42
1 None 123 test 99 42
d = {(123, "test", 999):1}
s = pd.Series(d)
s1 = df.set_index(['column2','column3','column4'])['column1']
s1.update(s)
df['column1'] = s1.reset_index()['column1']
print (df)
column1 column2 column3 column4 column5
0 1.0 123 test 999 42
1 None 123 test 99 42
EDIT1: If set name
in Series s
, is possible only set MultiIndex:
d = {(123, "test", 999):1}
s = pd.Series(d, name='column1')
df1 = df.set_index(['column2','column3','column4'])
df1.update(s)
df = df1.reset_index().reindex(df.columns, axis=1)
print (df)
column1 column2 column3 column4 column5
0 1.0 123 test 999 42
1 None 123 test 99 42
This solution overwrite existing values in col1
, if need replace missing values NaN
or None
s use:
#because 7 in first column not overwritten value
print (df)
column1 column2 column3 column4
0 7 123 test 999
d = {(123, "test", 999):1}
s = pd.Series(d)
df1 = (df.set_index(['column2','column3','column4'])['column1']
.fillna(s)
.reset_index()
.reindex(df.columns, axis=1))
print (df1)
column1 column2 column3 column4
0 7 123 test 999
First solution set 1
:
s1 = df.set_index(['column2','column3','column4'])['column1']
s1.update(s)
df2 = s1.reset_index().reindex(df.columns, axis=1)
print (df2)
column1 column2 column3 column4
0 1 123 test 999