Home > database >  How can I map tuple key with df values updating an existing column?
How can I map tuple key with df values updating an existing column?

Time:08-24

I am trying to map a column of my df with a dictionary. My dictionary contains tuple as key and I want to update an existing column value based on the key. How can I achieve that ?

sample df

column1 column2 column3 column4 column5
 None     123     test    999     42

sample dict

{(123, "test", 999):1}

final df

  column1 column2 column3 column4 column5
     1     123     test    999      42

CodePudding user response:

Create Series with MultiIndex by keys, convert columns in same order like keys by DataFrame.set_index and then use Series.update:

d = {(123, "test", 999):1}
s = pd.Series(d)

s1 = df.set_index(['column2','column3','column4'])['column1']
s1.update(s)

df = s1.reset_index().reindex(df.columns, axis=1)
print (df)
  column1  column2 column3  column4
0       1      123    test      999

EDIT: If there is multple columns:

print (df)
  column1  column2 column3  column4  column5
0    None      123    test      999       42
1    None      123    test       99       42


d = {(123, "test", 999):1}
s = pd.Series(d)

s1 = df.set_index(['column2','column3','column4'])['column1']
s1.update(s)

df['column1'] = s1.reset_index()['column1']
print (df)
  column1  column2 column3  column4  column5
0     1.0      123    test      999       42
1    None      123    test       99       42

EDIT1: If set name in Series s, is possible only set MultiIndex:

d = {(123, "test", 999):1}
s = pd.Series(d, name='column1')

df1 = df.set_index(['column2','column3','column4'])
df1.update(s)

df = df1.reset_index().reindex(df.columns, axis=1)
print (df)
  column1  column2 column3  column4  column5
0     1.0      123    test      999       42
1    None      123    test       99       42

This solution overwrite existing values in col1, if need replace missing values NaN or Nones use:

#because 7 in first column not overwritten value
print (df)
   column1  column2 column3  column4
0        7      123    test      999

d = {(123, "test", 999):1}
s = pd.Series(d)

df1 = (df.set_index(['column2','column3','column4'])['column1']
         .fillna(s)
         .reset_index()
         .reindex(df.columns, axis=1))
print (df1)
   column1  column2 column3  column4
0        7      123    test      999

First solution set 1:

s1 = df.set_index(['column2','column3','column4'])['column1']
s1.update(s)

df2 = s1.reset_index().reindex(df.columns, axis=1)
print (df2)
   column1  column2 column3  column4
0        1      123    test      999
  • Related