I have a dataframe and dictionary like as shown below
ID,Name,value,total,
1,Ajay,2.00,35
1,Dan,3.00,65
2,Ajay,2,78
2,Rajini,0.0,98
3,Ajay,3.00,53
3,Rad,75.25,21
df1 = pd.read_clipboard(sep=',')
output = {'Ajay': {1: 'ABC', 2: 'DEF', 3: 'DUMMA', 4: 'CHUMMA'}, 'Dan': {0: 'KOREA', 1: 'AUS/NZ', 2: 'INDIA', 3: 'ASEAN'}}
I would like to do the below
a) Replace the values in value
column by matching Name
value to Name
key in nested dict.
For ex: ID=1
has Name as Ajay
and value as 2.00
.
Now, if we look at the dict, we Ajay
outer key and trying to find the matching key (which is 2
). So, we replace value 2.00
with DEF
.
Similarly, we do this for other Name which is Dan
.
I tried the below
df1.replace({"values": output},inplace=True) # doesn't work
for d in output.values():
print(d.key())
Is there any efficient and elegant way to do this sort of replacement for million rows dataframe?
I expect my output to be like as below
CodePudding user response:
Try this
# map value-Names to output and fill the missing values with original values
df1['value'] = pd.Series(df1.set_index(['value','Name']).index.map(pd.DataFrame(output).stack())).fillna(df1['value'])
print(df1)
ID Name value total
0 1 Ajay DEF 35
1 1 Dan ASEAN 65
2 2 Ajay DEF 78
3 2 Rajini 0.0 98
4 3 Ajay DUMMA 53
5 3 Rad 75.25 21
CodePudding user response:
Use DataFrame.join
with DataFrame.stack
for new column and then use Series.fillna
for replace not matched values by value
column:
df = df.join(pd.DataFrame(output).stack().rename('new'), on=['value','Name'])
df['value'] = df.pop('new').fillna(df['value'])
print (df)
ID Name value total
0 1 Ajay DEF 35
1 1 Dan ASEAN 65
2 2 Ajay DEF 78
3 2 Rajini 0.0 98
4 3 Ajay DUMMA 53
5 3 Rad 75.25 21
CodePudding user response:
You can try loop the dictionary then use df.mask
or np.where
or loc
with boolean indexing to replace value.
for k, v in output.items():
df['value'] = df['value'].mask(df['Name'].eq(k), df['value'].map(v))
#df['value'] = np.where(df['Name'].eq(k), df['value'].map(v), df['value'])
#df.loc[df['Name'].eq(k), 'value'] = df.loc[df['Name'].eq(k), 'value'].map(v)
print(df)
ID Name value total
0 1 Ajay DEF 35
1 1 Dan ASEAN 65
2 2 Ajay DEF 78
3 2 Rajini 0.0 98
4 3 Ajay DUMMA 53
5 3 Rad 75.25 21