Replace columns in a Dataframe using dictionary given condition

Time:10-22

I have the following dataframe:

import pandas as pd

data = {'name':['sam','rye','lori','chris','sara'],
        'ha':[0.020,1,0.05,0.7,0.001],
        'he':[1,1,0.1,0.0001,1],
        'hi':[0.001,0.002,0.0021,0.3,0.005],
        'ho':[0.0002,0.0043,0.0067,0.0123,0.0110],
        'hu':[0.7500,0.0540,0.0030,1,0.0081],
        'hm':[0.002,0.0021,0.3,0.005,1]}

df = pd.DataFrame(data)
df = df.set_index('name')  # set_index returns a new DataFrame, so assign it back
         ha      he      hi      ho     hu      hm
name                        
sam     0.020   1.0000  0.0010  0.0002  0.7500  0.0020
rye     1.000   1.0000  0.0020  0.0043  0.0540  0.0021
lori    0.050   0.1000  0.0021  0.0067  0.0030  0.3000
chris   0.700   0.0001  0.3000  0.0123  1.0000  0.0050
sara    0.001   1.0000  0.0050  0.0110  0.0081  1.0000

I also have this dictionary:

dict1 = {'ha': { 'sam' : 0.020, 'rye' : -0.018, 'lori': 0.05, 'chris': 0.7, 'sara' : 0.001},
         'he': { 'sam' : 0.00005, 'rye' : 0, 'lori': 1, 'chris': -2, 'jesse' : 5}}

I would like to use this dictionary to replace values in the dataframe under the following condition: for every row, if the column value is larger than the corresponding dictionary value, replace it with the dictionary value; otherwise keep the current value.

This is what I've done so far, but it's failing. I'm trying to do this with a loop.

row = 0
for item in range(0,len(df)):
    row = row + 1
    for i in dict1:
        if df.at[row, 'ha'] >= dict1[i]:
            df.at[row, 'ha'] = dict1[i]

CodePudding user response:

  1. Use your dict1 to make a new DataFrame with the same index & columns as df:
otherdf = pd.DataFrame(dict1).reindex(index=df.index, columns=df.columns)
  2. Replace the values where df is greater than otherdf:
df[df > otherdf] = otherdf

df is now:

          ha       he      hi      ho      hu      hm
name                                                 
sam    0.020  0.00005  0.0010  0.0002  0.7500  0.0020
rye   -0.018  0.00000  0.0020  0.0043  0.0540  0.0021
lori   0.050  0.10000  0.0021  0.0067  0.0030  0.3000
chris  0.700 -2.00000  0.3000  0.0123  1.0000  0.0050
sara   0.001  1.00000  0.0050  0.0110  0.0081  1.0000

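In the above, otherdf has the same shape as the original data, but contains the values from dict1, with NaN wherever dict1 has no entry for that row and column. So it can be used to make a boolean comparison. Comparisons against NaN evaluate to False, so those cells keep their current value: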

>>> otherdf
          ha       he  hi  ho  hu  hm
name                                 
sam    0.020  0.00005 NaN NaN NaN NaN
rye   -0.018  0.00000 NaN NaN NaN NaN
lori   0.050  1.00000 NaN NaN NaN NaN
chris  0.700 -2.00000 NaN NaN NaN NaN
sara   0.001      NaN NaN NaN NaN NaN

>>> df > otherdf
          ha     he     hi     ho     hu     hm
name                                           
sam    False   True  False  False  False  False
rye     True   True  False  False  False  False
lori   False  False  False  False  False  False
chris  False   True  False  False  False  False
sara   False  False  False  False  False  False
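
For a non-mutating variant of the same idea, here is a minimal sketch that uses DataFrame.mask instead of boolean assignment. It assumes the names are the index of df, as in the question; the variable name result is only for illustration.

```python
import pandas as pd

data = {'name': ['sam', 'rye', 'lori', 'chris', 'sara'],
        'ha': [0.020, 1, 0.05, 0.7, 0.001],
        'he': [1, 1, 0.1, 0.0001, 1],
        'hi': [0.001, 0.002, 0.0021, 0.3, 0.005],
        'ho': [0.0002, 0.0043, 0.0067, 0.0123, 0.0110],
        'hu': [0.7500, 0.0540, 0.0030, 1, 0.0081],
        'hm': [0.002, 0.0021, 0.3, 0.005, 1]}
dict1 = {'ha': {'sam': 0.020, 'rye': -0.018, 'lori': 0.05, 'chris': 0.7, 'sara': 0.001},
         'he': {'sam': 0.00005, 'rye': 0, 'lori': 1, 'chris': -2, 'jesse': 5}}

df = pd.DataFrame(data).set_index('name')

# Align the dictionary to df: rows not in df (e.g. 'jesse') are dropped,
# and cells with no dictionary entry become NaN.
otherdf = pd.DataFrame(dict1).reindex(index=df.index, columns=df.columns)

# mask() returns a new frame: where the condition is True the value comes
# from otherdf, otherwise the value from df is kept. df itself is unchanged.
result = df.mask(df > otherdf, otherdf)
print(result)
```

Because mask returns a new DataFrame, the original df can still be compared against result afterwards, which can help when checking which cells were replaced.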

CodePudding user response:

Iterate through the dictionary and replace the values one at a time. Make sure the index is set to 'name' so that rows can be looked up by label.

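import pandas as pd

data = {'name':['sam','rye','lori','chris','sara'],
    'ha':[0.020,1,0.05,0.7,0.001],
    'he':[1,1,0.1,0.0001,1],
    'hi':[0.001,0.002,0.0021,0.3,0.005],
    'ho':[0.0002,0.0043,0.0067,0.0123,0.0110],
    'hu':[0.7500,0.0540,0.0030,1,0.0081],
    'hm':[0.002,0.0021,0.3,0.005,1]}
dict1 = {'ha': {'sam': 0.020, 'rye': -0.018, 'lori': 0.05, 'chris': 0.7, 'sara': 0.001},
         'he': {'sam': 0.00005, 'rye': 0, 'lori': 1, 'chris': -2, 'jesse': 5}}

# Use the names as the index (the 'name' column is kept as well)
df = pd.DataFrame(data, index=data['name'])

for i, j in dict1.items():      # i: column label, j: {row label: value}
    for m, n in j.items():      # m: row label, n: replacement value
        if m in df.index:       # skip names that are not in df, e.g. 'jesse'
            if df.loc[m, i] >= n:
                df.loc[m, i] = n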
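
The same loop can also be wrapped in a function that leaves the input untouched. This is only a sketch: the function name cap_with_dict and the small sample frame are hypothetical, and it uses a strict > comparison to match the wording of the question.

```python
import pandas as pd

def cap_with_dict(df, caps):
    """Return a copy of df where each cell larger than its entry in caps is replaced."""
    out = df.copy()
    for col, per_row in caps.items():
        if col not in out.columns:
            continue                      # ignore columns that df does not have
        for name, value in per_row.items():
            # Ignore row labels that df does not have, e.g. 'jesse'
            if name in out.index and out.at[name, col] > value:
                out.at[name, col] = value
    return out

# Small illustrative frame using two columns from the question
df = pd.DataFrame({'ha': [0.020, 1, 0.05], 'he': [1, 1, 0.1]},
                  index=['sam', 'rye', 'lori'])
dict1 = {'ha': {'sam': 0.020, 'rye': -0.018, 'lori': 0.05},
         'he': {'sam': 0.00005, 'rye': 0, 'lori': 1, 'jesse': 5}}

result = cap_with_dict(df, dict1)
print(result)
```

For large frames the vectorized comparison is likely to be faster, but the explicit loop makes it easy to see which cells are visited.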