UpdateOrAdd() changes to Pandas DataFrame

Time:11-18

Hi, I'm wondering what the fastest and easiest way is to add or update (AddOrUpdate) data in a Pandas DataFrame.

import pandas as pd

# Original DataFrame
df = pd.DataFrame([
        {'A':'a1','B':'b1','C':'c1'},
        {'A':'a3','B':'b2','C':'c2'},
        {'A':'a3','B':'b3','C':'c3'},
    ])
    
Original DataFrame :
    A   B   C
0  a1  b1  c1
1  a3  b2  c2
2  a3  b3  c3


# A List of changes
changes = [
    {'id':0, 'A':'aNEW','C':'cNEW'},
    {'id':2, 'B':'bNEW'},
    {'id':3, 'A':'aNEW','C':'cNEW'},
]



# How to do this? (UpdateOrAdd is the method I wish existed)
df.UpdateOrAdd(changes)

Resulting DataFrame :
      A     B     C
0  aNEW    b1  cNEW
1    a3    b2    c2
2    a3  bNEW    c3
3  aNEW  None  cNEW


CodePudding user response:

You can craft a DataFrame from the list of dictionaries, set 'id' as its index, then align the indices with reindex and fill the remaining gaps from the original with combine_first:

df2 = pd.DataFrame(changes).set_index('id')

out = (df2.reindex(df.index.union(df2.index))
          .combine_first(df)
      )

Output:

      A     B     C
0  aNEW    b1  cNEW
1    a3    b2    c2
2    a3  bNEW    c3
3  aNEW   NaN  cNEW
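
For anyone who wants to run this end to end, here is a self-contained sketch using the question's data. It also illustrates one caveat worth knowing: combine_first only fills missing values, so a change whose value is None/NaN does not blank out an existing cell. The name blank_attempt is just an illustrative variable, not something from the question.

```python
import pandas as pd

# Data from the question
df = pd.DataFrame([
    {'A': 'a1', 'B': 'b1', 'C': 'c1'},
    {'A': 'a3', 'B': 'b2', 'C': 'c2'},
    {'A': 'a3', 'B': 'b3', 'C': 'c3'},
])
changes = [
    {'id': 0, 'A': 'aNEW', 'C': 'cNEW'},
    {'id': 2, 'B': 'bNEW'},
    {'id': 3, 'A': 'aNEW', 'C': 'cNEW'},
]

# Changes indexed by 'id'; keys absent from a change become NaN
df2 = pd.DataFrame(changes).set_index('id')

# Reindex to the union of both indices, then fill the NaN gaps from df
out = df2.reindex(df.index.union(df2.index)).combine_first(df)
print(out)

# Caveat: a null in the changes is treated as "no change", so the old value stays
blank_attempt = pd.DataFrame([{'id': 1, 'B': None}]).set_index('id')
kept = blank_attempt.reindex(df.index.union(blank_attempt.index)).combine_first(df)
print(kept.loc[1, 'B'])  # still 'b2'
```

If you need to overwrite a cell with a null, this approach is not enough on its own and you would have to assign that cell explicitly.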

As a method

If you really want, you can add this as a DataFrame method using monkey patching:

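def AddOrUpdate(self, other):
    # Accept either a DataFrame or a list of dicts
    if not isinstance(other, pd.DataFrame):
        other = pd.DataFrame(other)
    other = other.set_index('id')
    # Use self (not the global df) so the method works on any DataFrame
    return (other.reindex(self.index.union(other.index))
                 .combine_first(self)
            )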

pd.DataFrame.AddOrUpdate = AddOrUpdate

out = df.AddOrUpdate(changes)
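
If patching pd.DataFrame feels too invasive, the same logic also works as a plain function. This is only a sketch: the name add_or_update and the id_col parameter are my own choices for illustration, not part of pandas.

```python
import pandas as pd

def add_or_update(df, changes, id_col='id'):
    """Return a new DataFrame with `changes` applied; `df` itself is not modified.

    `add_or_update` and `id_col` are hypothetical names chosen for this sketch.
    """
    other = changes if isinstance(changes, pd.DataFrame) else pd.DataFrame(changes)
    other = other.set_index(id_col)
    # Values from `other` win; gaps are filled from the original frame
    return other.reindex(df.index.union(other.index)).combine_first(df)

df = pd.DataFrame([
    {'A': 'a1', 'B': 'b1', 'C': 'c1'},
    {'A': 'a3', 'B': 'b2', 'C': 'c2'},
    {'A': 'a3', 'B': 'b3', 'C': 'c3'},
])
changes = [
    {'id': 0, 'A': 'aNEW', 'C': 'cNEW'},
    {'id': 2, 'B': 'bNEW'},
    {'id': 3, 'A': 'aNEW', 'C': 'cNEW'},
]

out = add_or_update(df, changes)
print(out)
```

A plain function keeps the behaviour local to your code and avoids surprising anyone else who uses pd.DataFrame in the same process.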

CodePudding user response:

If your DataFrame has an integer index that starts at 0 and is contiguous, you can just use .loc to add a new row, using the row count as the next index label:

df.loc[df.shape[0]] = ['aNEW',  None,  'cNEW']

#df
      A     B     C
0    a1    b1    c1
1    a3    b2    c2
2    a3    b3    c3
3  aNEW  None  cNEW

You can pass a dictionary too. You don't need to include a key/value pair for the empty column if you don't care whether it ends up as None or NaN:

df.loc[df.shape[0]] = {'A': 'aNew ', 'C': 'cNew '}

#df
       A    B      C
0     a1   b1     c1
1     a3   b2     c2
2     a3   b3     c3
3  aNew   NaN  cNew 
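
The .loc examples above only append a row. To cover the question's full list of changes (updates and additions alike), one possible sketch is a simple loop that assigns cell by cell. It assumes each 'id' is an index label of df, and it modifies df in place. It will likely be slower than the combine_first approach for large change lists.

```python
import pandas as pd

df = pd.DataFrame([
    {'A': 'a1', 'B': 'b1', 'C': 'c1'},
    {'A': 'a3', 'B': 'b2', 'C': 'c2'},
    {'A': 'a3', 'B': 'b3', 'C': 'c3'},
])
changes = [
    {'id': 0, 'A': 'aNEW', 'C': 'cNEW'},
    {'id': 2, 'B': 'bNEW'},
    {'id': 3, 'A': 'aNEW', 'C': 'cNEW'},
]

for change in changes:
    row = dict(change)      # copy so the original change dict is left intact
    idx = row.pop('id')     # assumption: 'id' is the index label to update or add
    for col, val in row.items():
        # .loc updates an existing cell, or enlarges the frame if idx is new
        df.loc[idx, col] = val

print(df)
```

Columns not mentioned in a change keep their old value for existing rows and are NaN for newly added rows.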