Hi, I'm wondering what the fastest and easiest way is to add or update (AddOrUpdate) data in a Pandas DataFrame.
import pandas as pd
# Original DataFrame
df = pd.DataFrame([
    {'A':'a1','B':'b1','C':'c1'},
    {'A':'a3','B':'b2','C':'c2'},
    {'A':'a3','B':'b3','C':'c3'},
])
Original DataFrame:
    A   B   C
0  a1  b1  c1
1  a3  b2  c2
2  a3  b3  c3
# A list of changes ('id' is the row label to update or add)
changes = [
    {'id':0, 'A':'aNEW','C':'cNEW'},
    {'id':2, 'B':'bNEW'},
    {'id':3, 'A':'aNEW','C':'cNEW'},
]
# HOW TO? (the call I'd like to have, not an existing pandas method)
df.AddOrUpdate(changes)
Resulting DataFrame:
      A     B     C
0  aNEW    b1  cNEW
1    a3    b2    c2
2    a3  bNEW    c3
3  aNEW  None  cNEW
AddOrUpdate a Pandas DataFrame with a list of changes
CodePudding user response:
You can craft a DataFrame from the list of dictionaries, then align the indices with reindex and fill the gaps from the original with combine_first:
df2 = pd.DataFrame(changes).set_index('id')
out = (df2.reindex(df.index.union(df2.index))
          .combine_first(df)
       )
Output:
      A     B     C
0  aNEW    b1  cNEW
1    a3    b2    c2
2    a3  bNEW    c3
3  aNEW   NaN  cNEW
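To check this end to end, here is a self-contained sketch that rebuilds the question's data and asserts a few of the expected cells. The last part illustrates one caveat, assuming a change might carry an explicit None: combine_first treats it as missing, so the original value survives. The names null_change and kept are made up for the illustration.

```python
import pandas as pd

df = pd.DataFrame([
    {'A': 'a1', 'B': 'b1', 'C': 'c1'},
    {'A': 'a3', 'B': 'b2', 'C': 'c2'},
    {'A': 'a3', 'B': 'b3', 'C': 'c3'},
])
changes = [
    {'id': 0, 'A': 'aNEW', 'C': 'cNEW'},
    {'id': 2, 'B': 'bNEW'},
    {'id': 3, 'A': 'aNEW', 'C': 'cNEW'},
]

# Values from the changes win; cells they leave empty are filled from df.
df2 = pd.DataFrame(changes).set_index('id')
out = df2.reindex(df.index.union(df2.index)).combine_first(df)

assert out.loc[0, 'A'] == 'aNEW'   # updated cell
assert out.loc[0, 'B'] == 'b1'     # untouched cell kept from df
assert out.loc[3, 'C'] == 'cNEW'   # new row added
assert pd.isna(out.loc[3, 'B'])    # no value given for the new row

# Caveat (assumed scenario): an explicit None counts as missing,
# so it does not overwrite the existing value.
null_change = pd.DataFrame([{'id': 1, 'B': None}]).set_index('id')
kept = null_change.reindex(df.index.union(null_change.index)).combine_first(df)
assert kept.loc[1, 'B'] == 'b2'
```

So if you need to blank out a cell on purpose, this approach alone won't do it.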
As a method
If you really want, you can add this as a DataFrame method using monkey patching:
def AddOrUpdate(self, other):
    # accept a list of dicts (or anything a DataFrame can be built from)
    if not isinstance(other, pd.DataFrame):
        other = pd.DataFrame(other)
    other = other.set_index('id')
    # values from `other` take priority; gaps are filled from self
    return (other.reindex(self.index.union(other.index))
                 .combine_first(self)
            )

pd.DataFrame.AddOrUpdate = AddOrUpdate
out = df.AddOrUpdate(changes)
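If you'd rather not modify pd.DataFrame globally, a plain helper does the same job and can still be chained with DataFrame.pipe. A minimal sketch: the name add_or_update and the key parameter are my own choices, not pandas API, and the small df here is just sample data.

```python
import pandas as pd

def add_or_update(df, changes, key='id'):
    """Return a new DataFrame with `changes` applied.

    `key` (hypothetical parameter) names the field holding the row label.
    """
    other = changes if isinstance(changes, pd.DataFrame) else pd.DataFrame(changes)
    other = other.set_index(key)
    # values from the changes take priority; gaps are filled from df
    return other.reindex(df.index.union(other.index)).combine_first(df)

df = pd.DataFrame({'A': ['a1', 'a3'], 'B': ['b1', 'b2']})
new_rows = [{'id': 1, 'B': 'bNEW'}, {'id': 2, 'A': 'aNEW'}]

# pipe calls add_or_update(df, new_rows), so it reads like a method call
out = df.pipe(add_or_update, new_rows)
print(out)
```

The original df is left untouched; only out contains the changes.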
CodePudding user response:
If your DataFrame has an integer index that starts at 0 and has contiguous values, you can simply use .loc to add a new row, using the row count as the next index label:
df.loc[df.shape[0]] = ['aNEW', None, 'cNEW']
#df
      A     B     C
0    a1    b1    c1
1    a3    b2    c2
2    a3    b3    c3
3  aNEW  None  cNEW
You can pass a dictionary too. You don't need to include a key/value pair for the empty column if you don't care whether it ends up as None or NaN:
df.loc[df.shape[0]] = {'A': 'aNew', 'C': 'cNew'}
#df
      A    B     C
0    a1   b1    c1
1    a3   b2    c2
2    a3   b3    c3
3  aNew  NaN  cNew
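The same .loc assignment can also apply the whole list of changes from the question, one cell at a time: assigning to an existing label updates the cell, and assigning to a label that doesn't exist yet adds the row. A rough sketch, assuming each change carries its row label under 'id'. It modifies df in place and will probably be slow for a long list of changes.

```python
import pandas as pd

df = pd.DataFrame([
    {'A': 'a1', 'B': 'b1', 'C': 'c1'},
    {'A': 'a3', 'B': 'b2', 'C': 'c2'},
    {'A': 'a3', 'B': 'b3', 'C': 'c3'},
])
changes = [
    {'id': 0, 'A': 'aNEW', 'C': 'cNEW'},
    {'id': 2, 'B': 'bNEW'},
    {'id': 3, 'A': 'aNEW', 'C': 'cNEW'},
]

for change in changes:
    row = change['id']
    for col, value in change.items():
        if col == 'id':
            continue  # 'id' is the row label, not a column to write
        # updates the cell, or creates the row if the label is new
        df.loc[row, col] = value

print(df)
```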