Assuming I have a DF as follows:
import pandas as pd

df = pd.DataFrame({'legs': [2, 4, 8, 0],
                   'wings': [2, 0, 0, 0],
                   'specimen': [10, 2, 1, 8]},
                  index=['falcon', 'dog', 'spider', 'fish'])
df
resulting in:
        legs  wings  specimen
falcon     2      2        10
dog        4      0         2
spider     8      0         1
fish       0      0         8
Now I get data in the form of a dict, and I would like to add it to the DataFrame:
new_data = {'dog':{'wings':45,'specimen':89},'fish':{'wings':55555,'something_new':'new value'}, 'new_row':{'wings':90}}
new_data_df = pd.DataFrame(new_data).T
new_data_df
I can use append to add the data to the first DF, but append is deprecated, so I would rather stay away from it. I can use concat instead, but then I run into the following problem.
I don't want row index labels to be duplicated. I would like the new data to overwrite existing values, and to add a new column or row whenever one appears in the dict. There should be one and only one dog row. With concat, the dog row appears twice.
Changing ignore_index=False to True does not help; the index labels are simply discarded.
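The duplication described above can be reproduced with a short sketch. The exact concat call from the question is not shown, so the arguments below are an assumption; the names stacked, dup_labels and renumbered are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'legs': [2, 4, 8, 0],
                   'wings': [2, 0, 0, 0],
                   'specimen': [10, 2, 1, 8]},
                  index=['falcon', 'dog', 'spider', 'fish'])
new_data = {'dog': {'wings': 45, 'specimen': 89},
            'fish': {'wings': 55555, 'something_new': 'new value'},
            'new_row': {'wings': 90}}
new_data_df = pd.DataFrame(new_data).T

# Assumed call: concat stacks the rows and keeps every index label,
# so labels present in both frames show up twice.
stacked = pd.concat([df, new_data_df], ignore_index=False)
dup_labels = stacked.index[stacked.index.duplicated()].tolist()
print(dup_labels)

# ignore_index=True replaces the labels with 0..n-1; rows are still not merged.
renumbered = pd.concat([df, new_data_df], ignore_index=True)
print(len(renumbered), list(renumbered.index))
```

Either way the result has seven rows rather than five, which is why concat alone does not give the overwrite behaviour asked for.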
CodePudding user response:
You can use combine_first:
out = new_data_df.combine_first(df)
Out[144]:
         legs something_new  specimen    wings
dog       4.0           NaN      89.0     45.0
falcon    2.0           NaN      10.0      2.0
fish      0.0     new value       8.0    55555
new_row   NaN           NaN       NaN     90.0
spider    8.0           NaN       1.0      0.0
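As a rough illustration of why the call order matters: combine_first aligns both frames on the union of their row and column labels and keeps the caller's value wherever it is non-null, falling back to the argument's value otherwise. The sketch below uses the frames from the question; the names updated and not_updated are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'legs': [2, 4, 8, 0],
                   'wings': [2, 0, 0, 0],
                   'specimen': [10, 2, 1, 8]},
                  index=['falcon', 'dog', 'spider', 'fish'])
new_data = {'dog': {'wings': 45, 'specimen': 89},
            'fish': {'wings': 55555, 'something_new': 'new value'},
            'new_row': {'wings': 90}}
new_data_df = pd.DataFrame(new_data).T

# Caller is the new data: its non-null cells win, gaps are filled from df.
updated = new_data_df.combine_first(df)

# Caller is the old data: existing non-null cells are kept,
# so nothing is overwritten and only new rows/columns are added.
not_updated = df.combine_first(new_data_df)

print(updated.loc['dog', 'wings'], not_updated.loc['dog', 'wings'])
print(updated.index.is_unique)
```

Calling it on new_data_df is therefore what gives the "overwrite and extend" behaviour; swapping the two frames would only fill in missing cells.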
CodePudding user response:
Another option in case you want to keep values from both rows:
import pandas as pd

df = pd.DataFrame({'legs': [2, 4, 8, 0],
                   'wings': [2, 0, 0, 0],
                   'specimen': [10, 2, 1, 8]},
                  index=['falcon', 'dog', 'spider', 'fish'])
new_data = {'dog': {'wings': 45, 'specimen': 89},
            'fish': {'wings': 55555, 'something_new': 'new value'},
            'new_row': {'wings': 90}}
new_data_df = pd.DataFrame(new_data).T
# Stack both frames, then move the row labels into an 'index' column
output = pd.concat([df, new_data_df], ignore_index=False).reset_index()
# Collect every value seen for each label into a list
output1 = output.groupby('index').agg(list)
print(output1)
         legs        wings       specimen   something_new
index
dog      [4.0, nan]  [0, 45.0]   [2, 89.0]  [nan, nan]
falcon   [2.0]       [2]         [10]       [nan]
fish     [0.0, nan]  [0, 55555]  [8, nan]   [nan, new value]
new_row  [nan]       [90.0]      [nan]      [nan]
spider   [8.0]       [0]         [1]        [nan]
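If the lists are only an intermediate step, the same concat can be collapsed to a single value per cell. The sketch below is one possible variation, not part of the answer above: it assumes that "the most recent non-null value wins" is the desired rule, and relies on groupby(...).last() returning the last non-null entry of each column within a group. The names stacked and collapsed are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'legs': [2, 4, 8, 0],
                   'wings': [2, 0, 0, 0],
                   'specimen': [10, 2, 1, 8]},
                  index=['falcon', 'dog', 'spider', 'fish'])
new_data = {'dog': {'wings': 45, 'specimen': 89},
            'fish': {'wings': 55555, 'something_new': 'new value'},
            'new_row': {'wings': 90}}
new_data_df = pd.DataFrame(new_data).T

# New rows are stacked after the old ones, so "last" means "newest".
stacked = pd.concat([df, new_data_df])

# Group by the row label and keep the last non-null value per column.
collapsed = stacked.groupby(level=0).last()
print(collapsed)
```

Cells that the new data leaves empty, such as legs for dog, keep their old value because missing entries are skipped when picking the last one.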