I am trying to code a task for work, so I made a simple case that simulates this task. I have two dataframes: data_1
and data_2
, and I would like to replace some rows in data_1
with rows from data_2
by condition of matching values in a column named time
.
Here is an example:
import numpy as np
import pandas as pd
a = {
'time':[1,2,3,4,5,6],
'column_1':[2,2,2,2,2,2],
'column_2':[3,3,3,3,3,3]
}
b = {
'time':[3,4,5],
'column_1':[0,0,0],
'column_2':[0,0,0]
}
data_1 = pd.DataFrame(a)
data_2 = pd.DataFrame(b)
In the result, I would like to get dataframe like this:
time column_1 column_2
0 1 2 3
1 2 2 3
2 3 0 0
3 4 0 0
4 5 0 0
5 6 2 3
I tried merge and replace methods in Pandas, but it was not successful. I did mask boolean array:
time_1 = list(data_1['time'])
time_2 = list(data_2['time'])
mask_array = np.zeros(len(time_1),dtype = bool)
for i, item in enumerate(time_1):
if item in time_2:
mask_array[i] = True
and I received:
array([False, False, True, True, True, False])
But I could not replace data_1
values with data_2
values. What did I do wrong? It does not seem like a difficult task, but I could not find anything useful and just do not know what to do. I do not have a lot of experience with pandas, so maybe I do not understand something.
CodePudding user response:
You can use .update()
after setting index on time
on both data_1a
and data_1b
, as follows:
data_1a = data_1.set_index('time')
data_1a.update(data_2.set_index('time'))
data_out = data_1a.reset_index()
.update()
modifies in place using non-NA values from another DataFrame. Aligns on indices. Thus, when you set time
as index on both data_1a
and data_1b
, .update()
aligns on matching values in column time
to perform the update of data_1
by corresponding values of data_2
.
Data Setup:
a = {
'time':[1,2,3,4,5,6],
'column_1':[2,2,2,2,2,2],
'column_2':[3,3,3,3,3,3]
}
b = {
'time':[3,4,5],
'column_1':[0,0,0],
'column_2':[0,0,0]
}
data_1 = pd.DataFrame(a)
data_2 = pd.DataFrame(b)
Result:
print(data_out)
time column_1 column_2
0 1 2.0 3.0
1 2 2.0 3.0
2 3 0.0 0.0
3 4 0.0 0.0
4 5 0.0 0.0
5 6 2.0 3.0