I want to replace some rows of data1 with the rows of data2 that have the same 'TIMESTEP' value.
I have tried both merge and replace. With replace, I get
"AttributeError: 'Series' object has no attribute '_replace_columnwise'"
and with merge I do not get the desired result. This is the code I tried:
x = data1.loc[data1['TIMESTEP'] == 500000]
y = data2.loc[df_nd_merged['TIMESTEP'] == 500000]
data1.replace(x, y, inplace = True)
My data1:
TIMESTEP id mass y
2900 313 0.795699 -0.0149792
3000 462 0.782639 -0.0106693
3200 245 0.723805 -0.0167327
.....
.....
.....
TIMESTEP id mass y
2487960 9999300 9934 0.802645 -0.0827138
2488086 9999800 3849 0.833692 -0.0247934
2488130 10000000 9532 0.767143 -0.0855542
My data2:
TIMESTEP id mass y
500000 0 0 0
1000000 0 0 0
1500000 6272.0 0.877105 -0.0493408
1500000 7922.0 0.740305 -0.0443259
1500000 5699.0 0.772714 -0.0562289
2000000 0 0 0
2500000 0 0 0
3000000 0 0 0
3500000 0 0 0
4000000 0 0 0
4500000 0 0 0
5000000 0 0 0
5500000 0 0 0
6000000 1974.0 0.772978 -0.0270121
6500000 0 0 0
7000000 0 0 0
7500000 4870.0 0.92602 -0.0160799
8000000 6152.0 0.665494 -0.00542989
8500000 3300.0 0.67739 -0.0329551
9000000 30.0 0.798618 -0.0178684
9500000 0 0 0
Please help.
CodePudding user response:
If I understand correctly, you can left-merge df with df2 on TIMESTEP and take the 'y' column from df2:
# keep TIMESTEP, id and mass from df; y comes from df2
# (NaN where df2 has no matching TIMESTEP)
df[['TIMESTEP', 'id', 'mass']].merge(df2[['TIMESTEP', 'y']],
                                     on='TIMESTEP',
                                     how='left')
df
TIMESTEP id mass y
0 2900 313 0.795699 -0.014979
1 3000 462 0.782639 -0.010669
2 3200 245 0.723805 -0.016733
df2
TIMESTEP id mass y
0 5000 0.0 0.000000 0.000000
1 1000 0.0 0.000000 0.000000
2 3000 6272.0 0.877105 -0.049341
3 3200 7922.0 0.740305 -0.044326
4 1500000 5699.0 0.772714 -0.056229
MERGED
TIMESTEP id mass y
0 2900 313 0.795699 NaN
1 3000 462 0.782639 -0.049341
2 3200 245 0.723805 -0.044326
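If you would rather keep the original 'y' where df2 has no matching TIMESTEP (instead of NaN), a minimal sketch could look like the following. It rebuilds the toy frames shown above; the helper column name y_new is just an illustrative choice, and it assumes each TIMESTEP occurs at most once in df2 (otherwise the left merge would duplicate rows).

```python
import pandas as pd

# toy frames copied from the tables above
df = pd.DataFrame({
    "TIMESTEP": [2900, 3000, 3200],
    "id": [313, 462, 245],
    "mass": [0.795699, 0.782639, 0.723805],
    "y": [-0.014979, -0.010669, -0.016733],
})
df2 = pd.DataFrame({
    "TIMESTEP": [5000, 1000, 3000, 3200, 1500000],
    "id": [0.0, 0.0, 6272.0, 7922.0, 5699.0],
    "mass": [0.0, 0.0, 0.877105, 0.740305, 0.772714],
    "y": [0.0, 0.0, -0.049341, -0.044326, -0.056229],
})

# left merge keeps every row of df; df2's y arrives as "y_new"
merged = df.merge(df2[["TIMESTEP", "y"]], on="TIMESTEP",
                  how="left", suffixes=("", "_new"))

# use df2's y where a match exists, otherwise fall back to df's own y
merged["y"] = merged["y_new"].fillna(merged["y"])
merged = merged.drop(columns="y_new")
print(merged)
```

Here the row with TIMESTEP 2900 keeps its own 'y', while 3000 and 3200 take the values from df2.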
CodePudding user response:
pandas.DataFrame.replace() works best for individual values and is not intended for row-wise replacements. It works like this:
import pandas as pd
import numpy as np
# create toy example
df1 = pd.DataFrame(np.random.rand(5, 3), columns=['a', 'b', 'c'])
df1.iat[0, 1] = 0
# replace zeros with -1
df1.replace(to_replace=0, value=-1)
This replaces all values of zero by minus one, without taking rows into account.
So pandas.DataFrame.merge() is the right tool here. It merges on one or more columns that exist in both dataframes (if the key columns have different names, you can specify them explicitly with the keywords left_on and right_on).
# create toy example
df1 = pd.DataFrame(np.random.rand(5, 3), columns=['a', 'b', 'c'])
df2 = pd.DataFrame(np.random.rand(2, 2), columns=['a', 'd'])
# force same values in a column
df1.loc[1, 'a'] = 0
df2.loc[0, 'a'] = 0
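# slice: keep rows with a < 0.5, which includes the row where a == 0
# (a > 0.5 would drop the only row that matches df2, giving an empty result)
lg1 = df1['a'] < 0.5
df1[lg1].merge(df2)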
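For the case where the key columns are named differently, a small sketch with left_on and right_on might look like this. The frames and the column name step are made up for illustration only.

```python
import pandas as pd

# hypothetical frames whose key columns have different names
left = pd.DataFrame({"step": [1, 2, 3], "mass": [0.1, 0.2, 0.3]})
right = pd.DataFrame({"TIMESTEP": [2, 3, 4], "y": [-0.2, -0.3, -0.4]})

# match left["step"] against right["TIMESTEP"]; keep all rows of left
out = left.merge(right, left_on="step", right_on="TIMESTEP", how="left")
print(out)
```

Both key columns are kept in the result, and rows of left without a partner in right get NaN in the columns coming from right.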
Alternatively, you can use boolean indexing, much like slicing a matrix, which might be a little more intuitive:
# slice
lg1 = df1['a'] > 0.5
df1[lg1] = df1[lg1] * -1
Note that the assigned values must then have exactly the same shape as the selected slice.
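Applied to the original question, one possible way to swap whole rows by TIMESTEP is to drop the matching rows from data1 and append the rows of data2 in their place. This is only a sketch under assumptions: the numbers in data1 are made up, data2 uses a few rows shaped like the ones in the question, and it assumes every data2 row should end up in the result (including repeated TIMESTEP values such as 1500000).

```python
import pandas as pd

# made-up stand-in for data1
data1 = pd.DataFrame({
    "TIMESTEP": [500000, 1000000, 1500000, 2000000],
    "id": [11, 12, 13, 14],
    "mass": [0.70, 0.71, 0.72, 0.73],
    "y": [-0.01, -0.02, -0.03, -0.04],
})
# a few rows in the style of data2 (note the repeated TIMESTEP)
data2 = pd.DataFrame({
    "TIMESTEP": [500000, 1500000, 1500000],
    "id": [0.0, 6272.0, 7922.0],
    "mass": [0.0, 0.877105, 0.740305],
    "y": [0.0, -0.0493408, -0.0443259],
})

# rows of data1 whose TIMESTEP does not occur in data2 stay as they are
keep = data1[~data1["TIMESTEP"].isin(data2["TIMESTEP"])]

# append the replacement rows and restore the TIMESTEP ordering
result = (
    pd.concat([keep, data2], ignore_index=True)
    .sort_values("TIMESTEP", kind="stable")
    .reset_index(drop=True)
)
print(result)
```

Because rows are removed and re-added rather than matched one to one, this also works when data2 holds several rows for the same TIMESTEP, which a plain merge would turn into duplicated combinations.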