Merge pandas dataframe rows based on column value

Time:06-13

I want to merge or replace some rows of data1 based on the 'TIMESTEP' values in data2.

I have tried both the merge and replace options. With replace, I get:

"AttributeError: 'Series' object has no attribute '_replace_columnwise'"

And with the merge methods I am not getting the desired result. I have tried this code:

x = data1.loc[data1['TIMESTEP'] == 500000]
y = data2.loc[data2['TIMESTEP'] == 500000]
data1.replace(x, y, inplace = True)

My data1:

TIMESTEP    id  mass    y
2900    313 0.795699    -0.0149792
3000    462 0.782639    -0.0106693
3200    245 0.723805    -0.0167327
.....
.....
.....
TIMESTEP    id  mass    y
2487960 9999300 9934    0.802645    -0.0827138
2488086 9999800 3849    0.833692    -0.0247934
2488130 10000000 9532   0.767143    -0.0855542

My data2:

TIMESTEP    id  mass    y
500000  0   0   0
1000000 0   0   0
1500000 6272.0  0.877105    -0.0493408
1500000 7922.0  0.740305    -0.0443259
1500000 5699.0  0.772714    -0.0562289
2000000 0   0   0
2500000 0   0   0
3000000 0   0   0
3500000 0   0   0
4000000 0   0   0
4500000 0   0   0
5000000 0   0   0
5500000 0   0   0
6000000 1974.0  0.772978    -0.0270121
6500000 0   0   0
7000000 0   0   0
7500000 4870.0  0.92602 -0.0160799
8000000 6152.0  0.665494    -0.00542989
8500000 3300.0  0.67739 -0.0329551
9000000 30.0    0.798618    -0.0178684
9500000 0   0   0

Please help.

CodePudding user response:

IIUC, you can left-merge df with df2 on 'TIMESTEP' and take the 'y' from df2:

df[['TIMESTEP', 'id', 'mass']].merge(df2[['TIMESTEP', 'y']],
                                     on='TIMESTEP',
                                     how='left')

DF

    TIMESTEP    id  mass    y
0   2900    313     0.795699    -0.014979
1   3000    462     0.782639    -0.010669
2   3200    245     0.723805    -0.016733

df2

    TIMESTEP    id  mass    y
0   5000    0.0     0.000000    0.000000
1   1000    0.0     0.000000    0.000000
2   3000    6272.0  0.877105    -0.049341
3   3200    7922.0  0.740305    -0.044326
4   1500000     5699.0  0.772714    -0.056229

MERGED

    TIMESTEP    id  mass    y
0   2900    313     0.795699    NaN
1   3000    462     0.782639    -0.049341
2   3200    245     0.723805    -0.044326
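
If the goal is to keep the original 'y' wherever df2 has no matching TIMESTEP (instead of the NaN in the first row above), one possible follow-up is to merge with a suffix and fall back to the original value. This is only a sketch on the toy frames from this answer: the `_new` suffix and the name `merged` are illustrative choices, and it assumes each TIMESTEP appears at most once in df2 (duplicate keys would duplicate rows in the result).

```python
import pandas as pd

# Toy frames copied from the example above
df = pd.DataFrame({
    "TIMESTEP": [2900, 3000, 3200],
    "id": [313, 462, 245],
    "mass": [0.795699, 0.782639, 0.723805],
    "y": [-0.014979, -0.010669, -0.016733],
})
df2 = pd.DataFrame({
    "TIMESTEP": [5000, 1000, 3000, 3200, 1500000],
    "id": [0.0, 0.0, 6272.0, 7922.0, 5699.0],
    "mass": [0.0, 0.0, 0.877105, 0.740305, 0.772714],
    "y": [0.0, 0.0, -0.049341, -0.044326, -0.056229],
})

# Left merge keeps every row of df; df2's 'y' arrives as 'y_new'
merged = df.merge(df2[["TIMESTEP", "y"]], on="TIMESTEP",
                  how="left", suffixes=("", "_new"))

# Prefer df2's value where a match exists, otherwise keep df's own 'y'
merged["y"] = merged["y_new"].fillna(merged["y"])
merged = merged.drop(columns="y_new")
print(merged)
```

With these inputs the row for TIMESTEP 2900 keeps its original 'y', while the rows for 3000 and 3200 take the values from df2.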

CodePudding user response:

pandas.DataFrame.replace() works best for individual values and is not intended for row-wise replacements. It works like this:

import pandas as pd
import numpy as np

# create toy example
df1 = pd.DataFrame(np.random.rand(5, 3), columns=['a', 'b', 'c'])
df1.iat[0, 1] = 0

# replace zeros with -1 (returns a new DataFrame; df1 itself is unchanged)
df1.replace(to_replace=0, value=-1)

This replaces every zero with -1, regardless of which row it is in.
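
To make the value-wise behaviour concrete, here is a small sketch with fixed numbers rather than random ones. The frame `demo` and its columns are made up for illustration. It also shows the nested-dict form of replace(), which limits the replacement to one column but still cannot select rows.

```python
import pandas as pd

# Hypothetical frame with zeros in both columns
demo = pd.DataFrame({"a": [0.0, 0.5, 0.0], "b": [0.0, 0.0, 0.7]})

# Every zero in every column becomes -1
everywhere = demo.replace(to_replace=0, value=-1)

# Nested dict: only zeros in column 'b' are replaced
only_b = demo.replace({"b": {0: -1}})

print(everywhere)
print(only_b)
```

In both calls the match is on the cell value alone, so there is no way to say "only in the rows where TIMESTEP is 500000".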

So you are right to look at pandas.DataFrame.merge(), which merges on one or more columns that exist in both dataframes (if the columns have different names, you can specify them explicitly with the keywords left_on and right_on).

# create toy example
df1 = pd.DataFrame(np.random.rand(5, 3), columns=['a', 'b', 'c'])
df2 = pd.DataFrame(np.random.rand(2, 2), columns=['a', 'd'])
# force a shared value in column 'a' (above 0.5 so it survives the slice below)
df1.loc[1, 'a'] = 0.75
df2.loc[0, 'a'] = 0.75

# slice df1, then merge on the common column 'a'
lg1 = df1['a'] > 0.5
df1[lg1].merge(df2)

Or you can assign through a boolean mask, the way you would slice a matrix, which might be a little more intuitive:

# slice 
lg1 = df1['a'] > 0.5
df1[lg1] = df1[lg1] * -1

Note that both sides of the assignment must then have exactly the same shape.
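
Applied to the original question, the same masking idea could look roughly like the sketch below. The frames are cut-down, made-up versions of data1 and data2 (the TIMESTEP 3000 row in data2 is an assumption chosen so that a match exists), and `cols` and `step` are illustrative names.

```python
import pandas as pd

# Cut-down stand-ins for the question's frames (values are illustrative)
data1 = pd.DataFrame({
    "TIMESTEP": [2900, 3000, 3200],
    "id": [313.0, 462.0, 245.0],
    "mass": [0.795699, 0.782639, 0.723805],
    "y": [-0.0149792, -0.0106693, -0.0167327],
})
data2 = pd.DataFrame({
    "TIMESTEP": [3000],
    "id": [6272.0],
    "mass": [0.877105],
    "y": [-0.0493408],
})

step = 3000
cols = ["id", "mass", "y"]
mask = data1["TIMESTEP"] == step

# .to_numpy() drops data2's index so the values are assigned by position;
# both sides must select the same number of rows
data1.loc[mask, cols] = data2.loc[data2["TIMESTEP"] == step, cols].to_numpy()
print(data1)
```

Without `.to_numpy()`, pandas would align the right-hand side on its index labels, which generally do not line up between the two frames.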
