I am trying to merge these two dataframe together and preserve all the rows and columns. They have different times under the column 'time', so i want them to merge in a way that is time sequential.
df1:
time | run_id | weight |
---|---|---|
0 | H1 | 500 |
24 | H1 | 400 |
48 | H1 | 300 |
0 | H2 | 900 |
24 | H2 | 800 |
48 | H2 | 700 |
df2:
time | run_id | totalizer |
---|---|---|
0.5 | H1 | 100 |
10 | H1 | 200 |
40 | H1 | 300 |
60 | H1 | 400 |
0.5 | H2 | 900 |
5 | H2 | 1000 |
35 | H2 | 1100 |
70 | H2 | 1200 |
How do I merge these two tables into:
time | run_id | weight | totalizer |
---|---|---|---|
0 | H1 | 500 | |
0.5 | H1 | 100 | |
10 | H1 | 200 | |
24 | H1 | 400 | |
40 | H1 | 300 | |
48 | H1 | 300 | |
60 | H1 | 400 | |
0 | H2 | 900 | |
0.5 | H2 | 900 | |
5 | H2 | 1000 | |
24 | H2 | 800 | |
35 | H2 | 1100 | |
48 | H2 | 700 | |
70 | H2 | 1200 |
I tried:
mergedf = df1.merge(df2, how='outer')
but it stacked df1 on top of df2.
CodePudding user response:
One option is to use combine_first
:
cols = ["run_id", "time"]
out = (
df1.set_index(cols)
.combine_first(df2.set_index(cols))
.reset_index().sort_values(by=cols)
[["time", "run_id", "weight", "totalizer"]]
)
Output :
print(out)
time run_id weight totalizer
0 0.0 H1 500.0 NaN
1 0.5 H1 NaN 100.0
2 10.0 H1 NaN 200.0
3 24.0 H1 400.0 NaN
4 40.0 H1 NaN 300.0
5 48.0 H1 300.0 NaN
6 60.0 H1 NaN 400.0
7 0.0 H2 900.0 NaN
8 0.5 H2 NaN 900.0
9 5.0 H2 NaN 1000.0
10 24.0 H2 800.0 NaN
11 35.0 H2 NaN 1100.0
12 48.0 H2 700.0 NaN
13 70.0 H2 NaN 1200.0
CodePudding user response:
You could simply add line after what you have already done:
mergedf = df1.merge(df2, how='outer') # your current code
mergedf.sort_values(['run_id', 'time']) # add this
Read more here: https://stackoverflow.com/a/17141755/2650341
CodePudding user response:
You can use panda's merge_ordered
df_merged=pd.merge_ordered(df1,df2, on=['run_id','time'])