Combine two data frames by matching index

Time:12-31

I have a data frame that has the form:

index predicted
   1     1
   2     1
   3     0
   4     0
   5     1

And another that has the form:

index actual
   2    1
   4    0

I want the data frame:

index predicted actual
   1     1       nan
   2     1       1
   3     0       nan
   4     0       0
   5     1       nan

I've tried pd.concat([df1,df2], on="index", how="left") and pd.merge(df1, df2, axis=1)

Both give the dataframe:

index predicted actual
   1     1       1
   2     1       0
   3     0       nan
   4     0       nan
   5     1       nan

How can I get the data frame I need? Thanks in advance.

CodePudding user response:

You can use pd.merge() with left_index=True and right_index=True, so that the rows are matched on the index of both data frames rather than on their position:

import pandas as pd
df1 = pd.DataFrame({'predicted': [1,1,0,0,1]}, index = (1,2,3,4,5))
df2 = pd.DataFrame({'actual': [1,0]}, index = (2,4))

pd.merge(df1, df2, how = 'left', left_index=True, right_index=True)

This merges the two data frames on their index, keeps every row of df1 (how='left'), and produces the intended result:

index   predicted   actual
1       1           NaN
2       1           1.0
3       0           NaN
4       0           0.0
5       1           NaN
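
As a side note, the same index-based left join can probably be written more briefly with DataFrame.join, which aligns on the index and defaults to how="left". This is a minimal sketch that reuses the df1 and df2 defined above; the names joined and merged are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"predicted": [1, 1, 0, 0, 1]}, index=[1, 2, 3, 4, 5])
df2 = pd.DataFrame({"actual": [1, 0]}, index=[2, 4])

# DataFrame.join matches rows on the index and keeps all rows of df1 by default
joined = df1.join(df2)

# The explicit merge from the answer above, for comparison
merged = pd.merge(df1, df2, how="left", left_index=True, right_index=True)

# Compare the two results; equals() treats NaN values in the same place as equal
print(joined.equals(merged))
```

Which spelling to use is mostly a matter of taste; pd.merge makes the join keys explicit, while join is shorter when both frames are already indexed correctly.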

CodePudding user response:

If you make sure that your index column is actually the df.index, pd.concat should work:

import pandas as pd

left = pd.DataFrame({"predicted": [1, 1, 0, 0, 1]}, index=[1, 2, 3, 4, 5])
right = pd.DataFrame({"actual": [1, 0]}, index=[2, 4])

out = pd.concat([left, right], axis=1)
   predicted  actual
1          1     NaN
2          1     1.0
3          0     NaN
4          0     0.0
5          1     NaN
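
One caveat worth knowing: pd.concat with axis=1 performs an outer join on the index by default, so it only matches a left join when every label of the right frame also occurs in the left one (as in the question). The sketch below uses a made-up label 9 that is not in the question's data to show the difference; reindexing the right frame first is one possible way to keep only the left labels.

```python
import pandas as pd

left = pd.DataFrame({"predicted": [1, 1, 0]}, index=[1, 2, 3])
# Hypothetical data: label 9 does not occur in `left`
right = pd.DataFrame({"actual": [1, 0]}, index=[2, 9])

# Default join="outer": the result contains the union of both indexes
outer = pd.concat([left, right], axis=1)

# Restrict `right` to the labels of `left` first to get left-join behaviour
left_like = pd.concat([left, right.reindex(left.index)], axis=1)

print(outer.index.tolist())
print(left_like.index.tolist())
```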

If the index values are instead stored in an ordinary column named "index", such as the following:

left = left.reset_index(names="index")
right = right.reset_index(names="index")

then you can use:

left.merge(right, on="index", how="left")
   index  predicted  actual
0      1          1     NaN
1      2          1     1.0
2      3          0     NaN
3      4          0     0.0
4      5          1     NaN

CodePudding user response:

Turn the index into a temporary column with reset_index(), left join on that column, then set it as the index again.

import pandas as pd

predict_df = pd.DataFrame({'predicted': [1,1,0,0,1]}, index=range(1,6))
actual_df = pd.DataFrame({'actual': [1,0]}, index=[2,4])

pd.merge(
    left=predict_df.reset_index(),
    right=actual_df.reset_index(),
    how='left',
    on='index'
).set_index('index')
      predicted actual
index       
1     1         NaN
2     1         1.0
3     0         NaN
4     0         0.0
5     1         NaN
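
Note that actual comes back as 1.0 and 0.0 because NaN forces the column to float64. If integer values matching the desired output in the question are preferred, one option (an assumption about what is wanted, not part of the answer above) is to cast the column to pandas' nullable "Int64" dtype, which shows missing values as <NA>:

```python
import pandas as pd

predict_df = pd.DataFrame({"predicted": [1, 1, 0, 0, 1]}, index=range(1, 6))
actual_df = pd.DataFrame({"actual": [1, 0]}, index=[2, 4])

merged = pd.merge(
    left=predict_df.reset_index(),
    right=actual_df.reset_index(),
    how="left",
    on="index",
).set_index("index")

# Nullable integer dtype: keeps 1 and 0 as integers, missing rows become <NA>
merged = merged.astype({"actual": "Int64"})
print(merged)
```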