Append Only Unique Rows from Second Dataframe

Time:10-25

Given 2 dataframes, how can I append only the unique rows to the main df from the second df?

For example, given these two dataframes:

[Image: Input Dataframes]

...how can I end up with this result?

[Image: Desired Result]

I would like to involve the index somehow, as my application will be using a DatetimeIndex. A reproducible example, including my attempt at concatenation, is below:

import pandas as pd 

df1 = pd.DataFrame(
    {
        "A": ["A0", "A1", "A2", "A3"],
        "B": ["B0", "B1", "B2", "B3"],
        "C": ["C0", "C1", "C2", "C3"],
        "D": ["D0", "D1", "D2", "D3"],
    },
    index=[0, 1, 2, 3],
)

print(df1)
print()


df2 = pd.DataFrame(
    {
        "A": ["A2", "A3", "A4", "A5"],
        "B": ["B2", "B3", "B4", "B5"],
        "C": ["C2", "C3", "C4", "C5"],
        "D": ["D2", "D3", "D4", "D5"],
    },
    index=[2, 3, 4, 5],
)

print(df2)
print()

result = pd.concat([df1, df2], join="inner", ignore_index=False)

print(result)

CodePudding user response:

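In your case an outer merge is enough. With no "on" argument, merge joins on all columns the two dataframes share, so a row that appears in both is kept only once. Note that merging on columns ignores the original index, so the result gets a fresh 0..n-1 index.

out = df1.merge(df2, how='outer')
print(out)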
    A   B   C   D
0  A0  B0  C0  D0
1  A1  B1  C1  D1
2  A2  B2  C2  D2
3  A3  B3  C3  D3
4  A4  B4  C4  D4
5  A5  B5  C5  D5
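Since the question asks to involve the index, here is a minimal sketch of one way to carry it through the merge: turn the index into a regular column, merge, then set it back. The axis name "idx" is a hypothetical label chosen for illustration, and the frames are trimmed to two columns to keep the example short.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"A": ["A0", "A1", "A2", "A3"], "B": ["B0", "B1", "B2", "B3"]},
    index=[0, 1, 2, 3],
)
df2 = pd.DataFrame(
    {"A": ["A2", "A3", "A4", "A5"], "B": ["B2", "B3", "B4", "B5"]},
    index=[2, 3, 4, 5],
)

# Name the index ("idx" is an assumed label) and expose it as a column
# so that it takes part in the merge keys.
left = df1.rename_axis("idx").reset_index()
right = df2.rename_axis("idx").reset_index()

# Outer merge on all shared columns (idx, A, B), then restore the index.
out = left.merge(right, how="outer").set_index("idx").sort_index()

print(out)
```

With this approach a row counts as a duplicate only if both its index label and its values match. If the same label carries different values in the two frames, both rows are kept.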

CodePudding user response:

After concatenation, you can drop the duplicate rows with the drop_duplicates() method. It compares column values only and does not look at the index.

import pandas as pd 

df1 = pd.DataFrame(
    {
        "A": ["A0", "A1", "A2", "A3"],
        "B": ["B0", "B1", "B2", "B3"],
        "C": ["C0", "C1", "C2", "C3"],
        "D": ["D0", "D1", "D2", "D3"],
    },
    index=[0, 1, 2, 3],
)

print(df1)
print()


df2 = pd.DataFrame(
    {
        "A": ["A2", "A3", "A4", "A5"],
        "B": ["B2", "B3", "B4", "B5"],
        "C": ["C2", "C3", "C4", "C5"],
        "D": ["D2", "D3", "D4", "D5"],
    },
    index=[2, 3, 4, 5],
)

print(df2)
print()

result = pd.concat([df1, df2], join="inner", ignore_index=False)

result = result.drop_duplicates()

print(result)

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop_duplicates.html
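If uniqueness should be decided by the index rather than by the row values (for example with a DatetimeIndex, as mentioned in the question), a possible sketch is to keep only the rows of the second dataframe whose labels are not already present in the first. The dates and the single column below are made-up sample data, not taken from the question.

```python
import pandas as pd

# Hypothetical sample data with overlapping daily timestamps.
idx1 = pd.date_range("2021-01-01", periods=4, freq="D")
idx2 = pd.date_range("2021-01-03", periods=4, freq="D")

df1 = pd.DataFrame({"A": ["A0", "A1", "A2", "A3"]}, index=idx1)
df2 = pd.DataFrame({"A": ["A2", "A3", "A4", "A5"]}, index=idx2)

# Keep only the rows of df2 whose timestamp is not already in df1.
new_rows = df2[~df2.index.isin(df1.index)]

# Append them to the main dataframe; df1's rows win on overlapping labels.
result = pd.concat([df1, new_rows])

print(result)
```

A similar effect can be had by concatenating first and then filtering with result[~result.index.duplicated(keep="first")]. Either way, rows from the first dataframe take precedence when a label occurs in both, regardless of whether the values differ.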
