How Can I Solve This No Duplicated 2 Column Calculation?

Hello StackOverflow people! I'm having some trouble here. I did some research but still can't get it to work. I have two columns extracted from a dataset: "# Externo" and "Nro Envio ML".

I want the code to return only the numbers that exist in "# Externo" but not in "Nro Envio ML".

For example:

If 41765931626 appears only in the "# Externo" column and not in "Nro Envio ML", I want to print that number. If there is no number in "# Externo" that is missing from "Nro Envio ML", I want to print a message instead: print("No strange sales")

Here is the code I tried. Sorry for my bad English.

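import numpy as np

# df2 is assumed to be already loaded from the dataset.
# Drop rows without a shipment number, and the repeated header row.
df2 = df2.dropna(subset=['Unnamed: 13'])
df2 = df2[df2['Unnamed: 13'] != 'Nro. Envío']
df2['Nro Envio ML'] = df2['Unnamed: 13']

dfn = df2[["# Externo", "Nro Envio ML"]]

# Row-by-row comparison: keeps rows where the two columns differ on the same row.
dfn1 = dfn[dfn['# Externo'] != dfn['Nro Envio ML']]
dfn1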

I also tried diff, but it gives me values that are present in 'Nro Envio ML'.

Link for Sample: https://github.com/francoveracallorda/sample

CodePudding user response：

I would step outside of pandas, use Python's built-in set, and compute the set difference. Here is a simplified example:

import pandas as pd

df = pd.DataFrame({
    "# Externo": [3, 5, 4, 2, 1, 7, 8],
    "Nro Envio ML": [4, 9, 0, 2, 1, 3, 5]
})

diff = set(df["# Externo"]) - set(df["Nro Envio ML"])
# diff contains the values that are in df["# Externo"] but not in df["Nro Envio ML"].

print(f"Weird sales: {diff}" if diff else "No strange sales")
# Output:
# Weird sales: {8, 7}

PS: If you want to stay inside pandas, you can use diff = df.loc[~df["# Externo"].isin(df["Nro Envio ML"]), "# Externo"] to compute the same difference as a pd.Series.
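To see why the row-by-row != comparison from the question returns values that do exist in "Nro Envio ML", here is a small sketch on the same simplified data. The names row_wise and membership are only illustrative, and the real dataset may behave differently if the two columns have different dtypes.

```python
import pandas as pd

df = pd.DataFrame({
    "# Externo": [3, 5, 4, 2, 1, 7, 8],
    "Nro Envio ML": [4, 9, 0, 2, 1, 3, 5],
})

# Row-wise comparison: only checks whether the two values on the SAME row differ.
row_wise = df.loc[df["# Externo"] != df["Nro Envio ML"], "# Externo"]

# Membership test: checks each "# Externo" value against the WHOLE other column.
membership = df.loc[~df["# Externo"].isin(df["Nro Envio ML"]), "# Externo"]

print(row_wise.tolist())    # [3, 5, 4, 7, 8]
print(membership.tolist())  # [7, 8]
```

The values 3, 5 and 4 show up in the row-wise result even though they exist elsewhere in "Nro Envio ML", which matches the behaviour described in the question.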

CodePudding user response：

You can use the ~ operator together with pandas' isin.

import pandas as pd

series1 = pd.Series([2, 4, 8, 20, 10, 47, 99])
series2 = pd.Series([1, 3, 6, 4, 10, 99, 50])
series3 = pd.Series([2, 4, 8, 20, 10, 47, 99])
df = pd.concat([series1, series2, series3], axis=1)

Case 1: Numbers that are in series1 but not in series2

diff = series1[~series1.isin(series2)]

Case 2: No number in series1 is missing from series3 (the result is empty)

same = series1[~series1.isin(series3)]
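To tie both cases back to the message the question asks for, the check can be wrapped in a small function. This is only a sketch: report is a hypothetical helper name, and it relies on Series.empty to detect that nothing is missing.

```python
import pandas as pd

series1 = pd.Series([2, 4, 8, 20, 10, 47, 99])
series2 = pd.Series([1, 3, 6, 4, 10, 99, 50])
series3 = pd.Series([2, 4, 8, 20, 10, 47, 99])


def report(left, right):
    """Hypothetical helper: describe the values of `left` that are missing from `right`."""
    missing = left[~left.isin(right)]
    if missing.empty:
        return "No strange sales"
    return f"Weird sales: {missing.tolist()}"


case1 = report(series1, series2)
case2 = report(series1, series3)
print(case1)  # Weird sales: [2, 8, 20, 47]
print(case2)  # No strange sales
```

With the asker's data, the same call would presumably look like report(dfn["# Externo"], dfn["Nro Envio ML"]), provided both columns hold values of the same type.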