Check if two columns are having matching values, but values are not in the same index places(Python,

Time:12-05

I have an Excel workbook of Super Store sales data with two sheets:

  • The first is named "Orders"
  • The second is named "Returns"

Both sheets have a matching column called "Order ID", but the "Returns" sheet has fewer rows because it lists only the returned purchases. I want to add a new column that checks whether each Order ID in the "Orders" sheet also appears in the "Returns" sheet, writing "Returned" if it does and "Not returned" if it does not.

This is how I thought it should be checked, but it is definitely not correct: every row says "Not returned", even though I've checked manually and seen that some orders match. Please help me out.

excel_path = r'C:\Users\Korisnik\Desktop\PythonFiles\Omega\SuperStoreUS.xlsx'
df = pd.read_excel(excel_path, sheet_name=None)

# 1.
df_order = df.get('Orders')
df_returns = df.get('Returns')
df_users = df.get('Users')

df_n.reset_index(drop=True)
df_returns.reset_index(drop=True)
df_n['Status'] = np.where( df_n['Order ID'].equals(df_returns['Order ID'])  and df_returns["Status"] == "Returned", "Returned", "Not returned")

Sample data:

df_order= {'City':['Prior Lake','Chicago','NY','Prior Lake', 'Round Rock'],
           'Order ID':[86838 ,90154,15000,10000, 12447]}
df_return= {'Order ID':[90154, 86838 ],
           'Returned':['Returned', 'Returned']}

# Create DataFrame from dict
df_orders = pd.DataFrame.from_dict(df_order)
df_returns = pd.DataFrame.from_dict(df_return)

CodePudding user response:

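You can use pandas.DataFrame.merge with pandas.Series.fillna. A left merge keeps every row of the Orders sheet and leaves "Status" empty (NaN) wherever the Order ID is missing from the Returns sheet, so those gaps can then be filled with "Not Returned":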

df_order = pd.read_excel("SuperStoreUS.xlsx", sheet_name="Orders")
df_return = pd.read_excel("SuperStoreUS.xlsx", sheet_name="Returns")

Then use either:

# --- To create a new dataframe
out = df_order.merge(df_return, on="Order ID", how="left")
out["Status"] = out["Status"].fillna("Not Returned")

Or:

# --- To update df_order
df_order = df_order.merge(df_return, on="Order ID", how="left")
df_order["Status"] = df_order["Status"].fillna("Not Returned")
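
The merge above relies on the Returns sheet having a "Status" column, which the sample data in the question does not have (its column is called "Returned"). Here is a minimal sketch of a variant that does not depend on that column name, using the `indicator` parameter of `merge`. It is run on the sample data from the question rather than the real workbook, and the names `returned_ids` and `merged` are only illustrative. Dropping duplicate Order IDs first is an assumption on my part, meant to guard against an order appearing more than once in Returns and multiplying rows in the result.

```python
import numpy as np
import pandas as pd

# Sample frames copied from the question (not the real Excel sheets).
df_orders = pd.DataFrame({
    "City": ["Prior Lake", "Chicago", "NY", "Prior Lake", "Round Rock"],
    "Order ID": [86838, 90154, 15000, 10000, 12447],
})
df_returns = pd.DataFrame({
    "Order ID": [90154, 86838],
    "Returned": ["Returned", "Returned"],
})

# Keep one row per returned order so the left merge cannot duplicate orders.
returned_ids = df_returns[["Order ID"]].drop_duplicates()

# indicator=True adds a "_merge" column: "both" when the Order ID was found
# in returned_ids, "left_only" when it was not.
merged = df_orders.merge(returned_ids, on="Order ID", how="left", indicator=True)

# Turn the indicator into the requested labels, then drop the helper column.
merged["Status"] = np.where(merged["_merge"] == "both", "Returned", "Not returned")
merged = merged.drop(columns="_merge")

print(merged)
```

With the sample data, only the first two orders (86838 and 90154) should end up labelled "Returned".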

CodePudding user response:

Another way is to create a new column in df_orders, with values conditioned on each row's Order ID:

df_orders['Status'] = df_orders['Order ID'].map(lambda x: 'Returned' if x in df_returns['Order ID'].tolist() else 'Not returned')
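
The same membership test can also be written without a Python-level lambda, using `Series.isin` together with `numpy.where`. The sketch below runs on the sample data from the question; `whole_match` and `is_returned` are just illustrative names. It also shows why the original attempt labelled every row "Not returned": `Series.equals` compares two whole Series and gives back a single boolean, not one value per row.

```python
import numpy as np
import pandas as pd

# Sample frames copied from the question (not the real Excel sheets).
df_orders = pd.DataFrame({
    "City": ["Prior Lake", "Chicago", "NY", "Prior Lake", "Round Rock"],
    "Order ID": [86838, 90154, 15000, 10000, 12447],
})
df_returns = pd.DataFrame({
    "Order ID": [90154, 86838],
    "Returned": ["Returned", "Returned"],
})

# Series.equals answers "are these two Series identical?" with one bool.
# The two columns differ in length, so this is False, and np.where then
# broadcasts "Not returned" to every row.
whole_match = df_orders["Order ID"].equals(df_returns["Order ID"])

# Series.isin tests each Order ID for membership and yields one bool per row.
is_returned = df_orders["Order ID"].isin(df_returns["Order ID"])
df_orders["Status"] = np.where(is_returned, "Returned", "Not returned")

print(df_orders)
```

Compared with the lambda version, this avoids rebuilding the list of returned IDs for every row, which may matter on larger sheets.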