Home > front end >  Finding if a row in one DataFrame is in Another Dataframe
Finding if a row in one DataFrame is in Another Dataframe

Time:01-31

I wish to find if the last entry in an account history table is in a working account obtained from a bank. I have to see that there is at least one record that overlaps so I know there is no data missing.

Here is my solution:(UPDATE: Sample data added)

import pandas as pd

dfWorking = pd.DataFrame({
    'Date': ['2023-01-25', '2023-01-24', '2023-01-24', '2023-01-23'],
    'Item': ['Visa Purchase 21Jan Hanaro Northlakes    Mango Hil    ',
             'Visa Purchase 21Jan Event Cinemas North  North Lak',
             'Visa Purchase 21Jan Event Cinemas North  North Lak',
             'Mcare Benefits 4880000027 Eywq'],
    'Debit': [67.0, 10.2, 7.65, 39.75],
    'Credit': [0.0, 0, 0, 0],
    'Balance': [1830.0, 1897.99, 1908.019, 1915.84],
    'Description': ['a', 'b', 'c', 'd']
})

dfAcctHist = pd.DataFrame({
    'Date': ['2023-01-23', '2023-01-23', '2023-01-23', '2023-01-23'],
    'Item': ['Csc R555558Df Nett',
             'Tfr Wdl BPAY Internet 25Jan05:31 208655638973732700058Deft Payments',
             'Eftpos Debit 25Jan15:19 Sq *Becs Cafe Kippa-Ring   Qldau',
             'Mcare Benefits 4880000027 Eywq'],
    'Debit': [0, 168.0, 9.0, 39.75],
    'Credit': [907.92, 0, 0, 0],
    'Balance': [2053.09, 1885.07, 1876.09, 1915.84],
    'Description': ['z', 'x', 's', 'f']})
#
#see if the last entry in the account history data is in the Working data. 
# If it isn't then there maybe some data missing from the Working data.
# (The data from both tables need to partially overlap)
#
filter1 = dfWorking["Item"].isin([dfAcctHist.iloc[-1]['Item']])
filter2 = dfWorking["Balance"].isin([dfAcctHist.iloc[-1]['Balance']])
filter3 = dfWorking["Date"].isin([dfAcctHist.iloc[-1]['Date']])
dfCheck = dfWorking[filter1 & filter2 & filter3]

# print(f"\n\nDate filter = {filter3}")
print(f"\nis the last entry in file in new data:\n {dfCheck}")
print(f"\n\ncount: {dfCheck['Date'].count()}")

if (dfCheck['Date'].count) != 0:
    print("data is overlapping")
else:
    print("No  data overlap")

My question is this. Can someone offer up a more "Pythonise" way of solving this instead of using the 3 filter dfs please? I have tried different variations of isin and iloc with no joy.

CodePudding user response:

last_entry = dfAcctHist.iloc[-1]
dfCheck = dfWorking[(dfWorking['Item'] == last_entry['Item']) &
                    (dfWorking['Balance'] == last_entry['Balance']) &
                    (dfWorking['Date'] == last_entry['Date'])]

if dfCheck.empty:
    print("No data overlap")
else:
    print("Data is overlapping")

May not be what you were looking for although this seems slightly more pythonic

CodePudding user response:

np.where can be used for more pythonic way to check overlap

import numpy as np
#
# Your Code here
#
df=pd.merge(dfAcctHist,dfWorking.iloc[-1],on=['Date','Item','Balance'],how='left',indicator='exists')
overlap=any(np.where(df.exists=='both',True,False))
print("Overlap:",overlap)
  • Related