I have a problem cross-checking numbers between a list and a column.
I have a list called "allowed_number" with 40 different phone numbers and a column with 8000 calls, df['B-NUMBER'], imported from an Excel sheet. I believe around 90% of these 8000 calls are in the allowed_number list, but I need to cross-check this somehow and be able to see which numbers are not in the list, preferably storing them in a variable called "fraud".
I turned allowed_number into a list of strings, which looks like this:
'21114169202',
'27518725605',
'514140099453',
'5144123173905',
allowed_number=re.sub(",","", allowed_number)
allowed_number = allowed_number.split(" ")
Then I tried to cross-check this against the column df['B-NUMBER'] in different ways, but nothing works and I need help. I've tried these:
df[df['B-NUMBER'].isin(allowed_number)]
fraud = [df['B-NUMBER'] in allowed_number if allowed_number not in df["B-NUMBER"]]
fraud = df['B-NUMBER'].apply(lambda x: ''.join(y for y in x if y not in allowed_number))
I'm trying to avoid loops because of the run time, but if it is possible with a loop somehow, please share your insight :) Cheers
CodePudding user response:
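Just to summarize the discussion in the comments: using
df['B-NUMBER'].isin(allowed_number)
works once the contents of allowed_number are turned into integers, presumably because the column holds integers and isin does not match a string such as '123' against the integer 123. The conversion is
allowed_number = [int(x) for x in allowed_number]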
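To see why the conversion matters, here is a minimal sketch. The small DataFrame is made up for illustration, and it assumes the column holds integers, as a numeric Excel column usually does after import; the names allowed_as_str and allowed_as_int are hypothetical.

```python
import pandas as pd

# Hypothetical sample data standing in for the Excel import;
# the column is assumed to hold integers.
df = pd.DataFrame({"B-NUMBER": [21114169202, 27518725605, 99999999999]})

allowed_as_str = ["21114169202", "27518725605"]
allowed_as_int = [int(x) for x in allowed_as_str]

# Comparing integers against strings: nothing matches.
str_matches = df["B-NUMBER"].isin(allowed_as_str)
# Comparing integers against integers: the first two rows match.
int_matches = df["B-NUMBER"].isin(allowed_as_int)

print(df["B-NUMBER"].dtype)
print(str_matches.tolist())
print(int_matches.tolist())
```

Checking df['B-NUMBER'].dtype on the real data is a quick way to confirm which type the list needs to be converted to.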
So to get the fraudulent numbers, something like this works:
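import re
# allowed_number is assumed to start out as one string of comma-separated numbers
allowed_number = re.sub(",", "", allowed_number)
allowed_number = allowed_number.split(" ")
# convert to integers so the values can match the B-NUMBER column
allowed_number = [int(x) for x in allowed_number]
df["allowed"] = df["B-NUMBER"].isin(allowed_number)
# fraudulent: rows whose B-NUMBER is not in the allowed list
df_fradulent = df.loc[~df["allowed"]]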
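For the variable called "fraud" that the question asks for, a self-contained sketch along the same lines could look like this. The sample DataFrame and the names allowed_set, is_allowed and allowed_share are made up for illustration, and the column is again assumed to hold integers. No explicit loop over the 8000 rows is needed, since isin is vectorized.

```python
import pandas as pd

# Hypothetical stand-in for the column imported from Excel.
df = pd.DataFrame(
    {"B-NUMBER": [21114169202, 5550001111, 27518725605, 5550001111]}
)

# The allowed numbers as a list of strings, as in the question.
allowed_number = ["21114169202", "27518725605", "514140099453", "5144123173905"]

# Convert to integers so the values can match the integer column.
allowed_set = {int(x) for x in allowed_number}

# Boolean mask: True where the called number is allowed.
is_allowed = df["B-NUMBER"].isin(allowed_set)

# Distinct numbers that are not in the allowed list.
fraud = df.loc[~is_allowed, "B-NUMBER"].unique().tolist()

# Share of calls that went to allowed numbers (to check the ~90% guess).
allowed_share = is_allowed.mean()

print(fraud)
print(allowed_share)
```

Dropping the .unique() call keeps one entry per call instead of one per distinct number.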