Detect special values in one column

Time:10-03

I am trying to detect values that contain specific characters (e.g. ?, /). Below is a small sample of the data.

import pandas as pd
import numpy as np
data = {
    'artificial_number': ['000100000', '000010000', '00001000/1', '00001000?', '0?00/10000'],
}
df1 = pd.DataFrame(data, columns=['artificial_number'])

Now I want to detect the values that contain characters other than digits ('00001000/1', '00001000?', '0?00/10000'). I tried the lines below:

import re

clean = re.sub(r'[^a-zA-Z0-9\._-]', '', df1['artificial_number'])

But this code does not work as I expected. Can anybody help me solve this problem?

CodePudding user response:

# replace every non-digit character with an empty string
df1['artificial_number'].str.replace(r'([^\d])','', regex=True)
0    000100000
1    000010000
2    000010001
3     00001000
4     00010000
Name: artificial_number, dtype: object

If you'd like to list the rows whose values contain non-digit characters:

df1.loc[df1['artificial_number'].str.extract(r'([^\d])')[0].notna()]

artificial_number
2   00001000/1
3   00001000?
4   0?00/10000
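
A boolean mask can also be built directly with str.contains, which skips the capture group. This is only a minimal sketch, assuming the sample data from the question; the names mask and flagged are illustrative and not part of the original answer.

```python
import pandas as pd

# sample data copied from the question
df1 = pd.DataFrame({
    'artificial_number': ['000100000', '000010000', '00001000/1',
                          '00001000?', '0?00/10000']
})

# \D matches any character that is not a digit, so the mask is True
# for every value containing at least one such character
mask = df1['artificial_number'].str.contains(r'\D', regex=True)

# keep only the flagged rows
flagged = df1.loc[mask]
print(flagged)
```

The mask can be reused, for example ~mask selects the clean rows instead.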

CodePudding user response:

Assuming a number in your case is an integer, you can find the values that contain non-digits by counting the digits in each string and comparing that count with the string's length:

import re

# True for every value in which at least one character is not a digit
rows = [len(re.findall(r'[0-9]', s)) != len(s) for s in df1.artificial_number]

df1.loc[rows] 
#  artificial_number
#2        00001000/1
#3         00001000?
#4        0?00/10000
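
If it also helps to see which characters caused a row to be flagged, the same idea can be turned around to collect the non-digits instead of counting the digits. A rough sketch, assuming the question's sample data; the column name bad_chars is made up for illustration.

```python
import pandas as pd

# sample data copied from the question
df1 = pd.DataFrame({
    'artificial_number': ['000100000', '000010000', '00001000/1',
                          '00001000?', '0?00/10000']
})

# collect every non-digit character in each value (empty list if there are none)
df1['bad_chars'] = df1['artificial_number'].str.findall(r'[^0-9]')

# rows with a non-empty list are the ones containing special characters
flagged = df1.loc[df1['bad_chars'].map(len) > 0]
print(flagged)
```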

CodePudding user response:

To detect which values contain characters that are not numeric, you can also use str.isnumeric:

df1.loc[~df1.artificial_number.str.isnumeric()]

  artificial_number
2        00001000/1
3         00001000?
4        0?00/10000

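If every character must be a digit, use str.isdigit instead; it is stricter than str.isnumeric about Unicode characters such as '½'. Neither method accepts a decimal point, so a value like 10.0 is flagged by both:

df1.iloc[0,0] = '000100000.0'

df1.loc[~df1.artificial_number.str.isdigit()]

  artificial_number
0       000100000.0
2        00001000/1
3         00001000?
4        0?00/10000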
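
For reference, the difference between these checks only shows up with non-ASCII characters. The sketch below uses Python's built-in string methods, which the pandas str.isnumeric, str.isdigit and str.isdecimal accessors mirror element-wise; the sample strings are chosen here for illustration.

```python
# '\u00bd' is the fraction one half, '\u00b2' is superscript two
samples = ['123', '10.0', '\u00bd', '\u00b2']

# map each sample to (isnumeric, isdigit, isdecimal)
results = {s: (s.isnumeric(), s.isdigit(), s.isdecimal()) for s in samples}

for s, flags in results.items():
    print(repr(s), flags)
# '123'  -> (True, True, True)
# '10.0' -> (False, False, False)  the decimal point fails all three checks
# '½'    -> (True, False, False)   numeric, but not a digit
# '²'    -> (True, True, False)    a digit, but not a decimal character
```

For plain ASCII input like the column in the question, all three give the same answer.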