Home > Net >  Pandas Drop rows if String has numeric (floats of 4 decimals/5 numbers)
Pandas Drop rows if String has numeric (floats of 4 decimals/5 numbers)

Time:12-08

I have a long list dataframe that has texts similar to the example provided in the image. dataframe is called gbs2. I want to gbs2.drop rows with numerical values that has 4 decimal places or 5 numbers ("1.2345" or "12345") Code used:

name = pd.Series(['Product A', 'Product B', 'Product C'])
Produced_Qty = pd.Series(['ABC 1.407', 'NOP 1.6345', 'XYZ 1.3144/1.7523 ect'])
Passed_Qty = pd.Series([499768, 756662, 301297])

gbs2 = pd.DataFrame({'Name':name,
                     'Produced Qty': Produced_Qty,
                     'Passed Qty':Passed_Qty})
gbs2

enter image description here

In this case, Product B and Product C should be dropped. My original case which I cannot post has longer texts where the interested number is in the middle, but has many words from the start and end of the string.

I could not find ways to identify numerical values in a string that has both alphabets and numbers.

CodePudding user response:

You can use \d .\d{4} to match a floating point number with 4 decimals and \d{5} for a number with 5 digits combined with str.contains, then invert the boolean with ~ and perform boolean indexing:

out = gbs2.loc[~gbs2['Produced Qty'].str.contains(r'\d .\d{4}|\d{5}')]

Output:

        Name Produced Qty  Passed Qty
0  Product A    ABC 1.407      499768

If you want to match exactly 4/5 digits and no more, you additionally have to ensure the pattern is not having extra digits around it, then use '(?<!\d)(?:\d .\d{4}|\d{5})(?!\d)'

regex demo

  • Related