python extract text from a column and do digit calculation

Time:10-30

I'm trying to extract the run of digits that follows ABC in a column and, using a regex, write the number of zeros in it to another column, but I get an error. How can I solve this? Thank you!

My code:

import re
import pandas as pd
df = pd.DataFrame(columns=['count'],
  data=[['ABC0000000000000000000000000000000000000000000000000000000000000000'],
        ['ABC0000000000000000000000000000000000101000000000000000000000000000'], 
        ['ABC0000000000000000010010000000000000010000000010000000100000000000'],
        ['ABC0000110000000000000000000000000000000000000000000000000000000000'],
        ['N/A']])

def conv(x):
    # Capture everything after the ABC prefix, then count the zeros in it
    m = re.search(r'ABC(?P<COUNT>\S+)', x)
    cnt = 0
    if m is not None:
        for d in m.group('COUNT'):
            if d == '0':
                cnt += 1
    return cnt

df['count_conv'] = df['count'].apply(conv)

My error code:

TypeError: expected string or bytes-like object

CodePudding user response:

Try this:

df['count_conv'] = df['count'].str.count('0')

0    64
1    62
2    59
3    62
4     0
Name: count_conv, dtype: int64
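One likely reason this avoids the TypeError: re.search raises "expected string or bytes-like object" when it is handed something that is not a string, whereas the .str accessor passes missing values through as NaN. A minimal sketch, assuming the real column contained a missing value (the sample data above does not; the None row and the names s and zeros are hypothetical):

```python
import pandas as pd

# Hypothetical data: generated strings plus one missing value
s = pd.Series(['ABC' + '0' * 64, 'N/A', None])

# .str.count leaves the missing entry as NaN instead of raising
zeros = s.str.count('0')
print(zeros)
```

The missing row stays missing in the result, so the column may come back as a float or nullable dtype rather than int64.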

CodePudding user response:

def conv(x):
    return x.count('0')
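Note that x.count('0') still assumes every value is a string. A hedged variant that guards against non-string values, assuming you want those rows to count as 0 (conv_safe and the NaN row are hypothetical, not part of the original question):

```python
import pandas as pd

def conv_safe(x):
    # Count zeros only for real strings; anything else (NaN, None, numbers) becomes 0
    return x.count('0') if isinstance(x, str) else 0

# Hypothetical data: generated strings plus one missing value
df = pd.DataFrame({'count': ['ABC' + '0' * 64, 'N/A', float('nan')]})
df['count_conv'] = df['count'].apply(conv_safe)
print(df['count_conv'])
```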

CodePudding user response:

Assuming you only need to update values for the rows where the count column starts with ABC, you can use:

df['count_conv'] = df[df['count'].str.contains(r'^ABC', regex=True)]['count'].str.count('0')

The .str.contains(r'^ABC', regex=True) part detects the rows in the count column that start with ABC.

The df[df['count'].str.contains(r'^ABC', regex=True)] part selects the rows of df that matched. The count column of that selection is then accessed, its 0 characters are counted with Series.str.count, and the result is assigned to df['count_conv']. Because the assignment aligns on the index, rows that did not match are left as NaN in the new column.
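The same idea can be sketched with a boolean mask and .loc, which selects the rows and the column in one step. This is only an illustration under assumptions: the data is generated, the XYZ000 row and the names mask and count_conv_int are hypothetical, and str.startswith is swapped in for the regex since the prefix is a literal.

```python
import pandas as pd

# Hypothetical data: two ABC rows and two rows that should be skipped
df = pd.DataFrame({'count': ['ABC' + '0' * 64,
                             'ABC' + '0' * 30 + '11' + '0' * 32,
                             'N/A',
                             'XYZ000']})

# True only for rows whose value starts with the literal prefix ABC
mask = df['count'].str.startswith('ABC', na=False)

# Count zeros in the selected rows; the others become NaN on assignment
df['count_conv'] = df.loc[mask, 'count'].str.count('0')

# Optional: replace the NaN gaps with 0 to get an integer column
df['count_conv_int'] = df['count_conv'].fillna(0).astype(int)
print(df[['count_conv', 'count_conv_int']])
```

Whether the skipped rows should stay NaN or become 0 depends on what the downstream calculation expects.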
