I'm trying to extract the string of digits that follows ABC in a column and use a regex to count its zeros, storing the count in another column, but my code raises an error. How can I solve this? Thank you!
My code:
import re
import pandas as pd

df = pd.DataFrame(columns=['count'],
                  data=[['ABC0000000000000000000000000000000000000000000000000000000000000000'],
                        ['ABC0000000000000000000000000000000000101000000000000000000000000000'],
                        ['ABC0000000000000000010010000000000000010000000010000000100000000000'],
                        ['ABC0000110000000000000000000000000000000000000000000000000000000000'],
                        ['N/A']])

def conv(x):
    # Capture the run of non-whitespace characters that follows ABC
    m = re.search(r'ABC(?P<COUNT>\S+)', x)
    cnt = 0
    if m is not None:
        # Count the '0' characters in the captured group
        for d in m.group('COUNT'):
            if d == '0':
                cnt += 1
    return cnt

df['count_conv'] = df['count'].apply(conv)
The error message:
TypeError: expected string or bytes-like object
My expected output:
CodePudding user response:
Try Series.str.count:
df['count_conv'] = df['count'].str.count('0')
Output:
0    64
1    62
2    59
3    62
4     0
Name: count_conv, dtype: int64
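As a minimal sketch of how this behaves with missing values, the snippet below uses a shortened, made-up Series (not the data from the question). Series.str.count interprets its argument as a regular expression and returns NaN for missing entries instead of raising; filling those with 0 is an assumption about what the asker wants.

```python
import numpy as np
import pandas as pd

# Hypothetical, shortened data: not the frame from the question.
s = pd.Series(['ABC0010', 'ABC0000', 'N/A', np.nan], name='count')

# The pattern '0' is treated as a regex; missing values stay missing.
zero_count = s.str.count('0')

# Assumption: a missing value should count as 0 zeros.
zero_count_filled = zero_count.fillna(0).astype(int)
print(zero_count_filled.tolist())
```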
CodePudding user response:
def conv(x):
    return x.count('0')
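The TypeError in the question is what re.search raises when it receives something other than a string, so one plausible cause is a non-string cell such as NaN in the real data. The sketch below guards against that; the sample frame is shortened and made up, and returning 0 for non-string cells is an assumption.

```python
import numpy as np
import pandas as pd

def conv(x):
    # Non-string cells (for example NaN, which is a float) have no
    # str.count method. Returning 0 for them is an assumption.
    if not isinstance(x, str):
        return 0
    return x.count('0')

# Hypothetical, shortened data: not the frame from the question.
df = pd.DataFrame({'count': ['ABC0010', 'ABC0000', 'N/A', np.nan]})
df['count_conv'] = df['count'].apply(conv)
print(df)
```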
CodePudding user response:
Assuming you only need to update values for the rows where the count column starts with ABC, you can use
df['count_conv'] = df[df['count'].str.contains(r'^ABC', regex=True)]['count'].str.count('0')
The .str.contains(r'^ABC', regex=True) part detects the rows in the count column that start with ABC.
The df[df['count'].str.contains(r'^ABC', regex=True)] part fetches the rows of df with the found matches. Then the count column values are accessed again, the 0 chars are counted there with Series.str.count, and the result is assigned to df['count_conv'].
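A small sketch of the same idea on shortened, made-up data shows what happens to the rows that do not match: the assignment aligns on the index, so they end up as NaN. Here str.startswith with na=False and a single .loc lookup stand in for the str.contains filter above; that swap is a stylistic choice, not part of the original answer.

```python
import pandas as pd

# Hypothetical, shortened data: not the frame from the question.
df = pd.DataFrame({'count': ['ABC0010', 'ABC0000', 'N/A']})

# Boolean mask of rows whose value starts with ABC (missing values count as False).
mask = df['count'].str.startswith('ABC', na=False)

# Count zeros only on the matching rows; the other rows are left as NaN.
df['count_conv'] = df.loc[mask, 'count'].str.count('0')
print(df)
```

If the non-matching rows should hold 0 instead, df['count_conv'].fillna(0) is one way to fill them in.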