Identify names starting with punctuation

Time:04-09

To check whether each name starts with punctuation, I thought of combining string.punctuation with str.startswith. But when I do

df['start_with_punct'] = df['Name'].str.startswith(string.punctuation)

I get False even for names that actually start with punctuation.

Example data:

Name       
_faerrar_
!gfaherr_!£
nafjetes_

Expected output

Name            start_with_punct
_faerrar_           True
!gfaherr_!£         True
nafjetes_           False

I would like to understand how to get the right output, as I also need to run a similar test for names starting with a capital letter.

CodePudding user response:

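Pass a tuple to Series.str.startswith to test several prefixes at once. A plain string such as string.punctuation is treated as one single prefix, so a name only matches if it begins with the whole punctuation string: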

import string

df['start_with_punct'] = df['Name'].str.startswith(tuple(string.punctuation))
print(df)
          Name  start_with_punct
0    _faerrar_              True
1  !gfaherr_!£              True
2    nafjetes_             False
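To make the difference concrete, here is a minimal self-contained sketch that builds the sample frame from the question and compares both calls. The variable names as_single_prefix and as_alternatives are only illustrative.

```python
import string

import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"Name": ["_faerrar_", "!gfaherr_!£", "nafjetes_"]})

# A plain string is treated as one prefix: the name would have to begin
# with all 32 punctuation characters in order, so every row is False.
as_single_prefix = df["Name"].str.startswith(string.punctuation)

# A tuple is treated as a set of alternative prefixes, one per character.
as_alternatives = df["Name"].str.startswith(tuple(string.punctuation))

print(as_single_prefix.tolist())  # [False, False, False]
print(as_alternatives.tolist())   # [True, True, False]
```

This mirrors Python's built-in str.startswith, which also accepts either a single string or a tuple of strings.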

To test whether the first character is uppercase, select it with str[0] and call Series.str.isupper on the result:

df['start_with_upper'] = df['Name'].str[0].str.isupper()
print(df)
         Name  start_with_upper
0    Aaerrar_              True
1  dgfaherr_!             False
2   Nafjetes_              True
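One edge case worth knowing: str[0] yields a missing value for an empty name, because there is no first character to pick. The sketch below is a possible variation, not part of the answer above. It slices with str[:1] instead, which returns an empty string in that case, so isupper simply reports False. The empty name in the data is a hypothetical addition for illustration.

```python
import pandas as pd

# The answer's sample names plus a hypothetical empty name.
df = pd.DataFrame({"Name": ["Aaerrar_", "dgfaherr_!", "Nafjetes_", ""]})

# Slicing never goes out of bounds: an empty name gives "" rather than NaN,
# and "".isupper() is False.
df["start_with_upper"] = df["Name"].str[:1].str.isupper()

print(df["start_with_upper"].tolist())  # [True, False, True, False]
```

Names that are genuinely missing (NaN) would still need separate handling.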

CodePudding user response:

You can also use str.match, since match is anchored to the start of the string by default:

import re
import string

# character class matching any single ASCII punctuation character
regex = '[%s]' % re.escape(string.punctuation)
df['start_with_punct'] = df['Name'].str.match(regex)

Output:

          Name  start_with_punct
0    _faerrar_              True
1  !gfaherr_!£              True
2    nafjetes_             False
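As a rough illustration of what the anchoring means in practice, this self-contained sketch runs the same pattern through str.match and str.contains on the question's data. The variable names are only illustrative.

```python
import re
import string

import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"Name": ["_faerrar_", "!gfaherr_!£", "nafjetes_"]})

# Character class matching any single ASCII punctuation character.
regex = "[%s]" % re.escape(string.punctuation)

# str.match only looks at the start of each string.
starts_with_punct = df["Name"].str.match(regex)

# str.contains searches anywhere, so the trailing "_" in "nafjetes_" matches.
contains_punct = df["Name"].str.contains(regex)

print(starts_with_punct.tolist())  # [True, True, False]
print(contains_punct.tolist())     # [True, True, True]
```

Note that string.punctuation covers ASCII punctuation only, so a name beginning with a character such as £ would not be flagged by either approach.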