To check whether a row starts with punctuation, I was thinking of combining string.punctuation with str.startswith. But when I do
df['start_with_punct']=df['Name'].str.startswith(string.punctuation)
I get False even for names that actually start with punctuation.
Example data:
Name
_faerrar_
!gfaherr_!£
nafjetes_
Expected output:
Name start_with_punct
_faerrar_ True
!gfaherr_!£ True
nafjetes_ False
I would like to understand how to get the right output, as I also need to test whether names start with a capital letter.
CodePudding user response:
A single string passed to Series.str.startswith is matched as one whole prefix, so pass a tuple to test several alternative prefixes:
df['start_with_punct']=df['Name'].str.startswith(tuple(string.punctuation))
print(df)
Name start_with_punct
0 _faerrar_ True
1 !gfaherr_!£ True
2 nafjetes_ False
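A minimal, self-contained sketch of why the original attempt returned False: a plain string is treated as one long prefix, while a tuple is treated as a set of alternatives. The sample frame is rebuilt from the question's data; the variable names whole_prefix and any_prefix are only illustrative.

```python
import string

import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({"Name": ["_faerrar_", "!gfaherr_!£", "nafjetes_"]})

# A plain string is one prefix: the name would have to begin with the
# entire 32-character string.punctuation sequence, so nothing matches.
whole_prefix = df["Name"].str.startswith(string.punctuation)

# A tuple is a set of alternative one-character prefixes.
any_prefix = df["Name"].str.startswith(tuple(string.punctuation))

print(whole_prefix.tolist())  # [False, False, False]
print(any_prefix.tolist())    # [True, True, False]
```

Note that string.punctuation only covers ASCII punctuation, so a name starting with a character such as £ would not be flagged by this check.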
To test whether the first character is uppercase, use Series.str.isupper together with indexing str[0]:
df['start_with_upper']=df['Name'].str[0].str.isupper()
print(df)
Name start_with_upper
0 Aaerrar_ True
1 dgfaherr_! False
2 Nafjetes_ True
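As a runnable sketch of the uppercase check, the snippet below rebuilds the sample names shown above and adds one hypothetical extra row ("_Faerrar") to show that a leading punctuation character is not counted as uppercase, even when a capital letter follows it.

```python
import pandas as pd

# First three names are from the output above; "_Faerrar" is an added,
# hypothetical row used only to illustrate the punctuation case.
df = pd.DataFrame({"Name": ["Aaerrar_", "dgfaherr_!", "Nafjetes_", "_Faerrar"]})

# str[0] extracts the first character of each name; isupper() then tests it.
first_char = df["Name"].str[0]
df["start_with_upper"] = first_char.str.isupper()

print(df["start_with_upper"].tolist())  # [True, False, True, False]
```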
CodePudding user response:
You can also use str.match, since match is anchored to the start of the string by default:
import re
import string
regex = '[%s]' % re.escape(string.punctuation)
df['start_with_punct'] = df['Name'].str.match(regex)
Output:
Name start_with_punct
0 _faerrar_ True
1 !gfaherr_!£ True
2 nafjetes_ False
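For completeness, here is a self-contained version of the regex approach, assuming the same sample data as the question. re.escape matters here because string.punctuation contains characters such as ], ^ and \ that would otherwise change the meaning of the character class.

```python
import re
import string

import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({"Name": ["_faerrar_", "!gfaherr_!£", "nafjetes_"]})

# Build a character class containing every ASCII punctuation character,
# escaped so that none of them is read as regex syntax.
pattern = "[%s]" % re.escape(string.punctuation)

# str.match only looks at the start of each string.
df["start_with_punct"] = df["Name"].str.match(pattern)

print(df["start_with_punct"].tolist())  # [True, True, False]
```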