I have multiple datasets that share a common column, GuestCode. For each dataset, I want to add a column that tells me, row by row, whether GuestCode contains any letters.
I was able to do this successfully for one of the datasets using the code below:
df['TestResult'] = df['GuestCode'].str.contains(r"[^a-zA-Z\s']", regex=True)
GuestCode | TestResult |
---|---|
5885 | nan |
CCM6505 | True |
I'm not 100% sure, but I think this worked because in this dataset, GuestCode was read as an 'object' datatype in the dataframe.
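A minimal reproduction supports that hunch. It assumes the first dataset's column held a real Python int next to strings (the two values are copied from the table above). In an object column, the .str methods return NaN for elements that are not strings, which is why 5885 shows nan rather than a boolean.

```python
import pandas as pd

# Assumed reproduction: an object column holding an int next to a str.
df = pd.DataFrame({"GuestCode": pd.Series([5885, "CCM6505"], dtype=object)})

# The .str accessor only works on the string elements; 5885 is not a str,
# so its result is NaN instead of True/False.
df["TestResult"] = df["GuestCode"].str.contains(r"[^a-zA-Z\s']", regex=True)
print(df)
```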
However, when I try the same code on a dataset whose GuestCode column contains only numeric values, it fails because GuestCode is read as a float, and I get:
AttributeError: Can only use .str accessor with string values!
So I modified the code to convert the column to strings first, but I still don't get the correct result:
df['TestResult'] = df['GuestCode'].astype(str).str.contains(r"[^a-zA-Z\s']", regex=True)
GuestCode | TestResult |
---|---|
4445 | True |
CCM6515 | True |
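The wrong result is easier to see by printing what astype(str) produces for a float column. The sample values below are hypothetical: the float column becomes text such as '4445.0', and the pattern [^a-zA-Z\s'] matches any digit or '.', so every row comes out True.

```python
import pandas as pd

# Hypothetical all-numeric dataset parsed as float64
# (for example because a missing value forces float instead of int).
df = pd.DataFrame({"GuestCode": [4445.0, 5885.0]})
print(df["GuestCode"].dtype)  # float64

as_text = df["GuestCode"].astype(str)
print(as_text.tolist())  # ['4445.0', '5885.0']

# The negated class matches digits and '.', so every row is flagged True.
print(as_text.str.contains(r"[^a-zA-Z\s']", regex=True).tolist())  # [True, True]
```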
I'm not married to the regex solution; I just need a reliable way to identify whether or not each GuestCode value contains alphabetic characters.
Thanks in advance.
Answer:
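The negated character class [^a-zA-Z\s'] matches a single character that is NOT a letter a-z/A-Z, a whitespace character, or '. Any digit therefore matches, so every code that contains a digit (which covers both rows in the question) returns True.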
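The difference between the two character classes can be checked with Python's re module alone. The three sample codes are hypothetical; "ABC" is added to show the one case where the negated class finds nothing.

```python
import re

negated = re.compile(r"[^a-zA-Z\s']")  # one char that is NOT a letter, whitespace or '
letters = re.compile(r"[a-zA-Z]")      # one ASCII letter

results = {
    code: (bool(negated.search(code)), bool(letters.search(code)))
    for code in ["4445", "CCM6515", "ABC"]
}
for code, (has_other, has_letter) in results.items():
    print(code, has_other, has_letter)
# 4445 True False
# CCM6515 True True
# ABC False True
```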
If you just want to check for the letters A-Z and a-z, you can use:
df['TestResult'] = df['GuestCode'].astype(str).str.contains(r"[a-zA-Z]", regex=True)
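or, since str.match only matches at the start of each string, prefix the pattern with .* so the letter can appear anywhere:
df['TestResult'] = df['GuestCode'].astype(str).str.match(r".*[a-zA-Z]")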
Output:
GuestCode | TestResult |
---|---|
4445 | False |
CCM6515 | True |
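Note that str.contains and str.match are not interchangeable: str.match only looks for a match at the start of each string (like re.match), so r"[a-zA-Z]" with match checks just the first character. The values below are hypothetical; "12AB" is included because it has letters but does not start with one.

```python
import pandas as pd

# Hypothetical codes; "12AB" contains letters but does not start with one.
s = pd.Series(["4445", "CCM6515", "12AB"])

anywhere = s.str.contains(r"[a-zA-Z]", regex=True)  # searches the whole string
at_start = s.str.match(r"[a-zA-Z]")                 # anchored at the start, like re.match
print(anywhere.tolist())  # [False, True, True]
print(at_start.tolist())  # [False, True, False]
```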
Answer:
To check whether each row contains at least one letter, you can use:
df['TestResult'] = df['GuestCode'].astype(str).str.contains('[A-Za-z]')
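Since the question involves several datasets, the same check can be wrapped in a small helper and applied to each one. This is a sketch: the function name add_letter_flag and the sample frames are hypothetical. It uses astype("string") rather than astype(str) because, at least in pandas 2.x, astype(str) turns missing values into the text 'nan', which contains letters and would be flagged True. With the "string" dtype, missing values stay as <NA>, and na=False reports them as False.

```python
import numpy as np
import pandas as pd

def add_letter_flag(df: pd.DataFrame, col: str = "GuestCode") -> pd.DataFrame:
    """Hypothetical helper: set df["TestResult"] to True where `col` contains an ASCII letter.

    Modifies `df` in place and also returns it for convenience.
    """
    # Convert numbers (int or float) to text while keeping missing values as <NA>.
    text = df[col].astype("string")
    # na=False makes missing codes come out as False instead of <NA>.
    df["TestResult"] = text.str.contains(r"[A-Za-z]", regex=True, na=False)
    return df

# Hypothetical datasets: a float column with a missing value, and a mixed object column.
numeric = pd.DataFrame({"GuestCode": [4445.0, np.nan]})
mixed = pd.DataFrame({"GuestCode": pd.Series([5885, "CCM6505"], dtype=object)})

for frame in (numeric, mixed):
    add_letter_flag(frame)
    print(frame)
```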
To check whether every character is a letter, you can use:
df['TestResult'] = df['GuestCode'].astype(str).str.isalpha()
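str.isalpha answers a different question from the contains check: it is True only when every character is a letter (and the string is non-empty), so a mixed code like CCM6515 gives False. It also accepts non-ASCII letters such as é; if only A-Z and a-z should count, str.fullmatch with an ASCII class is one alternative. The sample values below are hypothetical.

```python
import pandas as pd

# Hypothetical values: digits only, mixed, ASCII letters only, non-ASCII letter.
s = pd.Series(["4445", "CCM6515", "GUEST", "Café"])

all_letters = s.str.isalpha()                         # whole string, Unicode letters
ascii_only = s.str.fullmatch(r"[A-Za-z]+")            # whole string, ASCII letters only
any_letter = s.str.contains(r"[A-Za-z]", regex=True)  # at least one ASCII letter

print(all_letters.tolist())  # [False, False, True, True]
print(ascii_only.tolist())   # [False, False, True, False]
print(any_letter.tolist())   # [False, True, True, True]
```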