How do I create a column that tells me whether or not another column contains alpha numeric values?

I have multiple datasets containing a common column - GuestCode. For all datasets, I want to create another column that tells me whether or not GuestCode contains letters in each row.

I was able to do this successfully for one of the datasets using the code below:

df['TestResult'] = df['GuestCode'].str.contains(r"[^a-zA-Z\s']", regex=True)

GuestCode	TestResult
5885	nan
CCM6505	True

I'm not 100% sure, but I think this worked because in this dataset, GuestCode was read as an 'object' datatype in the dataframe.

However, when I try the same code on a dataset that contains only numeric values in GuestCode, the same code doesn't work because GuestCode gets read as a 'float'. After receiving an

AttributeError: Can only use .str accessor with string values!

I modified the code to cast the column to str first, but the result is wrong: every row comes back True.

df['TestResult'] = df['GuestCode'].astype(str).str.contains(r"[^a-zA-Z\s']", regex=True)

GuestCode	TestResult
4445	True
CCM6515	True

I'm not married to the regex solution; I just need a way to identify whether or not I have alphabetic characters in the GuestCode column.

Thanks in advance.

CodePudding user response:

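The character class [^a-zA-Z\s'] is negated: it matches a single character that is not a letter a-z or A-Z, not a whitespace character, and not '. Every digit matches it, so str.contains returns True for 4445 as well as for CCM6515. The pattern answers "is there a character that is not a letter?" rather than "is there a letter?".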
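To see the difference without pandas, here is a minimal sketch using only the standard re module. The helper has_match and the sample codes are made up for illustration; they are not from the question.

```python
import re

NEGATED = re.compile(r"[^a-zA-Z\s']")  # pattern from the question
LETTERS = re.compile(r"[a-zA-Z]")      # any ASCII letter

def has_match(pattern, value):
    """Return True if the pattern matches anywhere in str(value)."""
    return pattern.search(str(value)) is not None

codes = ["4445", "CCM6515", "CCM"]  # illustrative sample values

negated_results = [has_match(NEGATED, c) for c in codes]
letter_results = [has_match(LETTERS, c) for c in codes]

print(negated_results)  # [True, True, False]
print(letter_results)   # [False, True, True]
```

The negated class only returns False for a value made up entirely of letters, which is the opposite of what the question needs.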

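If you just want to check whether the value contains any of the characters A-Za-z, you can use

df['TestResult'] = df['GuestCode'].astype(str).str.contains(r"[a-zA-Z]", regex=True)

If you only need to know whether the value starts with a letter, you can use str.match instead. It anchors the pattern at the beginning of the string, so a code such as 6505CCM would give False:

df['TestResult'] = df['GuestCode'].astype(str).str.match(r"[a-zA-Z]")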

Output

  GuestCode  TestResult
0      4445       False
1   CCM6515        True
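For the two sample rows above both calls give the same result, but they can disagree. A small sketch, assuming a made-up third code 6505CCM and hypothetical column names ContainsLetter and StartsWithLetter:

```python
import pandas as pd

# 6505CCM is an invented value whose letters are not at the start.
df = pd.DataFrame({"GuestCode": [4445, "CCM6515", "6505CCM"]})
codes = df["GuestCode"].astype(str)

# str.contains searches the whole string for a letter.
df["ContainsLetter"] = codes.str.contains(r"[a-zA-Z]", regex=True)

# str.match only tests the pattern at the start of the string.
df["StartsWithLetter"] = codes.str.match(r"[a-zA-Z]")

print(df)
```

ContainsLetter is True for both CCM6515 and 6505CCM, while StartsWithLetter is True only for CCM6515, so str.contains is the safer choice when the letters can appear anywhere in the code.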

CodePudding user response:

To check if each row contains letters you can use:

df['TestResult'] = df['GuestCode'].astype(str).str.contains('[A-Za-z]')

To check whether every character in the value is a letter, you can use:

df['TestResult'] = df['GuestCode'].astype(str).str.isalpha()
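The two checks answer different questions, which a quick side-by-side makes clear. This is only a sketch: the value CCM and the column names HasLetter and AllLetters are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({"GuestCode": [4445, "CCM6515", "CCM"]})
codes = df["GuestCode"].astype(str)

# True when at least one ASCII letter appears anywhere in the value.
df["HasLetter"] = codes.str.contains("[A-Za-z]")

# True only when the value is non-empty and every character is a letter.
df["AllLetters"] = codes.str.isalpha()

print(df)
```

CCM6515 has letters but is not all letters, so it is True in HasLetter and False in AllLetters. If GuestCode can hold missing values, it is probably worth checking those rows separately (for example with isna()), since how astype(str) renders them may depend on the pandas version.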