I want to check whether the values in a column start with a specific pattern: one letter, a dash, then a four-digit year and another letter, as in "A-2012A". Whatever follows that prefix can be anything. I only want to validate the prefix and get back a True/False value. Is this possible?
Pattern: letter, dash, year, letter
String validation on one column with regular expression.
example_column_1
DNA \ Assay |
---|
A-2000X-27 |
A-2000X-32 |
A-2000X-45 |
A-2000X-48 |
A-2000X-80 |
truth_value = df['DNA \\ Assay'].str.match(r'').astype(bool)
This is my sample code, with the regular expression in r'' left empty.
My expected output for example_column_1 would be True.
example_column_2
DNA \ Assay |
---|
Embryo FTA-Code-ID-2 |
Embryo FTA-Code-ID-3 |
Embryo FTA-Code-ID-4 |
Embryo FTA-Code-ID-5 |
Embryo FTA-Code-ID-6 |
My expected output for example_column_2 would be False.
CodePudding user response:
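Use a regex with str.match, which only matches at the start of each string, so anything after the prefix is ignored: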
df['valid'] = df['DNA \\ Assay'].str.match(r'[A-Z]-\d{4}[A-Z]', case=False)
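To see what each part of the pattern does, here is a minimal sketch using only the standard `re` module. The helper name `has_valid_prefix` and the extra test strings are hypothetical, added for illustration. Note that `\d{4}` accepts any four digits, so it does not check that the number is a plausible year.

```python
import re

# [A-Z]   one letter
# -       a literal dash
# \d{4}   four digits (any digits, not validated as a real year)
# [A-Z]   one letter
# re.IGNORECASE mirrors case=False in the pandas call.
PREFIX = re.compile(r'[A-Z]-\d{4}[A-Z]', re.IGNORECASE)

def has_valid_prefix(value: str) -> bool:
    """Return True if value starts with letter-dash-4 digits-letter."""
    # re.match only matches at the beginning of the string,
    # so any trailing text is ignored.
    return PREFIX.match(value) is not None

samples = ['A-2012A', 'A-2000X-27', 'a-2000x-27',
           'Embryo FTA-Code-ID-2', 'AB-2000X']
checks = {s: has_valid_prefix(s) for s in samples}
for s, ok in checks.items():
    print(f'{s!r}: {ok}')
```

The first three samples pass and the last two fail, because the pattern must begin at the very first character of the string.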
output:
  DNA \ Assay  valid
0  A-2000X-27   True
1  A-2000X-32   True
2  A-2000X-45   True
3  A-2000X-48   True
4  A-2000X-80   True
If you want to check that all values are valid:
df['DNA \\ Assay'].str.match(r'[A-Z]-\d{4}[A-Z]', case=False).all()
output: True
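Putting it together, the following is a small self-contained sketch that runs the check on shortened versions of both example columns from the question. The function name `column_is_valid` is hypothetical, and passing `na=False` is an assumption on my part: it makes missing values count as non-matching rather than propagating NaN into the result.

```python
import pandas as pd

# Same pattern as above: letter, dash, four digits, letter.
PATTERN = r'[A-Z]-\d{4}[A-Z]'

# Shortened copies of the two example columns from the question.
col_1 = pd.Series(['A-2000X-27', 'A-2000X-32', 'A-2000X-45'],
                  name='DNA \\ Assay')
col_2 = pd.Series(['Embryo FTA-Code-ID-2', 'Embryo FTA-Code-ID-3'],
                  name='DNA \\ Assay')

def column_is_valid(values: pd.Series) -> bool:
    """Return True only if every value starts with the pattern."""
    # str.match anchors at the start of each string.
    # na=False (an assumption) treats missing values as invalid.
    matches = values.str.match(PATTERN, case=False, na=False)
    return bool(matches.all())

result_1 = column_is_valid(col_1)
result_2 = column_is_valid(col_2)
print(result_1)  # True
print(result_2)  # False
```

Wrapping the result in `bool()` returns a plain Python bool instead of a NumPy boolean, which is convenient if the value is later compared with `is True` or serialized.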