I am pulling data from a table that changes often using Python, and the method I am using is not ideal. What I would like is a way to pull all strings that contain at most one letter and leave out anything with two or more letters.
An example of data I might get:
115 19A6 HYS8 568
In this example, I would like to pull 115, 19A6, and 568.
Currently I am using the isdigit() method to check whether a string is all digits, but this also filters out the numbers that contain one letter. That works for some purposes, but is less than ideal.
CodePudding user response:
Try this:
string_list = ["115", "19A6", "HYS8", "568"]
output_list = []
for item in string_list:  # goes through the string list
    letter_counter = 0
    for letter in item:  # goes through the characters of one string
        if not letter.isdigit():  # checks whether the character is not a digit
            letter_counter += 1
    if letter_counter < 2:  # a string with more than 1 letter won't be in the output list
        output_list.append(item)
print(output_list)
Output:
['115', '19A6', '568']
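The same count can be written more compactly. The following is only a minimal sketch, assuming the raw input arrives as one whitespace-separated string like the example in the question; the names `count_non_digits` and `raw` are made up for illustration.

```python
def count_non_digits(item):
    """Return how many characters in item are not digits."""
    return sum(1 for ch in item if not ch.isdigit())

# Assumed input format, copied from the question's example line.
raw = "115 19A6 HYS8 568"

# Keep only the items with fewer than two non-digit characters.
output_list = [item for item in raw.split() if count_non_digits(item) < 2]
print(output_list)
```

Note that this counts every non-digit character, so punctuation would count as a "letter" too. If only real letters should count, `ch.isalpha()` could be used in place of `not ch.isdigit()`.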
CodePudding user response:
Here is a one-liner with a regular expression:
import re
data = ["115", "19A6", "HYS8", "568"]
out = [string for string in data if len(re.sub(r"\d", "", string)) < 2]
print(out)
Output:
['115', '19A6', '568']
CodePudding user response:
This is an excellent case for regular expressions (regex), which are available through the built-in re library.
The code below follows the logic:
- Define the dataset.
- Compile a character pattern to be matched. In this case: zero or more digits, followed by zero or one uppercase letter, followed by zero or more digits.
- Use the filter function to detect matches in the data list and output the result as a list.
For example:
import re
data = ['115', '19A6', 'HYS8', '568']
rexp = re.compile(r'^\d*[A-Z]{0,1}\d*$')
result = list(filter(rexp.match, data))
print(result)
Output:
['115', '19A6', '568']
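One thing to watch with this pattern: every part of it is optional, so it also matches an empty string, and `[A-Z]` rejects lowercase letters. If either matters for your data, a possible variant is sketched below. It is not part of the answer above; `at_most_one_letter` is a hypothetical helper name, and the extra test values `''` and `'7b2'` are made up for illustration.

```python
import re

# (?=.) requires at least one character; [A-Za-z]? allows one letter of either case.
pattern = re.compile(r'(?=.)\d*[A-Za-z]?\d*')

def at_most_one_letter(item):
    """True if item is non-empty and is digits plus at most one ASCII letter."""
    # fullmatch anchors the pattern to the whole string, so ^ and $ are not needed.
    return pattern.fullmatch(item) is not None

data = ['115', '19A6', 'HYS8', '568', '', '7b2']
result = [item for item in data if at_most_one_letter(item)]
print(result)
```

Here the empty string is rejected by the lookahead, and `'7b2'` is accepted because the character class covers both cases.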