I have a table with a column whose values look roughly like this:
NAME:ENTRY ID:1562.003:NAME:Defenses:Command::::NAME::ENTRY ID:1574.006:NAME:Execution:Dynamic::::TAXONOMY NAME:ATTACK:ENTRY ID:1574.007:
The number of characters can vary.
I want to extract only the numbers from each line. I use this code:
import csv
with open("/root/test.csv", encoding='utf-8') as r_file:
    file_reader = csv.reader(r_file, delimiter=";")
    count = 0
    for row in file_reader:
        n = f'{row[1]}'
        print(''.join([x for x in n if x.isdigit()]))
Unfortunately, my code only produces this result:
156200315740061574007
I wanted to get the result in the form of:
1562.003; 1574.006; 1574.007
Could you please tell me how to produce this output? Thank you very much.
CodePudding user response:
You can use a regular expression to test whether the cells contain what appears to be a number. For efficiency, you can then compile that regular expression before starting the loop.
The expression is unfortunately a bit verbose. It matches a cell that contains at most one decimal point and at least one digit; when a decimal point is present, a digit on either side of it is enough. A leading minus sign and whitespace at the start and end are also allowed.
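import csv
import re

# Match a cell that is a number: optional minus sign, at most one decimal
# point, at least one digit, optional surrounding whitespace.
is_number = re.compile(r"^\s*-?(\d+\.?\d*|\d*\.?\d+)\s*$").match

with open("test.csv", encoding='utf-8') as r_file:
    file_reader = csv.reader(r_file, delimiter=":")
    for row in file_reader:
        # Keep only the cells that look like numbers
        print('; '.join(cell for cell in row if is_number(cell)))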
NB: your delimiter for the input seemed to be wrong, so I changed it to ":".
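To try this approach without a file on disk, here is a minimal sketch that runs it on the sample line from the question. The names SAMPLE and extract_numbers are made up for illustration, and io.StringIO is only a stand-in for the real CSV file, so treat it as a quick check and not as a finished solution.

```python
import csv
import io
import re

# Match a cell that is a number: optional minus sign, at most one decimal
# point, at least one digit, optional surrounding whitespace.
is_number = re.compile(r"^\s*-?(\d+\.?\d*|\d*\.?\d+)\s*$").match

# Sample line copied from the question (hypothetical constant name).
SAMPLE = (
    "NAME:ENTRY ID:1562.003:NAME:Defenses:Command::::NAME::ENTRY ID:1574.006:"
    "NAME:Execution:Dynamic::::TAXONOMY NAME:ATTACK:ENTRY ID:1574.007:"
)


def extract_numbers(text):
    """Return one '; '-joined string of numeric cells per input line."""
    # io.StringIO lets csv.reader treat the string like an open file.
    reader = csv.reader(io.StringIO(text), delimiter=":")
    return ['; '.join(cell for cell in row if is_number(cell)) for row in reader]


print(extract_numbers(SAMPLE))
# prints ['1562.003; 1574.006; 1574.007']
```

Empty cells (from the "::::" runs) are skipped because the pattern requires at least one digit.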