I am working on some Python code that evaluates each line of a file; if a line is a number, the function should return False so that the line is ignored. The code comes from https://github.com/geoff604/sbv2txt/blob/master/README.md and I am modifying it.
However, no matter which line gets passed to isnumeric(), it still evaluates as False. When I hardcoded the same number as a string, "2", it correctly evaluated as True.
Is there something I am missing when evaluating lines of text?
import sys

def isCaptionText(lineIndex):
    if lineIndex.isnumeric():
        print('True')
        return False
    else:
        return lineIndex

if len(sys.argv) < 3:
    print('Arguments: [source sbv filename] [destination txt filename]')
    sys.exit()

with open(sys.argv[1]) as f1:
    with open(sys.argv[2], 'a') as f2:
        lines = f1.readlines()
        for index, line in enumerate(lines):
            if isCaptionText(line):
                f2.write(line)

print('Output complete. File written as ' + sys.argv[2])
The file I am analyzing is shown in shortened form below.
2
00:00:04,360 --> 00:00:08,861
St. Louis' home for arts,
education and culture.
3
00:00:08,861 --> 00:00:11,444
(upbeat music)
4
00:00:12,290 --> 00:00:13,610
- [Woman] But we're in a global pandemic.
5
00:00:13,610 --> 00:00:16,000
We're also in a global blood shortage.
6
00:00:16,000 --> 00:00:18,230
- [Man] The more I dug,
the more it took me back
CodePudding user response:
When Python reads a file line by line, each line still includes its trailing newline. For example, if the line is 1, f1.readlines() returns it as 1\n, so isnumeric() returns False. The trick is to call strip() on the line first:
for index, line in enumerate(lines):
    if isCaptionText(line.strip()):
        print(line)
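The behaviour described above can be checked directly. This is only a minimal sketch; the string raw is a made-up stand-in for one line as readlines() would return it.

```python
# raw mimics a line returned by readlines(), trailing newline included.
raw = "2\n"

print(raw.isnumeric())          # False: "\n" is not a numeric character
print(raw.strip().isnumeric())  # True: strip() removes surrounding whitespace
print("".isnumeric())           # False: a blank line stripped to "" is not numeric
```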
CodePudding user response:
- Every line read from the file ends with a newline character. The first line, 2, is actually read as 2\n. The line ending stored on disk depends on the OS, but Python's text mode translates it to \n when reading.
- You can use replace or strip to get rid of the newline character.
Example fixed code on Windows:
with open("input.txt") as f1:
    lines = f1.readlines()
    for index, line in enumerate(lines):
        print(line.replace("\n", "").isnumeric())
input.txt
2
00:00:04,360 --> 00:00:08,861
St. Louis' home for arts,
education and culture.
3
00:00:08,861 --> 00:00:11,444
(upbeat music)
4
00:00:12,290 --> 00:00:13,610
- [Woman] But we're in a global pandemic.
5
00:00:13,610 --> 00:00:16,000
We're also in a global blood shortage.
6
00:00:16,000 --> 00:00:18,230
- [Man] The more I dug,
the more it took me back
result:
True
False
False
False
False
True
False
False
False
True
False
False
False
True
False
False
False
True
False
False
False
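Applied to the filtering step from the question, the same idea might look like the sketch below. The helper name is_caption_text and the sample list are assumptions made for illustration, and the helper returns a plain bool rather than the line itself. Timestamp lines and blank lines are still kept, since only all-digit lines are skipped.

```python
def is_caption_text(line):
    """Return False for lines that are all digits once whitespace is stripped."""
    return not line.strip().isnumeric()

# Made-up sample mimicking what readlines() returns for the start of the file.
sample = [
    "2\n",
    "00:00:04,360 --> 00:00:08,861\n",
    "St. Louis' home for arts,\n",
    "3\n",
]

kept = [line for line in sample if is_caption_text(line)]
print(kept)
```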
CodePudding user response:
Using a try/except block:
try:
    int(lineIndex)
    print('True')
    return True
except ValueError:
    return False
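Or using type(). Note that this only works if lineIndex has already been converted to an int; a line read from a file is always a str, so for raw lines this check is never true:
if type(lineIndex) == int:
    print('True')
    return False
else:
    return lineIndex
If you have decimals, check with float() instead of int().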
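A self-contained sketch of the try/except idea, extended to decimals, could look like this. The function name is_number is hypothetical. float() tolerates surrounding whitespace, so the trailing newline does not need to be stripped first; one caveat is that it also accepts strings such as "nan" and "inf".

```python
def is_number(text):
    """Return True if text parses as a float (which covers whole numbers too)."""
    try:
        float(text)  # raises ValueError for non-numeric text, including ""
        return True
    except ValueError:
        return False

print(is_number("2\n"))               # True
print(is_number("3.5"))               # True
print(is_number("(upbeat music)\n"))  # False
```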