Sorry in advance for this simple question. Would you please help me figure out why these two scripts do the same thing? As I understand it, the second one has a nested loop, so the content should be multiplied by the number of lines in the file. But they actually do the same thing, except that the first line of the input file is lost in the second version of the script.
First one:
fin = open("/home/user/other/Utilities/url.text", "rt")
fout = open("/home/user/other/Utilities/url_out.txt", "wt")
count_it = 0
for line in fin:
    print(line)
    if "hostname" in line:
        # for line in fin:
        fout.write(line.replace("thumbs", "images"))
        count_it = count_it + 1
        print(line)
fin.close()
fout.close()
print(count_it)
The second one:
fin = open("/home/user/other/Utilities/url.text", "rt")
fout = open("/home/user/other/Utilities/url_out.txt", "wt")
count_it = 0
for line in fin:
    print(line)
    if "hostname" in line:
        for line in fin:
            fout.write(line.replace("thumbs", "images"))
            count_it = count_it + 1
            print(line)
fin.close()
fout.close()
print(count_it)
Thanks in advance.
CodePudding user response:
You stumbled upon an interesting feature of Python: a file object is an iterator, and an iterator can be consumed only once. If you iterate over the lines of a file, as in
for line in fin:
    ...
you can think of the loop as consuming the lines from that file. You cannot iterate over fin a second time (actually you can, but you won't get anything out of it). The following code, for example, prints all the lines only once. The second loop prints nothing, because the first loop has already read (or consumed) all the lines from the file.
for line in fin:
    print(line)
for line in fin:
    print(line)
Similarly, in your code each line can be read from the file only once, and the nested loop consumes all the remaining lines, leaving none for the outer loop to continue with.
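A minimal sketch of that behaviour, using io.StringIO as a stand-in for the opened file (the sample text is made up for illustration). It suggests that a second pass yields nothing until the stream is rewound with seek(0):

```python
from io import StringIO

# Hypothetical sample text standing in for the real input file.
fin = StringIO("first\nsecond\nthird\n")

first_pass = [line for line in fin]   # consumes every line
second_pass = [line for line in fin]  # nothing is left to read

fin.seek(0)                           # rewind to the beginning
third_pass = [line for line in fin]   # all lines are available again

print(len(first_pass), len(second_pass), len(third_pass))  # prints: 3 0 3
```

A real file opened with open() in text mode can be rewound the same way with fin.seek(0).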
Objects with this behaviour are called iterators and are common in Python; generators are the best-known kind. It is more suitable to think of them as a function that is called repeatedly than as a real data structure. Another common example is the return value of the map() builtin.
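A small sketch of the same one-shot behaviour with map(), again using io.StringIO in place of a real file (the sample values are made up). It also checks that a file-like object is its own iterator, which is why the inner and outer loops share a single read position:

```python
from io import StringIO

# Hypothetical sample text standing in for the real input file.
fin = StringIO("a\nb\n")

# A file-like object returns itself from iter(), so every loop over it
# advances the same underlying position.
same_object = iter(fin) is fin

squares = map(lambda n: n * n, [1, 2, 3])
first = list(squares)   # [1, 4, 9]
second = list(squares)  # [] because the map object is already exhausted

print(same_object, first, second)  # prints: True [1, 4, 9] []
```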
CodePudding user response:
Why are these two scripts doing the same thing?
Actually, they are not. If you see the same output at the end, that is just a coincidence. Whatever the cause, the two scripts behave differently and in general produce different output.
Let's see this example:
from io import StringIO
data = """\
hostname thumbs
hostname hello world
othername thumbs
othername hello world
"""
print("\nInput data:\n".upper())
print(data)
print('-'*70)
fin = StringIO(data)
fout = StringIO("")
count_it=0
for line in fin:
    print(f'Outer print: {count_it=}, {line=}')
    if "hostname" in line:
        for line in fin:
            fout.write(line.replace("thumbs", "images"))
            count_it = count_it + 1
            print(f'Inner print: {count_it=}, transformed line={line.replace("thumbs", "images")!r}')
print('-'*70)
print("\nTransformed data:\n".upper())
print(fout.getvalue())
Here we have the second script in action, and its output is:
INPUT DATA:
hostname thumbs
hostname hello world
othername thumbs
othername hello world
----------------------------------------------------------------------
Outer print: count_it=0, line='hostname thumbs\n'
Inner print: count_it=1, transformed line='hostname hello world\n'
Inner print: count_it=2, transformed line='othername images\n'
Inner print: count_it=3, transformed line='othername hello world\n'
----------------------------------------------------------------------
TRANSFORMED DATA:
hostname hello world
othername images
othername hello world
Comment out the inner for loop and, voilà, the output is different:
INPUT DATA:
hostname thumbs
hostname hello world
othername thumbs
othername hello world
----------------------------------------------------------------------
Outer print: count_it=0, line='hostname thumbs\n'
Inner print: count_it=1, transformed line='hostname images\n'
Outer print: count_it=1, line='hostname hello world\n'
Inner print: count_it=2, transformed line='hostname hello world\n'
Outer print: count_it=2, line='othername thumbs\n'
Outer print: count_it=2, line='othername hello world\n'
----------------------------------------------------------------------
TRANSFORMED DATA:
hostname images
hostname hello world
As we can see, the last run wrote only the lines containing "hostname", with the replacement applied, and skipped all the others, whereas the first run skipped the first line and wrote every subsequent line with the replacement applied.
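If the multiplied output described in the question is what is actually wanted (an assumption about the intent), one way to sketch it is to read the lines into a list first, since a list can be iterated any number of times. The sample data and the names lines, outer, inner and written are made up for illustration:

```python
from io import StringIO

# Hypothetical sample data standing in for the input file.
fin = StringIO("hostname thumbs\nhostname hello world\nothername thumbs\n")
lines = fin.readlines()  # a list, so each loop over it starts from the top

written = []
for outer in lines:
    if "hostname" in outer:
        for inner in lines:  # restarts at the first line every time
            written.append(inner.replace("thumbs", "images"))

# Two "hostname" lines times three lines in total gives six entries.
print(len(written))  # prints: 6
```

Using distinct names for the outer and inner loop variables also avoids the inner loop overwriting line, which happens in the original script.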