Import data between two strings in a text file and delete the imported data from the file

Although I searched for a long time, I could not find the right answer anywhere. What I need is to retrieve the data contained in a text file and then delete the retrieved data from the file; in short, a "cut". I couldn't find a question and solution on Stack Overflow that does exactly what I need.

First, take a look at the contents of file.txt so you can fully understand the problem:

Start

General           : Video
Format            : Matroska at 3 961 kb/s
Length            : 2.50 GiB for 1 h 30 min 12 s 928 ms

Video #1          : AVC at 3 320 kb/s
Aspect            : 1920 x 1080 (1.778) at 24.000 fps

Audio #2          : AC-3 at 640 kb/s
Infos             : 6 channel(s), 48.0 kHz
Language          : tr

Text #3           : UTF-8
Language          : tr

End
--- Passing Data ---
Start

General           : Video
Format            : AVI at 1 113 kb/s
Length            : 718 MiB for 1 h 30 min 12 s 552 ms

Video #0          : MPEG-4 Visual at 976 kb/s
Aspect            : 720 x 404 (1.782) at 24.000 fps

Audio #1          : MPEG Audio at 128 kb/s
Infos             : 2 channel(s), 48.0 kHz

End

As you can see, the Start and End markers appear at regular intervals in the file. I use these markers to get the data between them. My code is:

f = open('file.txt', 'r+', encoding='utf-8')
s = f.read()
start = s.find("Start") + len("Start")
end = s.find("End")
substring = s[start:end]
f.close()
print(substring)

But this code only retrieves the data; it does not cut it out of the file. Therefore, it prevents me from passing to a data. That is, I can never get past the first block, because s.find("Start") and s.find("End") only find the first occurrence of each marker.

How can I solve this problem? Thanks

Answer:

I'm not sure what you mean by "Therefore, it prevents me from passing to a data.", but I would use s.split("End") to separate the blocks and do the other string operations from there, because each element of the resulting list holds everything up to one "End". After that, you could call splitlines() on each element to get a list of the lines in each Start/End block.

f = open('file.txt', 'r', encoding='utf-8')
s = f.read()
blocksOfData = s.split("End")  # split() is case-sensitive; the file uses "End"
f.close()
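Taking the split idea one step further, here is a minimal sketch that turns each Start/End block into a dictionary. It assumes every data line has the form `key : value` and that the word End never appears inside the data itself (split() would cut there too). The helper `parse_blocks` and the shortened `SAMPLE` text are hypothetical, made up for illustration.

```python
# Hypothetical sample shaped like file.txt (shortened).
SAMPLE = """Start

General           : Video
Format            : Matroska at 3 961 kb/s

Audio #2          : AC-3 at 640 kb/s
Language          : tr

End
--- Passing Data ---
Start

General           : Video
Format            : AVI at 1 113 kb/s

End
"""


def parse_blocks(text):
    """Split text on the End marker and parse each Start...End block.

    Returns one dict per block, mapping the text left of the first ':'
    to the text right of it (both stripped).
    """
    blocks = []
    for chunk in text.split("End"):
        # The chunk after the last End (or any chunk without Start) is not a block.
        if "Start" not in chunk:
            continue
        # Drop whatever precedes Start, e.g. the "--- Passing Data ---" line.
        body = chunk.split("Start", 1)[1]
        fields = {}
        for line in body.splitlines():
            key, sep, value = line.partition(":")
            if sep:  # skip blank lines and lines without a colon
                fields[key.strip()] = value.strip()
        blocks.append(fields)
    return blocks


blocks = parse_blocks(SAMPLE)
for block in blocks:
    print(block["Format"])
```

Because a plain dict is used, a key that repeats inside one block (the first block of the original file has two Language lines) keeps only its last value; collect values into lists if you need all of them.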

Answer:

Apologies if this is poorly formatted; this is my first time on Stack Overflow.

Adding f.truncate(0) before you close the file will erase all of the contents of file.txt.

f = open('file.txt', 'r+', encoding='utf-8')
s = f.read()
start = s.find("Start") + len("Start")
end = s.find("End")
substring = s[start:end]
f.truncate(0)
f.close()
print(substring)
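Keep in mind that truncate(0) empties the whole file, including blocks that have not been read yet. If the goal is to cut only the first block, one common in-place pattern is to seek back to the start of the file, write the remaining text, and then call truncate() with no argument so the file ends at the current position. A minimal sketch, assuming the markers are exactly Start and End; `cut_first_block` is a hypothetical helper name and the demo uses a throwaway temporary file rather than the real file.txt:

```python
import os
import tempfile


def cut_first_block(path):
    """Return the text between the first Start/End pair in `path` and
    remove that whole block (markers included), keeping the rest."""
    with open(path, "r+", encoding="utf-8") as f:
        s = f.read()
        block_start = s.find("Start")
        if block_start == -1:
            return None  # no block left to cut
        start = block_start + len("Start")
        end = s.find("End", start)
        if end == -1:
            return None  # unterminated block: leave the file untouched
        substring = s[start:end]
        rest = s[:block_start] + s[end + len("End"):]
        f.seek(0)      # back to the beginning of the file
        f.write(rest)  # overwrite the old content with what remains
        f.truncate()   # cut off the tail left over from the longer old content
    return substring


# Demo on a throwaway file shaped like file.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("Start\nA : 1\nEnd\n--- Passing Data ---\nStart\nB : 2\nEnd\n")
    path = tmp.name

first = cut_first_block(path)   # "\nA : 1\n"
second = cut_first_block(path)  # "\nB : 2\n"
third = cut_first_block(path)   # None, no blocks left
with open(path, encoding="utf-8") as f:
    remaining = f.read()
os.remove(path)
print(repr(first), repr(second), third, repr(remaining))
```

Each call removes exactly one block, so calling it repeatedly walks through the file the way the question describes.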

Answer:

Are you looking for something like:

import re

re_blocks = re.compile(r"^\s*Start.+?End\s*$", re.MULTILINE|re.DOTALL)

with open("file.txt", "r") as file:
    blocks = re_blocks.findall(file.read())
    file.seek(0)
    new_file = re_blocks.sub("", file.read())
with open("file.txt", "w") as file:
    file.write(new_file)

blocks is a list of the extracted data blocks (each match includes its Start and End lines). After extracting them, the file is rewritten without those parts.
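If you only want the text between the markers, not the markers themselves, one option is to wrap the lazy part of the pattern in a capturing group: when a pattern has exactly one group, findall() returns just that group for each match. A sketch using the same flags as above on a short, made-up `SAMPLE` string; reading the text once also avoids the seek(0) and second read():

```python
import re

# Same pattern and flags as above, plus a capturing group around the data.
RE_BLOCK = re.compile(r"^\s*Start(.+?)End\s*$", re.MULTILINE | re.DOTALL)

# Hypothetical stand-in for the contents of file.txt.
SAMPLE = "Start\nA : 1\nEnd\n--- Passing Data ---\nStart\nB : 2\nEnd\n"

inner = RE_BLOCK.findall(SAMPLE)      # only the text between Start and End
remaining = RE_BLOCK.sub("", SAMPLE)  # the text with every block removed
print(inner)
print(repr(remaining))
```

Writing `remaining` back to file.txt, as in the answer above, then removes the blocks from the file.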

Answer:

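A file can't be edited like a string. If you want to remove part of the beginning or middle of a file, you have to read the whole text into memory, edit it there, and write everything back to the file. So you have to reopen the file for writing and write back the text before the block, s[:start - len("Start")], and the text after it, s[end + len("End"):], so that the Start and End markers are removed too.

f = open('file.txt', 'r', encoding='utf-8')
s = f.read()
start = s.find("Start") + len("Start")
end = s.find("End")
substring = s[start:end]
f.close()
print(substring)

# Write back everything except the whole Start...End block (markers included),
# so the next run finds the next block instead of an empty "StartEnd".
f = open('file.txt', 'w', encoding='utf-8')
f.write(s[:start - len("Start")])
f.write(s[end + len("End"):])
f.close()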
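One risk of opening the file with 'w' is that it is emptied immediately, so if the program crashes before the write finishes, the remaining blocks are lost. A common safeguard is to write the new content to a temporary file in the same directory and then swap it in with os.replace(). A sketch of that variant, assuming the same Start/End markers; `cut_first_block_atomic` is a hypothetical name, and the loop at the end shows how calling it repeatedly cuts the blocks one by one:

```python
import os
import tempfile


def cut_first_block_atomic(path, encoding="utf-8"):
    """Return the text of the first Start...End block in `path` and rewrite
    the file without that block, swapping in a temporary file at the end."""
    with open(path, "r", encoding=encoding) as f:
        s = f.read()

    block_start = s.find("Start")
    if block_start == -1:
        return None  # no block left
    end = s.find("End", block_start + len("Start"))
    if end == -1:
        return None  # unterminated block: leave the file untouched

    substring = s[block_start + len("Start"):end]
    rest = s[:block_start] + s[end + len("End"):]

    # Write the remaining text to a temporary file in the same directory,
    # then move it over the original with a single os.replace() call.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding=encoding) as tmp:
            tmp.write(rest)
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)  # don't leave the temporary file behind
        raise
    return substring


# Demo: cut blocks one at a time until none are left.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "file.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("Start\nA : 1\nEnd\nStart\nB : 2\nEnd\n")

extracted = []
while (block := cut_first_block_atomic(path)) is not None:
    extracted.append(block.strip())

with open(path, encoding="utf-8") as f:
    leftover = f.read()
os.remove(path)
os.rmdir(workdir)
print(extracted, repr(leftover))
```

One side effect to be aware of: tempfile.mkstemp() creates the file readable and writable only by its owner, so the replaced file.txt ends up with those permissions rather than its original ones.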

But if you want to work with all the Start...End blocks, you don't have to crop the file: you can pass the optional start argument to find() so it searches from a given position and gets the next block:

start = s.find("Start", end) + len("Start")
end = s.find("End", start)

end = 0

while True:
    start = s.find("Start", end)
    if start == -1:
        break
    start += len("Start")
    end = s.find("End", start)
    substring = s[start:end]
    print(substring)
    end += len("End")
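The same loop can be wrapped in a generator so the caller simply iterates over the blocks. This sketch also stops when a Start has no matching End; in the loop above, find() would return -1 in that case and the next search could start over near the beginning of the string. `iter_blocks` and its marker parameters are hypothetical names:

```python
def iter_blocks(s, start_marker="Start", end_marker="End"):
    """Yield the text between each start_marker/end_marker pair, in order,
    using the optional start argument of str.find() to resume the search."""
    pos = 0
    while True:
        start = s.find(start_marker, pos)
        if start == -1:
            return  # no more blocks
        start += len(start_marker)
        end = s.find(end_marker, start)
        if end == -1:
            return  # unterminated last block: stop instead of looping forever
        yield s[start:end]
        pos = end + len(end_marker)


text = "Start\nA : 1\nEnd\n--- Passing Data ---\nStart\nB : 2\nEnd\n"
blocks = [block.strip() for block in iter_blocks(text)]
print(blocks)
```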

Or you can repeat the same code on the remaining substring s[end:]:

s = s[end:]

while True:
    start = s.find("Start")
    if start == -1:
        break
    start += len("Start")
    end = s.find("End", start)
    substring = s[start:end]
    print(substring)
    s = s[end:]
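str.partition() offers another way to write the "repeat on the remaining substring" approach without index arithmetic: it splits at the first occurrence of the separator and returns the part before it, the separator itself (or an empty string if it is missing), and the part after it. A sketch; `blocks_by_partition` is a hypothetical helper:

```python
def blocks_by_partition(s):
    """Collect every Start...End block by repeatedly partitioning the
    remaining text, instead of slicing with find() indices."""
    result = []
    while True:
        _, found, rest = s.partition("Start")
        if not found:
            break  # no more Start markers
        block, found, s = rest.partition("End")
        if not found:
            break  # Start without a matching End
        result.append(block)
    return result


text = "Start\nA : 1\nEnd\n--- Passing Data ---\nStart\nB : 2\nEnd\n"
print([b.strip() for b in blocks_by_partition(text)])
```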