I have a text file, like this:
3/11/21, 6:13 PM - Gil: X1000
3/11/21, 6:15 PM - Sergio: <Media omitted>
3/11/21, 6:19 PM - Sergio: X400
3/11/21, 6:20 PM - Sergio: Los amigos de vonzo en Francia:
1. La Tóxica
2. El brujo vodoo
3. El/La Zoofilic@
3/11/21, 6:20 PM - Sergio: :V
3/11/21, 6:21 PM - Joan :V: JAJAJAJAJA
Most of the lines start with a date/time that is easy to catch with a regular expression.
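As a minimal sketch, here is one such pattern. It assumes every message header starts with `M/D/YY, H:MM AM` or `PM`, as in the sample above. The name `DATETIME_RE` is hypothetical. `re.match` only matches at the start of the string, so continuation lines such as "1. La Tóxica" are not mistaken for headers.

```python
import re

# Assumed header format: "M/D/YY, H:MM AM" or "... PM" at the very start of a line
DATETIME_RE = re.compile(r'\d{1,2}/\d{1,2}/\d{2}, \d{1,2}:\d{2}\s[AP]M')

samples = [
    "3/11/21, 6:13 PM - Gil: X1000",
    "1. La Tóxica",
    "3/11/21, 6:21 PM - Joan :V: JAJAJAJAJA",
]

for s in samples:
    # match() anchors at position 0, so only header lines match
    print(bool(DATETIME_RE.match(s)), s)
```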
I would like to remove the line break before every line that does not start with a date/time, so that I get something like this (in a new file):
3/11/21, 6:13 PM - Gil: X1000
3/11/21, 6:15 PM - Sergio: <Media omitted>
3/11/21, 6:19 PM - Sergio: X400
3/11/21, 6:20 PM - Sergio: Los amigos de vonzo en Francia: 1. La Tóxica 2. El brujo vodoo 3. El/La Zoofilic@
3/11/21, 6:20 PM - Sergio: :V
3/11/21, 6:21 PM - Joan :V: JAJAJAJAJA
The problem is that I'm reading the file line by line:
input = open(self.fileName, encoding="utf8", errors='replace')
for line in input:
    output.write(re.sub(#SOMETHING))
With this approach I can only read one line at a time.
How can I change line n (remove its line break) based on a condition in line n+1 (whether that line starts with a date/time)?
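One way to let line n depend on line n+1 is to buffer the current message. You emit it only when the next header line, or the end of the input, shows that it is complete. The sketch below assumes the same `M/D/YY, H:MM AM/PM` header format. The names `DATETIME_RE` and `merge_messages` are hypothetical. Continuation lines are joined with a space, as in the expected output.

```python
import re
from typing import Iterable, Iterator

# Assumed header format, as in the sample chat export
DATETIME_RE = re.compile(r'\d{1,2}/\d{1,2}/\d{2}, \d{1,2}:\d{2}\s[AP]M')


def merge_messages(lines: Iterable[str]) -> Iterator[str]:
    """Yield one string per message, joining continuation lines with a space."""
    buffer = []  # lines belonging to the message currently being collected
    for raw in lines:
        line = raw.rstrip('\n')
        if DATETIME_RE.match(line) and buffer:
            # A new header means the previous message is complete
            yield ' '.join(buffer)
            buffer = []
        buffer.append(line)
    if buffer:
        # Flush the last message at end of input
        yield ' '.join(buffer)


chat = [
    "3/11/21, 6:20 PM - Sergio: Los amigos de vonzo en Francia:\n",
    "1. La Tóxica\n",
    "2. El brujo vodoo\n",
    "3/11/21, 6:20 PM - Sergio: :V\n",
]
for message in merge_messages(chat):
    print(message)
```

To process a real file, pass the open input file object as `lines` and write each yielded message followed by `'\n'` to the output file.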
Answer:
Strip the newline from every line, and write \n only before lines that start with a date/time:
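import re

# Raw string so that \d and \s reach the regex engine unchanged
datetime_pattern = r'\d{1,2}/\d{1,2}/\d{1,2},\s\d{1,2}:\d{1,2}\s[AP]M'

for line in input:
    have_datetime = bool(re.match(datetime_pattern, line))
    # Start a new output line only when this line begins a new message
    # (note: this also writes a newline before the very first line)
    if have_datetime:
        output.write('\n')
    output.write(line.strip('\n'))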
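A small variant of the same streaming idea, sketched as a function that writes to any text stream so it can be tried on an in-memory list. It makes three assumptions: header lines look like `M/D/YY, H:MM AM/PM`, continuation lines should be glued with a space (to match the expected output), and the result should end with a newline but not start with one. The names `DATETIME_RE` and `join_continuations` are hypothetical.

```python
import io
import re
from typing import Iterable, TextIO

# Assumed header format, as in the sample chat export
DATETIME_RE = re.compile(r'\d{1,2}/\d{1,2}/\d{2}, \d{1,2}:\d{2}\s[AP]M')


def join_continuations(lines: Iterable[str], output: TextIO) -> None:
    first = True
    for raw in lines:
        line = raw.rstrip('\n')
        if DATETIME_RE.match(line):
            # Header line: end the previous message (unless this is the first line)
            if not first:
                output.write('\n')
        elif not first:
            # Continuation line: glue it to the previous text with a space
            output.write(' ')
        output.write(line)
        first = False
    if not first:
        # Terminate the last message so the output ends with a newline
        output.write('\n')


buf = io.StringIO()
join_continuations([
    "3/11/21, 6:19 PM - Sergio: X400\n",
    "3/11/21, 6:20 PM - Sergio: Los amigos de vonzo en Francia:\n",
    "1. La Tóxica\n",
], buf)
print(buf.getvalue(), end='')
```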
Answer:
The with statement is the recommended way to read and write files in Python. We can then read each line, match it against the date/time pattern, and add a newline character accordingly.
import re

# Raw string so that \d and \s reach the regex engine unchanged
datetime_pattern = r'\d{1,2}/\d{1,2}/\d{1,2},\s\d{1,2}:\d{1,2}\s[AP]M'

# Explicit encoding so that characters like "ó" are read and written correctly
with open(input_file_path, 'r', encoding='utf8') as infile:
    with open(output_file_path, 'w', encoding='utf8') as outfile:
        for line_number, line in enumerate(infile):
            # We don't need a newline character before the first line
            if line_number > 0 and re.match(datetime_pattern, line):
                outfile.write('\n')
            outfile.write(line.strip('\n'))
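Since the question mentions `re.sub`, another option is to read the whole file into one string. You can then replace every newline that is not followed by a date with a space, using a negative lookahead. This is a sketch under two assumptions: the export fits in memory, and headers start with `M/D/YY, `. The names `CONTINUATION_RE` and `merge_text` are hypothetical.

```python
import re

# A newline NOT followed by an assumed "M/D/YY, " header is treated as a continuation
CONTINUATION_RE = re.compile(r'\n(?!\d{1,2}/\d{1,2}/\d{2}, )')


def merge_text(text: str) -> str:
    # Drop trailing newlines first so the final line break is not turned into a space
    body = text.rstrip('\n')
    if not body:
        return ''
    return CONTINUATION_RE.sub(' ', body) + '\n'


sample = (
    "3/11/21, 6:20 PM - Sergio: Los amigos de vonzo en Francia:\n"
    "1. La Tóxica\n"
    "2. El brujo vodoo\n"
    "3/11/21, 6:20 PM - Sergio: :V\n"
)
print(merge_text(sample), end='')
```

With a real file, you would read the input with `open(..., encoding='utf8').read()`, pass the string to `merge_text`, and write the result to the output file in one call.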