I have an issue using regular expressions in Python.
As shown in the example below, I want to put 'string_1' in front of the paragraph that starts with the word 'past' (see the code below).
input.txt
some random texts (100 lines)
bbb
...
ttt
modern_b different_story(
...
some random texts
...
);
past_c different_story(
...
some random texts
...
);
desired output.txt
some random texts (100 lines)
bbb
...
ttt
modern_b different_story(
...
some random texts
...
);
java is fun;
python is fun;
past_c different_story(
...
some random texts
...
);
Ok, then here's my code so far:
import re
output_file = open('output.txt', 'w')
input_file = open('input.txt', 'r')
string_1 = 'java is fun; \npython is fun;'
t = re.sub(
    '(?=\s*^past)',
    '\n' + string_1 + '\n',
    input_file.read(),
    0,
    re.M
)
output_file.write(t)
When I run this code, the string is inserted in front of 'past_c', but it appears multiple times. I need it only once. I think I am almost there, but the repeated lines are a pain in the neck. I am assuming the string can be long, and I need to refer only to the next paragraph ('past_c' in this case), not the previous one. Can you give me any idea how to fix this? A more efficient way to achieve it would also be welcome. Thanks!
CodePudding user response:
A positive lookahead may help:
import re
text = '''
bbb
...
ttt
modern_b different_story(
...
some random texts
...
);
past_c different_story(
...
some random texts
...
);
'''
print(re.sub(r'(?=past_c different_story)', r'java is fun;\npython is fun;\n\n', text))
Output:
bbb
...
ttt
modern_b different_story(
...
some random texts
...
);
java is fun;
python is fun;
past_c different_story(
...
some random texts
...
);
The same thing wrapped in a function:
def fun1(text, st):
    # Insert st right before 'past_c different_story'.
    return re.sub(r'(?=past_c different_story)', st, text)
print(fun1(text, 'java is fun;\npython is fun;\n\n'))
Output:
bbb
...
ttt
modern_b different_story(
...
some random texts
...
);
java is fun;
python is fun;
past_c different_story(
...
some random texts
...
);
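As a hedged illustration of why the pattern in the question inserts the string more than once, here is a minimal sketch. The sample text and the helper name `insert_before_past` are assumptions made up for this example, not part of the original code. The idea is that `\s*` lets the lookahead succeed both before the newline that precedes 'past' and at the start of the 'past' line, while putting `^` first leaves only one matching position.

```python
import re

# Hypothetical sample text, shortened from the question's input.
sample = "modern_b different_story(\n);\npast_c different_story(\n);\n"
string_1 = "java is fun;\npython is fun;"

# Pattern from the question: matches at two positions, so two insertions.
dup = re.sub(r"(?=\s*^past)", "\n" + string_1 + "\n", sample, flags=re.M)

def insert_before_past(text, block):
    """Insert block on its own line(s) before each line starting with 'past'."""
    # ^ comes first, so only the start-of-line position can match.
    # A function replacement avoids escape processing of the block text.
    return re.sub(r"^(?=past)", lambda m: block + "\n", text, flags=re.M)

fixed = insert_before_past(sample, string_1)
print(dup.count("java is fun;"))    # number of insertions with the original pattern
print(fixed.count("java is fun;"))  # number of insertions with the anchored pattern
print(fixed)
```

Unlike the pattern above that spells out `past_c different_story`, this version applies to any line beginning with 'past', which may or may not be what you want.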
CodePudding user response:
You can use the following to achieve what you want, I think:
import re
with open("input.txt", "r") as inp, open("output.txt", "w") as oup:
    for i in inp:
        # Prefix lines that start with "a"; copy all other lines unchanged.
        if re.match("^a", i):
            oup.write("my_string " + i)
        else:
            oup.write(i)
Basically, it prepends "my_string " to every line that starts with "a" and writes that line to the output file; every other line is copied to the output file unchanged.
The snippet below does the same, but writes the result back to the same file:
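import re
with open("input.txt", "r") as inp:
    my_list = []
    for i in inp:
        # Collect the (possibly prefixed) lines in memory first.
        if re.match("^a", i):
            my_list.append("my_string " + i)
        else:
            my_list.append(i)
# Reopen the same file for writing and overwrite it with the result.
with open("input.txt", "w") as inp:
    inp.write("".join(my_list))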
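Adapting the line-by-line idea to the question itself, a rough sketch could look like the following. The function name `insert_before_prefix` and the in-memory sample lines are hypothetical, chosen only so the example runs without any files; it uses `str.startswith` instead of a regex and emits the block once before each line that begins with the prefix.

```python
def insert_before_prefix(lines, block, prefix="past"):
    """Yield lines, emitting block before each line that starts with prefix."""
    for line in lines:
        if line.startswith(prefix):
            yield block
        yield line

# Hypothetical sample standing in for the lines of input.txt.
lines = ["modern_b different_story(\n", ");\n", "past_c different_story(\n", ");\n"]
result = "".join(insert_before_prefix(lines, "java is fun;\npython is fun;\n"))
print(result)
```

With real files, the open input file object can be passed as `lines` and the generator handed to `oup.writelines(...)`, so the whole file never has to be held in memory.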