i have a file with data as such.
>1_DL_2021.1123
>2_DL_2021.1206
>3_DL_2021.1202
>3_DL_2021.1214
>4_DL_2021.1214
>4_DL_2021.1214
>6_DL_2021.1214
>7_DL_2021.1214
>8_DL_2021.1214
now as you can see the data is not numbered properly and hence needs to be numbered. what im aiming for is this:
>1_DL_2021.1123
>2_DL_2021.1206
>3_DL_2021.1202
>4_DL_2021.1214
>5_DL_2021.1214
>6_DL_2021.1214
>7_DL_2021.1214
>8_DL_2021.1214
>9_DL_2021.1214
now the file has a lot of other stuff between these lines starting with > sign. i want only the > sign stuff affected.
could someone please help me out with this. also there are 563 such lines so manually doing it is out of question.
CodePudding user response:
So, assuming input data file is "input.txt"
You can achieve what you want with this
import re
with open("input.txt", "r") as f:
a = f.readlines()
regex = re.compile(r"^>\d _DL_2021\.\d \n$")
counter = 1
for i, line in enumerate(a):
if regex.match(line):
tokens = line.split("_")
tokens[0] = f">{counter}"
a[i] = "_".join(tokens)
counter = 1
with open("input.txt", "w") as f:
f.writelines(a)
So what it does it searches for line with the regex ^>\d _DL_2021\.\d \n$
, then splits it by _
and gets the first (0th) element and rewrites it, then counts up by 1 and continues the same thing, after all it just writes updated strings back to "input.txt"
CodePudding user response:
sudden_appearance already provided a good answer. In case you don't like regex too much you can use this code instead:
new_lines = []
with open('test_file.txt', 'r') as f:
c = 1
for line in f:
if line[0] == '>':
after_dash = line.split('_',1)[1]
new_line = '>' str(c) '_' after_dash
c = 1
new_lines.append(new_line)
else:
new_lines.append(line)
with open('test_file.txt', 'w') as f:
f.writelines(new_lines)
Also you can have a look at this split tutorial for more information about how to use split.