My text file contains numerous entries, each starting with the character <.
Using Python, I want to extract only the first five characters of each entry (in addition to the <).
For example:
my file= [<1
acloclscloclxcccdddddddddddcccccddddddddddddweeeeeeeeeeeeeeeee
<2
lsjfljljljljljljlsjdfojljljlholhowljljljljouopuljlj
<3
ljlhohouojljljjouopuljljljhlhouljljlhh
<4
hououojljljlhouojljljljlhouljljljljoukhklhkhkh......]
The result I want is a file containing only the < and the first 5 characters of each entry, i.e.
<1
aclo
<2
lsjf
<3
ljlh
<4
houo
CodePudding user response:
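# text is assumed to hold the full contents of the file
for x in text.split("<"):
    # split() yields an empty first piece when the text starts with "<"
    if x != '':
        # 6 characters: the entry number, the line break, and four more
        print(f'<{x[:6]}')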
This might help
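To get the result as a file rather than printed output, the same split can be wrapped in a small function. This is only a minimal sketch, assuming the whole file fits in memory; the function name extract_prefixes, the keep parameter, and the file names input.txt and output.txt are hypothetical and not from the question.

```python
def extract_prefixes(text, keep=6):
    """Return '<' plus the first `keep` characters of each entry.

    keep=6 covers the entry number, the line break, and four more
    characters, which matches the output shown in the question.
    """
    # split("<") yields an empty first piece when the text starts with "<",
    # so empty pieces are filtered out
    return ["<" + entry[:keep] for entry in text.split("<") if entry]


# Shortened stand-in for the file contents
sample = "<1\nacloclscl\n<2\nlsjfljlj\n"
print("\n".join(extract_prefixes(sample)))

# Hypothetical file names; uncomment to process a real file.
# with open("input.txt") as src, open("output.txt", "w") as dst:
#     dst.write("\n".join(extract_prefixes(src.read())) + "\n")
```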
CodePudding user response:
Using regex
import re
txt = '<1 acloclscloclxcccdddddddddddcccccddddddddddddweeeeeeeeeeeeeeeee <2 lsjfljljljljljljlsjdfojljljlholhowljljljljouopuljlj <3 ljlhohouojljljjouopuljljljhlhouljljlhh <4 hououojljljlhouojljljljlhouljljljljoukhklhkhkh'
# Raw string so that \s and \S are passed to the regex engine unchanged
print(re.findall(r'(<[\s\S]{0,6})', txt))
Output -
['<1 aclo', '<2 lsjf', '<3 ljlh', '<4 houo']
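The sample string above uses spaces where the file in the question has line breaks. Since [\s\S] also matches a newline, the same pattern should work on the raw file contents. A small sketch, where the variable names and the shortened sample data are assumptions:

```python
import re

# Shortened stand-in for the file contents, with real line breaks
file_text = "<1\nacloclscloclx\n<2\nlsjfljljljlj\n<3\nljlhohouoj\n"

# '<' followed by up to six characters of any kind, newlines included
matches = re.findall(r"<[\s\S]{0,6}", file_text)
print("\n".join(matches))
```

If an entry could have fewer than six characters after the <, this pattern would run into the next entry; using [^<]{0,6} in place of [\s\S]{0,6} stops each match before the next <.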