I'm reading PNG file names from a text file, using a regex to pick out the names that are 4-digit numbers, removing the punctuation marks, and saving the result to another file.
What has stumped me is putting each individual value from the list into a string like:
<div ><img title="" src="images/char/{HERE}.png" ></div>
And then save it to the file as:
<div ><img title="" src="images/char/1432.png" ></div>
<div ><img title="" src="images/char/1250.png" ></div>
<div ><img title="" src="images/char/1324.png" ></div>
This is the code
import re
import pyperclip
def remove_punc(string):
    # Strip every punctuation character (and spaces) from the string
    punc = '''!()-[]{};:'"\, <>./?@#$%^&*_~'''
    for ele in string:
        if ele in punc:
            string = string.replace(ele, "")
    return string
text_file = open(r'C:\My Web Sites\image_data(1).txt', 'r')
s = text_file.read()
text_file.close()
string_pattern = r"\d{4}\."
regex_pattern = re.compile(string_pattern)
# find all the matches in string one
result = regex_pattern.findall(s)
result = [remove_punc(i) for i in result]
with open(r'C:\My Web Sites\1.txt', 'w') as fp:
    for item in result:
        # write each item on a new line
        fp.write("%s\n" % item)
# no fp.close() needed: the with block closes the file
EDIT
This is a sample of the text file
<div ><div ><img src="resources/images/bgs/5.png" ><img src="resources/images/thumb/1535.png" one rror="this.src='resources/images/thumb/noimage.png';"><img rel="popover" tabindex="0" src="resources/images/frames/5.png" data-html="true" data-trigger="focus" data-toggle="popover" data-placement="bottom" data-content="Rarity: 5★<br/>Level: 1/60<br/>Level: 0/4<br/>Level: 1/5<br/>: 0%" title="" data-original-title="<font color='red'><br/>(version)</font>"><img src="resources/images/elements/3.png" ></div><div ><img src="resources/images/bgs/5.png" ><img src="resources/images/thumb/1510.png" one rror="this.src='resources/images/thumb/noimage.png';"><img rel="popover" tabindex="1" src="resources/images/frames/5.png" data-html="true" data-trigger="focus" data-toggle="popover" data-placement="bottom" data-content="Rarity: 5★<br/>Level: 1/80<br/>Level: 4/4<br/>Level: 1/5<br/>: 0%" title="" data-original-title="<font color='#F96700'><br/>(version)</font>"><img src="resources/images/elements/5.png" ></div><div ><img src="resources/images/bgs/5.png" ><img src="resources/images/thumb/1403.png" one rror="this.src='resources/images/thumb/noimage.png';"><img rel="popover" tabindex="2" src="resources/images/frames/5.png" data-html="true" data-trigger="focus" data-toggle="popover" data-placement="bottom" data-content="Rarity: 5★<br/>Level: 1/80<br/>Level: 4/4<br/>Level: 1/5<br/>: 0%" title="" data-original-title="<font color='#071BA0'><br/>(version)</font>"><img src="resources/images/elements/4.png" ></div><div ><img src="resources/images/bgs/5.png" ><img src="resources/images/thumb/1388.png" one rror="this.src='resources/images/thumb/noimage.png';"><img rel="popover" tabindex="3" src="resources/images/frames/5.png" data-html="true" data-trigger="focus" data-toggle="popover" data-placement="bottom" data-content="Rarity: 5★<br/>Level: 1/80<br/>Level: 4/4<br/>Level: 1/5<br/>: 0%" title="" data-original-title="<font color='#F96700'><br/>(version)</font>"><img 
src="resources/images/elements/5.png" ></div><div ><img src="resources/images/bgs/6.png" ><img src="resources/images/thumb/1323.png" one rror="this.src='resources/images/thumb/noimage.png';"><img rel="popover" tabindex="4" src="resources/images/frames/6.png" data-html="true" data-trigger="focus" data-toggle="popover" data-placement="bottom" data-content="Rarity: 6★<br/>Level: 200/200<br/>Level: 4/4<br/>Level: 1/5<br/>: 150%<br/>1: 0/10<br/>2: 0/10<br/>3: 0/10<br/>" title="<font color='red'><br/>(version)</font>"><img src="resources/images/elements/3.png" ></div><div ><img src="resources/images/bgs/5.png" ><img src="resources/images/thumb/1322.png"
Output
1535
1510
1403
1388
1323
1322
CodePudding user response:
To create your file you can use str.format. For example:
s = """<div ><img title="" src="images/char/{}.png"></div>"""
result = [1432, 1250, 1324] # <-- your result with removed punctuations
with open("data.txt", "w") as fp:
    for item in result:
        print(s.format(item), file=fp)
creates data.txt with content:
<div ><img title="" src="images/char/1432.png"></div>
<div ><img title="" src="images/char/1250.png"></div>
<div ><img title="" src="images/char/1324.png"></div>
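Putting the two steps together, here is a minimal sketch that extracts the numbers and formats them in one pass. The sample string, TEMPLATE, and build_img_tags are hypothetical names used only for illustration; in practice the text would come from the input file. Using a capture group in the regex is a swapped-in alternative to stripping punctuation afterwards.

```python
import re

# Hypothetical sample standing in for the contents of the input file.
sample = (
    '<img src="resources/images/bgs/5.png" >'
    '<img src="resources/images/thumb/1535.png" >'
    '<img src="resources/images/thumb/1510.png" >'
)

# Template from the question; {} is filled in by str.format.
TEMPLATE = '<div ><img title="" src="images/char/{}.png" ></div>'

def build_img_tags(text):
    """Return one formatted <div> line per 4-digit .png name found in text."""
    # The capture group keeps only the digits, so no punctuation removal is needed.
    numbers = re.findall(r"(\d{4})\.png", text)
    return [TEMPLATE.format(n) for n in numbers]

lines = build_img_tags(sample)
print("\n".join(lines))
```

The returned list can then be written to a file in a loop, the same way as in the example above.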
CodePudding user response:
Given the additional information from the author, this pattern should do the trick: (\d{4})\.(?=png)
Where
- (\d{4}) captures exactly 4 digits
- \.(?=png) requires them to be followed by .png (the lookahead checks for png without including it in the match)
If you also want to support jpeg, for example, you can change the pattern to (\d{4})\.(?=png|jpeg)
For online testing I coded the example below with an inline string, but it should work the same way if you load the file and then call findall. The rest of the job is yours.
import re
string = "<div class=\"parent\"><img class=\"img\" title=\"\" src=\"images/char/1432.png\" ></div>\n<div class=\"parent\"><img class=\"img\" title=\"\" src=\"images/char/1250.png\" ></div>\n<div class=\"parent\"><img class=\"img\" title=\"\" src=\"images/char/1324.png\" ></div>\n<div class=\"parent\"><img class=\"img\" title=\"\" src=\"images/char/1324.jpeg\" ></div>"
pattern = re.compile(r'(\d{4})\.(?=png)')
print(pattern.findall(string))
where the output is
['1432', '1250', '1324']
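For comparison, a small sketch of the png|jpeg variant mentioned above. The html string here is made-up test data (the .gif entry is included only to show a non-matching extension), not part of the original file.

```python
import re

# Made-up test data: one png, one jpeg, and one gif that should be ignored.
html = (
    '<img src="images/char/1432.png" >'
    '<img src="images/char/1324.jpeg" >'
    '<img src="images/char/9999.gif" >'
)

# The lookahead accepts either extension but keeps it out of the match;
# findall returns only the captured digits.
pattern = re.compile(r"(\d{4})\.(?=png|jpeg)")
matches = pattern.findall(html)
print(matches)
```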