Hi, I'm working on a script to speed up my work, and I have a small problem: the output contains blank lines, like this:
.Devi
.Dave
.Liana
.Ricky
.Oswyne
.Devi
.Putra
.Kelvin
.Gilang
.Delvin
This is my source:
import re
filex = input('Input file name : ')
file = open(filex, "r")
st3 = file.read()
pattern = r"\bOperational\.[0-9]{1,4}|\bManagement\.[0-9]{1,4}|\bAdmin\.[0-9]{1,4}|\bStaff.{1,25}"
mod_string = re.sub(pattern, '', st3 )
print(mod_string)
open("clean.txt" ,"w").write(mod_string)
This is the list I want to filter:
position | Staff Number | Name
Operational.1252.Devi
Staff.1875.Erin
Operational.1552.Dave
Staff.1875.Hutri
Operational.1952.Liana
Management.1292.Ricky
Staff.1875.Udin
Management.1852.Oswyne
Staff.1875.Udin
Operational.1052.Devi
Management.1282.Putra
Operational.1262.Kelvin
Admin.9823.Gilang
Staff.1275.Siska
Staff.1835.Udin
Admin.9823.Gilang
Staff.1875.Silalahi
Management.1282.Delvin
and more List....
I want the output to look like this, without blank lines and without duplicate lines:
.Devi
.Dave
.Liana
.Ricky
.Oswyne
.Devi
.Putra
.Kelvin
.Gilang
.Delvin
CodePudding user response:
Using your data, I wrote another script that does not use regex, only readlines and split.
The idea is to read the file line by line with readlines, split each line into 3 parts using . as the separator, and keep the last part.
If you need to use regex, you can ignore this answer.
file1 = open('test.txt', 'r')
lines = file1.readlines()
# ignore first line
lines = lines[1:]
output_file = open('output.txt','w')
for line in lines:
    # split the line using . as separator and get last string
    output_file.write(line.split('.')[2])
output_file.close()
The output will be:
Devi
Erin
Dave
Hutri
Liana
Ricky
Udin
Oswyne
Udin
Devi
Putra
Kelvin
Gilang
Siska
Udin
Gilang
Silalahi
Delvin
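The output above still contains duplicates (Devi, Udin, Gilang), and the question asked for them to be removed. Below is a minimal sketch of one way to extend the split approach, assuming every data line has exactly the form position.number.name. The function name extract_unique_names and the sample list are made up for illustration.

```python
def extract_unique_names(lines):
    """Return the name field of each 'position.number.name' line, first occurrence only."""
    seen = set()
    names = []
    for line in lines:
        parts = line.strip().split('.')
        # skip the header, blank lines, and anything that is not exactly three fields
        if len(parts) != 3:
            continue
        name = parts[2]
        if name not in seen:
            seen.add(name)
            names.append(name)
    return names


# hypothetical sample, shaped like the list in the question
sample = [
    "position | Staff Number | Name\n",
    "Operational.1252.Devi\n",
    "Staff.1875.Erin\n",
    "\n",
    "Operational.1052.Devi\n",
]
print("\n".join(extract_unique_names(sample)))
```

Checking the number of fields also avoids the IndexError that line.split('.')[2] would raise on an empty line.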
CodePudding user response:
thank you all, I've got a little idea from uncle google
import re
filex = input('Input file name : ')
file = open(filex, "r")
st3 = file.read()
pattern = r"\bOperational\.[0-9]{1,4}|\bManagement\.[0-9]{1,4}|\bAdmin\.[0-9]{1,4}|\bStaff\.[0-9]{1,4}"
mod_string = re.sub(pattern, '', st3)
lines = mod_string.split("\n")
non_empty_lines = [line for line in lines if line.strip() != ""]
string_without_empty_lines = ""
for line in non_empty_lines:
    string_without_empty_lines += line + "\n"
words = string_without_empty_lines.split()
# key=words.index keeps each word at the position of its first appearance
print('\n'.join(sorted(set(words), key=words.index)))
and the result comes out like this without Duplicate :D
.Devi
.Erin
.Dave
.Hutri
.Liana
.Ricky
.Oswyne
.Udin
.Putra
.Kelvin
.Siska
.Gilang
.Silalahi
.Delvin
result proof : https://prnt.sc/AcCX7JmlvvyN
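For anyone wondering where the blank lines in my first attempt came from: as far as I can tell, the Staff alternative in the old pattern (\bStaff.{1,25}) matched the rest of the line, but . does not match a newline, so the line break stayed behind. A small sketch with a made-up three-line sample:

```python
import re

# the pattern from the question: Staff lines are matched up to the end of the line
pattern = r"\bOperational\.[0-9]{1,4}|\bManagement\.[0-9]{1,4}|\bAdmin\.[0-9]{1,4}|\bStaff.{1,25}"

# hypothetical sample, shaped like the list in the question
sample = "Operational.1252.Devi\nStaff.1875.Erin\nOperational.1552.Dave\n"

stripped = re.sub(pattern, "", sample)
# the emptied Staff line is still there as a bare newline
print(repr(stripped))

# dropping whitespace-only lines removes the gap
cleaned = [line for line in stripped.splitlines() if line.strip()]
print(cleaned)
```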
CodePudding user response:
Instead of using re.sub and splitting the lines, you could also use a specific match with a capture group.
Looking at your provided answer, you can shorten the pattern to:
^(?:Operational|Management|Admin|Staff)\.[0-9]{1,4}(\..+)
In parts, the pattern matches:
^                                    Start of string (start of each line when re.M is used)
(?:                                  Non capture group
Operational|Management|Admin|Staff   Match one of the alternatives
)                                    Close non capture group
\.[0-9]{1,4}                         Match . and 1-4 digits 0-9
(\..+)                               Capture group 1, match a . and 1 or more times any character
For example:
import re
filex = input('Input file name : ')
file = open(filex, "r")
st3 = file.read()
pattern = r"^(?:Operational|Management|Admin|Staff)\.[0-9]{1,4}(\..+)"
result = sorted(set(re.findall(pattern, st3, re.M)))
print(result)
Output of the sorted set (re.findall returns the values of capture group 1):
[
'.Dave',
'.Delvin',
'.Devi',
'.Erin',
'.Gilang',
'.Hutri',
'.Kelvin',
'.Liana',
'.Oswyne',
'.Putra',
'.Ricky',
'.Silalahi',
'.Siska',
'.Udin'
]
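Note that sorted(set(...)) returns the names in alphabetical order. If the order of first appearance matters, as in the desired output in the question, one option is dict.fromkeys, which keeps insertion order in Python 3.7+. A minimal sketch, with the sample text inlined instead of read from a file; unique_in_order is a hypothetical helper name:

```python
import re

# same pattern as above; re.M lets ^ match at the start of every line
PATTERN = re.compile(
    r"^(?:Operational|Management|Admin|Staff)\.[0-9]{1,4}(\..+)", re.M
)


def unique_in_order(text):
    """Return the captured '.Name' values, duplicates removed, in order of first appearance."""
    # dict keys are unique and keep insertion order
    return list(dict.fromkeys(PATTERN.findall(text)))


# hypothetical sample, shaped like the list in the question
sample_text = """position | Staff Number | Name
Operational.1252.Devi
Staff.1875.Erin
Operational.1552.Dave
Operational.1052.Devi
"""
print("\n".join(unique_in_order(sample_text)))
```

Joining the result with "\n" gives one name per line, so it can be written directly to a file such as clean.txt.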