I am currently working with a text file that looks like this.
NUMBER = 6367283940 | FOOD = PASTA | NAME = JOHN WALKER
NUMBER = 6367283940 | FOOD = PASTA | NAME = JOHN WALKER
NUMBER = 6367283940 | FOOD = PASTA | NAME = JOHN WALKER
I would like to extract the number (just the integers) and save them all to a text file that would read:
6367283940
6367283940
6367283940
How would I go about doing this?
I am brand new.
CodePudding user response:
There are a few ways you might approach this.
Regex
A simple regex pattern should work.
import re
text = """\
NUMBER = 6367283940 | FOOD = PASTA | NAME = JOHN WALKER
NUMBER = 6367283940 | FOOD = PASTA | NAME = JOHN WALKER
NUMBER = 6367283940 | FOOD = PASTA | NAME = JOHN WALKER
"""
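pattern = r'^NUMBER = (\d+)'
# re.MULTILINE lets ^ match at the start of every line, not only the start of the string
for number in re.findall(pattern, text, flags=re.MULTILINE):
    print(number)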
6367283940
6367283940
6367283940
For an explanation of the regex, see this regex101 link.
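As a rough breakdown of what the pattern above is meant to do, here is the same idea written with re.VERBOSE so that each piece carries its own comment. The names NUMBER_RE and extract_numbers are made up for illustration, and this is only a sketch that assumes every line of interest starts with "NUMBER = " followed by digits.

```python
import re

# Hypothetical names for illustration; not part of the original answer.
NUMBER_RE = re.compile(
    r"""
    ^            # start of a line (re.MULTILINE makes ^ match on every line)
    NUMBER\ =\   # the literal text NUMBER = with escaped spaces (verbose mode ignores bare spaces)
    (\d+)        # capture group: one or more digits
    """,
    re.MULTILINE | re.VERBOSE,
)


def extract_numbers(text):
    """Return the digit strings captured from every matching line."""
    return NUMBER_RE.findall(text)


sample = (
    "NUMBER = 6367283940 | FOOD = PASTA | NAME = JOHN WALKER\n"
    "NUMBER = 6367283940 | FOOD = PASTA | NAME = JOHN WALKER\n"
)
print(extract_numbers(sample))
```

Lines that do not start with "NUMBER = " are simply skipped, which may or may not be what you want for malformed input.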
String splitting
A more rudimentary way is to use regular string operations, like .split:
with open('mytext.txt') as f:
    for line in f:
        fields = line.split('|')
        number_field = fields[0]
        _, number = number_field.split(' = ')
        print(number.strip())  # strip the space left before the '|'
Csv/pandas
Because your file is pipe-delimited, you could also use the csv module or pandas, as Nuno Carvalho answered.
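For the csv route, a minimal sketch might look like the following. It uses an in-memory io.StringIO as a stand-in for the real file, and the helper name numbers_from_rows is hypothetical. It assumes the number is always in the first pipe-separated field, after an "=" sign.

```python
import csv
import io

# Stand-in for the real file; with a file on disk you could pass
# open("input.txt", newline="") to csv.reader instead.
sample = io.StringIO(
    "NUMBER = 6367283940 | FOOD = PASTA | NAME = JOHN WALKER\n"
    "NUMBER = 6367283940 | FOOD = PASTA | NAME = JOHN WALKER\n"
)


def numbers_from_rows(rows):
    """Yield the value after '=' from the first field of each row."""
    for row in rows:
        if not row:  # skip blank lines
            continue
        _, _, value = row[0].partition("=")
        yield value.strip()


numbers = list(numbers_from_rows(csv.reader(sample, delimiter="|")))
print("\n".join(numbers))
```

Writing the result out is then a matter of opening an output file in "w" mode and writing the joined string, as in the other answers.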
CodePudding user response:
This script should work if you name your text file input.txt. You can also change that in the code. I added some comments to make the steps clear for someone who isn't that experienced. I hope this helps.
INPUT_FILE = "./input.txt"
OUTPUT_FILE = "./output.txt"

def main():
    result_numbers = []
    with open(INPUT_FILE) as file:  # open the text file in read-only mode
        lines = file.readlines()  # fetch all lines
    for line in lines:  # iterate through the lines
        first_field = line.split("|")[0].strip()  # we only need the first field, without the extra spaces
        number = first_field.split("=")[1].strip()  # we need the part after the =, without the space before it
        result_numbers.append(number)  # add the number to the result list
    with open(OUTPUT_FILE, "w") as file:  # open a new text file in write mode to save the results to it
        file.write("\n".join(result_numbers))  # join the results with a line break and write them to that file

if __name__ == '__main__':
    main()
If you have any questions, feel free to ask.
CodePudding user response:
First, read the text file with the readlines method to get its lines as a list. Then loop through the lines, split each one on spaces, and append the third element (index 2), which is the number in every line, to the string numbers, followed by \n for a new line. Finally, write that string to a text file.
with open("data.txt") as file:
    data = file.readlines()
numbers = ""
for line in data:
    numbers += line.split(" ")[2]  # the third space-separated token is the number
    numbers += "\n"
with open("numbers.txt", mode="w") as file:
    file.write(numbers)
CodePudding user response:
# input.txt is the input file and output.txt is the output file.
import re

with open('input.txt') as file:
    lines = [line.rstrip() for line in file]

start = 'NUMBER = '
end = 'FOOD'
with open('output.txt', 'a') as file_out:  # 'a' appends to any existing output
    for line in lines:
        # capture everything between start and end, then drop the trailing " | "
        result = re.search('%s(.*)%s' % (start, end), line).group(1).rstrip(' |')
        file_out.write(result + '\n')
CodePudding user response:
I suggest using pandas.
1 - Install the module.
pip install pandas
2 - Save that text in a file named "text.csv".
3 - Run this script
import pandas as pd
data = pd.read_csv("text.csv", header=None, sep="|")
print(data[0])
# Remove 'NUMBER = ' and the whitespace left around the digits
numbers = data[0].apply(lambda x: x.replace("NUMBER = ", "").strip())
# The output will be written here
numbers.to_csv("your-numbers.csv", header=False, index=False)
Result:
your-numbers.csv
6367283940
6367283940
6367283940