I have a text file that needs to be read line by line and converted into a data frame with the following 4 columns: ['Customer ID', 'Rating', 'Date', 'Movie ID'].
There are 17,770 movie IDs, and each text file has the following format:
Movie ID:
Customer ID, Rating, Date
Customer ID, Rating, Date
. . .
Movie ID:
Customer ID, Rating, Date
Customer ID, Rating, Date
. . .
All the way up to the 17,770th movie ID in ascending order
See the images below for a snippet of the text files.
I don't know how to do this. Please advise.
CodePudding user response:
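First, create a dictionary whose keys are these headers, each mapped to an empty list. Then read the text file line by line, for example by iterating over the file object or calling readline(). If the values on a line are separated by characters such as commas or spaces, split the line on them and append each value to the list under the matching key. A line that ends with a colon is a movie ID, so remember it and attach it to every rating line that follows. You can then convert the dictionary to a data frame with Python's pandas library and, if needed, write it out as a CSV file. The pandas documentation covers both steps.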
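The steps in this answer can be sketched roughly as follows. This is only a minimal illustration, assuming each movie line looks like `123:` and each rating line like `customer,rating,date`. The function name `parse_ratings` and the sample values are made up for the example and are not taken from the real files.

```python
import io

import pandas as pd


def parse_ratings(lines):
    """Build a dict of column lists from 'MovieID:' and 'Customer,Rating,Date' lines."""
    columns = {'Customer ID': [], 'Rating': [], 'Date': [], 'Movie ID': []}
    movie_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.endswith(':'):
            # A movie header: remember it for the rating lines that follow.
            movie_id = int(line[:-1])
            continue
        customer_id, rating, date = line.split(',')
        columns['Customer ID'].append(int(customer_id))
        columns['Rating'].append(int(rating))
        columns['Date'].append(date.strip())
        columns['Movie ID'].append(movie_id)
    return columns


# Made-up sample standing in for the real file (assumption, not real data).
sample = io.StringIO(
    "1:\n1488844,3,2005-09-06\n822109,5,2005-05-13\n2:\n2059652,4,2005-09-05\n"
)
columns = parse_ratings(sample)
df = pd.DataFrame(columns)
```

With a real file you would pass the open file object instead of `sample`, for example `with open('text.txt') as f: df = pd.DataFrame(parse_ratings(f))`. Building plain lists first and creating the data frame once at the end is usually much faster than appending rows to a data frame inside the loop.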
CodePudding user response:
import re

movie_id = None  # most recently seen movie ID
with open('text.txt') as f:  # replace text.txt with your text file path
    for line in f:
        # A movie header line looks like "123:"
        header = re.search(r"^(\d+):", line)
        # A rating line looks like "customer_id,rating,YYYY-MM-DD"
        result = re.search(r"^(\d+),(\d+),(\d{4}-\d{2}-\d{2})", line)
        if header is not None:
            movie_id = header.group(1)
        elif result:
            customer_id = result.group(1)
            rating = result.group(2)
            date = result.group(3)
            data_list = [customer_id, rating, date, movie_id]  # data that you want; you can store it as a CSV file
            # YOUR CODE
        else:
            continue
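The loop above leaves the storage step open. One possible way to fill it in, using only the standard library's `csv` module, is sketched below. The helper names `iter_rows` and `write_csv` and the sample lines are hypothetical, and the date pattern assumes the YYYY-MM-DD form used in the regex above.

```python
import csv
import io
import re

MOVIE_RE = re.compile(r"^(\d+):")
RATING_RE = re.compile(r"^(\d+),(\d+),(\d{4}-\d{2}-\d{2})")


def iter_rows(lines):
    """Yield [customer_id, rating, date, movie_id] for each rating line."""
    movie_id = None
    for line in lines:
        header = MOVIE_RE.search(line)
        rating = RATING_RE.search(line)
        if header:
            movie_id = header.group(1)
        elif rating:
            yield [rating.group(1), rating.group(2), rating.group(3), movie_id]


def write_csv(lines, out):
    """Write a header row followed by one CSV row per rating line."""
    writer = csv.writer(out)
    writer.writerow(['Customer ID', 'Rating', 'Date', 'Movie ID'])
    writer.writerows(iter_rows(lines))


# Made-up sample lines standing in for the real file (assumption).
sample = ["1:\n", "1488844,3,2005-09-06\n", "2:\n", "2059652,4,2005-09-05\n"]
buffer = io.StringIO()
write_csv(sample, buffer)
```

For a real file you would open the input and output files instead of using `sample` and `buffer`, passing `newline=''` when opening the output file as the `csv` module documentation recommends. The resulting CSV can then be loaded with `pandas.read_csv` to get the data frame.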