How do I turn a txt file into a data frame with column names?

I have a text file that needs to be read line by line and converted into a data frame with the following 4 columns: ['Customer ID', 'Rating', 'Date', 'Movie ID'].

There are 17,770 movie IDs, and each text file has the following format:

Movie ID:

Customer ID, Rating, Date

. . .

Movie ID:

Customer ID, Rating, Date

. . .

This continues all the way up to the 17,770th movie ID, in ascending order.

See the images below for a snippet of the text files:

[Image 1: movie ID 1]

[Image 2: movie ID 2]

I don't know how to do this. Please advise.

CodePudding user response：

First, create a dictionary whose keys are the four column headers, each mapped to an empty list. Then read the text file line by line, for example with readline() or by iterating over the file object. Split each line on its delimiter (a comma for rating lines, the trailing colon for a movie ID line) and append the resulting values to the lists under the matching keys. Finally, convert the dictionary to a data frame with the pandas library, from which you can also write a CSV file. See the pandas documentation for details.
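The steps in this answer can be sketched roughly as follows. This is only one possible reading of the answer, not a definitive implementation: the function names parse_to_columns and to_data_frame and the sample data are made up for illustration, and the sketch assumes every non-blank line is either a "Movie ID:" header or a "Customer ID, Rating, Date" line.

```python
import io

COLUMNS = ["Customer ID", "Rating", "Date", "Movie ID"]


def parse_to_columns(lines):
    """Collect the ratings into a dictionary of column name -> list of values."""
    data = {name: [] for name in COLUMNS}
    movie_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.endswith(":"):
            # A "Movie ID:" header; it applies to the rating lines that follow.
            movie_id = int(line[:-1])
            continue
        customer_id, rating, date = [part.strip() for part in line.split(",")]
        data["Customer ID"].append(int(customer_id))
        data["Rating"].append(int(rating))
        data["Date"].append(date)
        data["Movie ID"].append(movie_id)
    return data


def to_data_frame(data):
    """Turn the dictionary of lists into a pandas DataFrame."""
    # Imported here so the parsing step needs only the standard library.
    import pandas as pd
    return pd.DataFrame(data, columns=COLUMNS)


# Made-up sample in the format described in the question.
sample = io.StringIO("1:\n101,3,2005-09-06\n102,5,2005-05-13\n2:\n103,4,2005-10-19\n")
columns = parse_to_columns(sample)
```

With a real file, the open file object can be passed in place of the sample, for instance to_data_frame(parse_to_columns(f)) inside a with open('text.txt') as f: block.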

CodePudding user response：

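import re

import pandas as pd

# A movie line is digits followed by a colon; a rating line is digits,digits,YYYY-MM-DD.
movie_re = re.compile(r"^(\d+):")
rating_re = re.compile(r"^(\d+),(\d+),(\d{4}-\d{2}-\d{2})")

rows = []
movie_id = None
with open('text.txt') as f:  # replace text.txt with your text file path
  for line in f:
    movie_match = movie_re.search(line)
    rating_match = rating_re.search(line)
    if movie_match:
      # Remember the current movie ID for the rating lines that follow.
      movie_id = movie_match.group(1)
    elif rating_match:
      customer_id, rating, date = rating_match.groups()
      # The data that you want; the rows can also be stored as a csv file.
      rows.append([customer_id, rating, date, movie_id])
    # Any other line (for example a blank one) is skipped.

df = pd.DataFrame(rows, columns=['Customer ID', 'Rating', 'Date', 'Movie ID'])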
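This answer mentions that the rows can be stored as a csv file. A minimal sketch of that variant follows, using only the standard library so that the whole input never has to sit in memory at once. The helper names iter_rows and write_csv and the sample data are hypothetical, and the line formats are assumed to match the regular expressions used in this answer.

```python
import csv
import io
import re

MOVIE_RE = re.compile(r"^(\d+):")
RATING_RE = re.compile(r"^(\d+),(\d+),(\d{4}-\d{2}-\d{2})")


def iter_rows(lines):
    """Yield [customer_id, rating, date, movie_id] for every rating line."""
    movie_id = None
    for line in lines:
        movie_match = MOVIE_RE.match(line)
        if movie_match:
            # Remember the movie ID for the rating lines that follow.
            movie_id = movie_match.group(1)
            continue
        rating_match = RATING_RE.match(line)
        if rating_match:
            yield [*rating_match.groups(), movie_id]
        # Lines matching neither pattern (for example blank ones) are skipped.


def write_csv(lines, out):
    """Write a header row plus one CSV row per rating to the file-like `out`."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Customer ID", "Rating", "Date", "Movie ID"])
    writer.writerows(iter_rows(lines))


# Made-up sample data; io.StringIO stands in for real input and output files.
sample = io.StringIO("1:\n101,3,2005-09-06\n2:\n103,4,2005-10-19\n")
out = io.StringIO()
write_csv(sample, out)
csv_text = out.getvalue()
```

When writing to a real file, open it with newline="" as the csv module documentation recommends. The resulting file already carries the four column names in its header row, so pandas.read_csv should be able to load it into a data frame directly.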