How do I turn txt file into data frame with column names

Time:10-09

I have a text file that needs to be read line by line and converted into a data frame with the following 4 columns: ['Customer ID', 'Rating', 'Date', 'Movie ID'].

There are 17,770 movie IDs, and each text file has the following format:

Movie ID:
Customer ID, Rating, Date
Customer ID, Rating, Date
. . .
Movie ID:
Customer ID, Rating, Date
Customer ID, Rating, Date
. . .

This continues, in ascending order, up to the 17,770th movie ID.

See the images below for snippets of the text files.

1st image (movie ID 1)

2nd image (movie ID 2)

I don't know how to do this. Please advise.
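A minimal sketch of parsing the layout described in the question, assuming each movie block starts with a header line of the form "1:" (the movie ID followed by a colon) and each following line holds an integer customer ID, an integer rating, and a date string separated by commas. The function name parse_ratings and the sample lines are hypothetical. It yields one tuple per rating, in the requested column order Customer ID, Rating, Date, Movie ID.

```python
def parse_ratings(lines):
    """Yield (customer_id, rating, date, movie_id) tuples.

    Assumes a "<movie id>:" header line followed by
    "customer_id,rating,date" lines for that movie.
    """
    movie_id = None  # ID from the most recent header line
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.endswith(':'):
            # Header line such as "1:" starts a new movie block
            movie_id = int(line[:-1])
        else:
            # Rating line; strip spaces around each comma-separated field
            customer_id, rating, date = (field.strip() for field in line.split(','))
            yield int(customer_id), int(rating), date, movie_id


# Made-up sample lines in the question's layout
sample = ["1:\n", "101,3,2005-09-06\n", "102,5,2005-05-13\n", "2:\n", "101,4,2005-10-19\n"]
for row in parse_ratings(sample):
    print(row)
```

Because iterating over an open file yields its lines, the same generator can consume the real file inside a with open(...) block, and passing list(parse_ratings(f)) to pandas.DataFrame with columns=['Customer ID', 'Rating', 'Date', 'Movie ID'] would give the requested data frame.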

Answer:

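First, create a dictionary whose keys are these column headers, each mapped to an empty list. Then read the text file line by line, for example by iterating over the file object or calling readline(). A line ending in a colon holds the movie ID, so remember it; every other non-blank line holds a customer ID, a rating, and a date separated by commas, so split it on the commas and append each value, together with the current movie ID, to the list under the matching key. Finally, pass the dictionary to pandas.DataFrame to create the data frame, and call to_csv on it if you also want a CSV file. See the pandas documentation for details.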
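The dictionary-of-lists approach from this answer might look like the sketch below. It assumes the same layout as the question (a "<movie id>:" header, then comma-separated customer ID, rating, and date lines); ratings_to_dataframe and the sample text are hypothetical. pandas.DataFrame accepts a dict of equal-length lists and makes one column per key.

```python
import io

import pandas as pd

COLUMNS = ['Customer ID', 'Rating', 'Date', 'Movie ID']


def ratings_to_dataframe(lines):
    """Build a DataFrame from "<movie id>:" / "customer,rating,date" lines."""
    # One list per column header, as the answer suggests
    data = {column: [] for column in COLUMNS}
    movie_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.endswith(':'):
            movie_id = int(line[:-1])  # a new movie block starts here
            continue
        customer_id, rating, date = (field.strip() for field in line.split(','))
        data['Customer ID'].append(int(customer_id))
        data['Rating'].append(int(rating))
        data['Date'].append(date)
        data['Movie ID'].append(movie_id)
    # Each key becomes a column; all lists have the same length
    return pd.DataFrame(data)


# Made-up sample; pass an open file object instead for real data
sample = io.StringIO("1:\n101,3,2005-09-06\n102,5,2005-05-13\n2:\n101,4,2005-10-19\n")
df = ratings_to_dataframe(sample)
print(df)
# Optional: df.to_csv('ratings.csv', index=False) to save it as a CSV file
```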

Answer:

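import re

import pandas as pd

# A rating line looks like "customer_id,rating,date", e.g. "123,4,2005-09-06"
rating_re = re.compile(r"^(\d+),(\d+),(\d{4}-\d{2}-\d{2})")
# A movie header line looks like "movie_id:", e.g. "1:"
movie_re = re.compile(r"^(\d+):")

rows = []        # collected [customer_id, rating, date, movie_id] lists
movie_id = None  # ID from the most recent movie header line

with open('text.txt') as f:  # replace text.txt with your text file path
    for line in f:
        movie_match = movie_re.search(line)
        result = rating_re.search(line)
        if movie_match is not None:
            movie_id = movie_match.group(1)
        elif result:
            customer_id = result.group(1)
            rating = result.group(2)
            date = result.group(3)

            data_list = [customer_id, rating, date, movie_id]  # the data you want
            rows.append(data_list)
        else:
            continue  # skip blank or unrecognized lines

# Build the data frame (save it with df.to_csv('ratings.csv', index=False) if needed)
df = pd.DataFrame(rows, columns=['Customer ID', 'Rating', 'Date', 'Movie ID'])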
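Neither answer uses it, but a vectorized pandas variant is another option and probably avoids most of the per-line Python work on a file this large. The sketch below swaps in that technique: read every line with pandas.read_csv as three columns (header lines such as "1:" then have NaN for Rating and Date), flag the header rows, forward-fill the movie ID onto the rating rows, and drop the headers. It assumes the file starts with a movie header and that every other non-blank line has exactly three comma-separated fields; load_ratings and the sample text are hypothetical.

```python
import io

import pandas as pd


def load_ratings(path_or_buffer):
    """Load the "<movie id>:" / "customer,rating,date" layout into a DataFrame."""
    # Header lines like "1:" have a single field, so Rating and Date are NaN there
    raw = pd.read_csv(
        path_or_buffer,
        header=None,
        names=['Customer ID', 'Rating', 'Date'],
        dtype={'Customer ID': str},
        skipinitialspace=True,
    )
    is_movie = raw['Customer ID'].str.endswith(':')

    # Keep the movie ID on header rows only, then carry it forward to the ratings
    movie_ids = raw['Customer ID'].where(is_movie).str.rstrip(':').ffill()

    ratings = raw.loc[~is_movie].copy()
    ratings['Customer ID'] = ratings['Customer ID'].astype(int)
    ratings['Rating'] = ratings['Rating'].astype(int)
    ratings['Movie ID'] = movie_ids[~is_movie].astype(int)
    return ratings.reset_index(drop=True)


# Made-up sample; pass a file path such as 'text.txt' for real data
sample = io.StringIO("1:\n101,3,2005-09-06\n102,5,2005-05-13\n\n2:\n101,4,2005-10-19\n")
df = load_ratings(sample)
print(df)
```

The forward fill is what replaces the "remember the current movie ID" variable used in the loop-based answers.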

    