Handle csv file with almost similar records but different times - need to group them as one record

Time:07-13

I am trying to solve the lab below and am having trouble. The problem takes a CSV file as input, and the solution has to meet the criteria listed below. Any help or tips would be appreciated. My code and its output are at the end of the problem.

Each row contains the title, rating, and all showtimes of a unique movie.
A space is placed before and after each vertical separator ('|') in each row.
Column 1 displays the movie titles and is left justified with a minimum of 44 characters.
If the movie title has more than 44 characters, output the first 44 characters only.
Column 2 displays the movie ratings and is right justified with a minimum of 5 characters.
Column 3 displays all the showtimes of the same movie, separated by a space.

This is the input:

16:40,Wonders of the World,G
20:00,Wonders of the World,G
19:00,End of the Universe,NC-17
12:45,Buffalo Bill And The Indians or Sitting Bull's History Lesson,PG
15:00,Buffalo Bill And The Indians or Sitting Bull's History Lesson,PG
19:30,Buffalo Bill And The Indians or Sitting Bull's History Lesson,PG
10:00,Adventure of Lewis and Clark,PG-13
14:30,Adventure of Lewis and Clark,PG-13
19:00,Halloween,R

This is the expected output:

Wonders of the World                         |     G | 16:40 20:00
End of the Universe                          | NC-17 | 19:00
Buffalo Bill And The Indians or Sitting Bull |    PG | 12:45 15:00 19:30
Adventure of Lewis and Clark                 | PG-13 | 10:00 14:30
Halloween                                    |     R | 19:00

My code so far:

import csv
rawMovies = input()
repeatList = []

with open(rawMovies, 'r') as movies:
    moviesList = csv.reader(movies)
    for movie in moviesList:
        time = movie[0]
        #print(time)
        show = movie[1]
        if len(show) > 45:
            show = show[0:44]
        #print(show)
        rating = movie[2]
        #print(rating)
        print('{0: <44} | {1: <6} | {2}'.format(show, rating, time))

My output doesn't have the rating aligned to the right, and I don't know how to merge the repeated movies into one row without losing their showtimes:

Wonders of the World                         | G      | 16:40
Wonders of the World                         | G      | 20:00
End of the Universe                          | NC-17  | 19:00
Buffalo Bill And The Indians or Sitting Bull | PG     | 12:45
Buffalo Bill And The Indians or Sitting Bull | PG     | 15:00
Buffalo Bill And The Indians or Sitting Bull | PG     | 19:30
Adventure of Lewis and Clark                 | PG-13  | 10:00
Adventure of Lewis and Clark                 | PG-13  | 14:30
Halloween                                    | R      | 19:00

CodePudding user response:

To right-align the rating, take the column width (the lab asks for 5 characters, which is also the length of the longest rating, NC-17), subtract the length of the rating from it, and prepend that many spaces to the rating. So basically:

your_desired_str = ' ' * (5 - len(rating)) + rating

Also, consider replacing

'somestr {value}'.format(value=value)

with f-strings, which are much easier to read:

f'somestr {value}'
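
As a minimal sketch of the alignment itself, assuming the width of 5 from the lab's column 2 requirement: the manual padding described above gives the same result as the built-in str.rjust and as the '>' alignment flag in a format specification, so the space arithmetic is optional. The variable names here are only illustrative.

```python
rating = "PG"

# Manual padding: prepend spaces until the string is 5 characters wide.
manual = " " * (5 - len(rating)) + rating

# Built-in equivalents: str.rjust and the '>' (right-align) format flag.
with_rjust = rating.rjust(5)
with_fstring = f"{rating:>5}"

# All three produce the same right-justified string.
print(repr(manual), repr(with_rjust), repr(with_fstring))
```

In the question's format string, this would amount to changing {1: <6} to {1: >5}.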

CodePudding user response:

You could collect the input data in a dictionary, with (title, rating) tuples as keys and lists of showtimes as values, and then print the consolidated information. For example (you will have to adjust the filename):

import csv

movies = {}
with open("data.csv", "r") as file:
    for showtime, title, rating in csv.reader(file):
        movies.setdefault((title, rating), []).append(showtime)
for (title, rating), showtimes in movies.items():
    print(f"{title[:44]: <44} | {rating: >5} | {' '.join(showtimes)}")

Output:

Wonders of the World                         |     G | 16:40 20:00
End of the Universe                          | NC-17 | 19:00
Buffalo Bill And The Indians or Sitting Bull |    PG | 12:45 15:00 19:30
Adventure of Lewis and Clark                 | PG-13 | 10:00 14:30
Halloween                                    |     R | 19:00
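
A minimal sketch of the same dictionary approach wrapped in a function, so it can be tried without a file on disk. The function name format_schedule and the in-memory sample are assumptions for illustration, not part of the lab. The sample deliberately puts the two "Wonders of the World" rows apart to show that the dictionary still merges them.

```python
import csv
import io


def format_schedule(csv_file):
    """Group showtimes by (title, rating) and return the formatted rows."""
    movies = {}
    for showtime, title, rating in csv.reader(csv_file):
        # Dicts keep insertion order (Python 3.7+), so movies appear in
        # the order of their first showtime in the input.
        movies.setdefault((title, rating), []).append(showtime)
    return [
        f"{title[:44]:<44} | {rating:>5} | {' '.join(showtimes)}"
        for (title, rating), showtimes in movies.items()
    ]


# Hypothetical sample input; the rows of one movie are not adjacent.
sample = io.StringIO(
    "16:40,Wonders of the World,G\n"
    "19:00,Halloween,R\n"
    "20:00,Wonders of the World,G\n"
)
for line in format_schedule(sample):
    print(line)
```

To use it with the file name read from input(), as in the question, one could pass the object returned by open(input()) instead of the StringIO sample.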

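Since the input seems to come in contiguous blocks (all rows of a movie are adjacent), you could also use itertools.groupby (from the standard library) and print while reading. Note that groupby only merges adjacent rows with the same key, so this relies on that ordering: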

import csv
from itertools import groupby
from operator import itemgetter

with open("data.csv", "r") as file:
    for (title, rating), group in groupby(
        csv.reader(file), key=itemgetter(1, 2)
    ):
        showtimes = " ".join(time for time, *_ in group)
        print(f"{title[:44]: <44} | {rating: >5} | {showtimes}")
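
One caveat worth checking before relying on groupby: it starts a new group every time the key changes, so a movie whose rows are not adjacent ends up in several groups. The following sketch uses made-up rows (not the lab's file) to show the difference, and one possible workaround of sorting by the same key first.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows in which one movie's showtimes are not adjacent.
rows = [
    ["16:40", "Wonders of the World", "G"],
    ["19:00", "Halloween", "R"],
    ["20:00", "Wonders of the World", "G"],
]
key = itemgetter(1, 2)  # (title, rating)

# Without sorting, the non-adjacent movie is split into two groups.
unsorted_groups = [k for k, _ in groupby(rows, key=key)]

# Sorting by the same key first makes equal keys adjacent.
sorted_groups = [k for k, _ in groupby(sorted(rows, key=key), key=key)]

print(len(unsorted_groups), len(sorted_groups))
```

Sorting changes the output order to alphabetical by title, which would not match the expected output in the question, so for unsorted input the dictionary version is probably the safer choice.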