How to filter a collection by multiple conditions

Time:11-30

I have a CSV file named film.csv. Here is the header line, with a few rows to use as an example:

Year;Length;Title;Subject;Actor;Actress;Director;Popularity;Awards;*Image
1990;111;Tie Me Up! Tie Me Down!;Comedy;Banderas, Antonio;Abril, Victoria;Almodóvar, Pedro;68;No;NicholasCage.png
1991;113;High Heels;Comedy;Bosé, Miguel;Abril, Victoria;Almodóvar, Pedro;68;No;NicholasCage.png
1983;104;Dead Zone, The;Horror;Walken, Christopher;Adams, Brooke;Cronenberg, David;79;No;NicholasCage.png
1979;122;Cuba;Action;Connery, Sean;Adams, Brooke;Lester, Richard;6;No;seanConnery.png
1978;94;Days of Heaven;Drama;Gere, Richard;Adams, Brooke;Malick, Terrence;14;No;NicholasCage.png
1983;140;Octopussy;Action;Moore, Roger;Adams, Maud;Glen, John;68;No;NicholasCage.png

I am trying to filter the data and display the movie titles that meet these criteria: first name contains "Richard", Year < 1985, and Awards == "Y".

I am able to filter by the award, but not by the rest. Can you help?

file_name = "film.csv"
lines = (line for line in open(file_name,encoding='cp1252')) #generator to capture lines
lists = (s.rstrip().split(";") for s in lines) #generators to capture lists containing values from lines

#browse lists and index them per header values, then filter all movies that have been awarded
#using a new generator object

cols=next(lists) #obtains only the header
print(cols)
collections = (dict(zip(cols,data)) for data in lists)

filtered = (col["Title"] for col in collections if col["Awards"][0] == "Y")

for item in filtered:
    print(item)
    # input()

This works for the award, but I don't know how to add more filters. Also, when I try to filter with if col["Year"] < 1985 I get an error, because a string can't be compared with an int. How do I convert the years to integers? For the first name, I believe I can filter like this: if col["Actor"].split(", ")[-1] == "Richard"
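A minimal sketch of the two issues raised above, using values from the sample rows: in Python 3, comparing a str with an int raises TypeError, so the year has to go through int() first; and splitting a "Last, First" name on ", " leaves the first name as the last element. The helper name first_name is hypothetical, introduced only for this illustration.

```python
def first_name(full_name):
    """Return the part after ", " in a "Last, First" name (hypothetical helper)."""
    # "Gere, Richard".split(", ") -> ["Gere", "Richard"]
    return full_name.split(", ")[-1]


year = "1978"  # every field read from the file is a string

try:
    year < 1985  # comparing str with int is not allowed in Python 3
except TypeError as exc:
    print("TypeError:", exc)

print(int(year) < 1985)  # convert first, then compare: True

# Exact match, as in the question's proposed check
print(first_name("Gere, Richard") == "Richard")  # True
# Substring match, if "contains" is meant literally
print("Richard" in first_name("Gere, Richard"))  # True
```

The == check requires the first name to be exactly "Richard"; the in check also accepts any first name that merely contains it.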

Answer:

You already know how to apply one filter. There is no separate mechanism for "additional" filters: you just extend the condition you already have. Since a record should be selected only when all of the conditions are True, combine them with the boolean operator and. For example:

filtered = (
    col["Title"]
    for col in collections
    if col["Awards"][0] == "Y"
    and col["Actor"].split(", ")[-1] == "Richard"
    and int(col["Year"]) < 1985
)

Notice the int() around col["Year"]: it converts the year string to an integer so it can be compared with 1985.
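To try the combined condition on the sample rows from the question, the sketch below feeds them through io.StringIO instead of opening film.csv (that substitution is an assumption made only so the snippet runs on its own). It keeps the question's parsing and the and chain from this answer, but stores the rows in a list rather than a generator so they can be filtered twice, and uses startswith("Y"), which behaves like [0] == "Y" without raising IndexError on an empty Awards field.

```python
import io

# The header and sample rows from the question, standing in for film.csv
SAMPLE = """\
Year;Length;Title;Subject;Actor;Actress;Director;Popularity;Awards;*Image
1990;111;Tie Me Up! Tie Me Down!;Comedy;Banderas, Antonio;Abril, Victoria;Almodóvar, Pedro;68;No;NicholasCage.png
1991;113;High Heels;Comedy;Bosé, Miguel;Abril, Victoria;Almodóvar, Pedro;68;No;NicholasCage.png
1983;104;Dead Zone, The;Horror;Walken, Christopher;Adams, Brooke;Cronenberg, David;79;No;NicholasCage.png
1979;122;Cuba;Action;Connery, Sean;Adams, Brooke;Lester, Richard;6;No;seanConnery.png
1978;94;Days of Heaven;Drama;Gere, Richard;Adams, Brooke;Malick, Terrence;14;No;NicholasCage.png
1983;140;Octopussy;Action;Moore, Roger;Adams, Maud;Glen, John;68;No;NicholasCage.png
"""

lines = io.StringIO(SAMPLE)  # stands in for open("film.csv", encoding="cp1252")
lists = (s.rstrip().split(";") for s in lines)
cols = next(lists)  # header row
# A list (not a generator) so the rows can be filtered more than once below
rows = [dict(zip(cols, data)) for data in lists]

# All three conditions joined with and
awarded = [
    col["Title"]
    for col in rows
    if col["Awards"].startswith("Y")
    and col["Actor"].split(", ")[-1] == "Richard"
    and int(col["Year"]) < 1985
]

# The same filter without the Awards condition
any_award = [
    col["Title"]
    for col in rows
    if col["Actor"].split(", ")[-1] == "Richard"
    and int(col["Year"]) < 1985
]

print(awarded)    # [] because every sample row has Awards == "No"
print(any_award)  # ['Days of Heaven']
```

Note that Cuba does not match: its Richard (Lester, Richard) is in the Director column, not the Actor column.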


You've actually gone and reinvented csv.DictReader in the setup to this problem! Instead of

file_name = "film.csv"
lines = (line for line in open(file_name,encoding='cp1252')) #generator to capture lines
lists = (s.rstrip().split(";") for s in lines) #generators to capture lists containing values from lines

#browse lists and index them per header values, then filter all movies that have been awarded
#using a new generator object

cols=next(lists) #obtains only the header
print(cols)
collections = (dict(zip(cols,data)) for data in lists)
filtered = ...

You could have just done:

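import csv

file_name = "film.csv"
with open(file_name, encoding='cp1252', newline='') as f:
    collections = csv.DictReader(f, delimiter=";")  # pass the open file to DictReader
    filtered = ...
    # use filtered here, before the with block closes the file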
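A runnable sketch of the csv.DictReader approach, with filtered filled in using the conditions from the first answer. Here io.StringIO stands in for film.csv so the snippet is self-contained; the commented line shows the assumed equivalent for the real file. Because filtered is a generator that reads from the file lazily, it is turned into a list inside the with block, before the file is closed.

```python
import csv
import io

# Two of the sample rows from the question, standing in for film.csv
SAMPLE = """\
Year;Length;Title;Subject;Actor;Actress;Director;Popularity;Awards;*Image
1979;122;Cuba;Action;Connery, Sean;Adams, Brooke;Lester, Richard;6;No;seanConnery.png
1978;94;Days of Heaven;Drama;Gere, Richard;Adams, Brooke;Malick, Terrence;14;No;NicholasCage.png
"""

# For the real file: with open("film.csv", encoding="cp1252", newline="") as f:
with io.StringIO(SAMPLE) as f:
    # DictReader uses the header row as keys, like dict(zip(cols, data))
    collections = csv.DictReader(f, delimiter=";")
    filtered = (
        col["Title"]
        for col in collections
        if col["Awards"].startswith("Y")  # no IndexError on an empty field
        and col["Actor"].split(", ")[-1] == "Richard"
        and int(col["Year"]) < 1985
    )
    # Consume the generator here: the file is closed once the with block ends
    results = list(filtered)

print(results)  # [] because neither row's Awards value starts with "Y"
```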