Home > Software design >  How to filter and sort specific csv using python
How to filter and sort specific csv using python

Time:05-18

Please help me with the python script to filter the below CSV. Below is the example of the CSV dump for which I have done the initial filtration.

Last_name Gender Name Phone city
Ford Male Tom 123 NY
Rich Male Robert 21312 LA
Ford Female Jessica 123123 NY
Ford Male John 3412 NY
Rich Other Linda 12312 LA
Ford Other James 4321 NY
Smith Male David 123123 TX
Rich Female Mary 98689 LA
Rich Female Jennifer 86860 LA
Ford Male Richard 12123 NY
Smith Other Daniel 897097 TX
Ford Other Lisa 123123123 NY
import re

def gather_info (L_name):
    dump_filename = "~/Documents/name_report.csv"
    LN = []
    with open(dump_filename, "r") as FH:
        for var in FH.readlines():
            if L_name in var
                final = var.split(",")
                print(final[1], final[2], final[3])
    return LN

if __name__ == "__main__":
    L_name = input("Enter the Last name: ")

    la_name = gather_info(L_name)

By this, I am able to filter by the last name. for example, if I choose L_name as Ford, then I have my output as

Gender Name Phone
Male Tom 123
Female Jessica 123123
Male John 3412
Other James 4321
Male Richard 12123
Other Lisa 22412

I need help extending the script by selecting each gender and the values in the list to perform other functions, then calling the following gender and the values to achieve the same functions. for example, first, it selects the gender Male [Tom, John] and performs other functions. then selects the next gender Female [Jessica] and performs the same functions and then selects the gender Other [James, Lisa] and performs the same functions.

CodePudding user response:

I would recomend using the pandas module which allows for easy filtering and grouping of data

import pandas as pd

if __name__ == '__main__':
    data = pd.read_csv('name_reports.csv')
    L_name = input("Enter the last name: ")
    by_last_name = data[data['Last_name'] == L_name]
    groups = by_last_name.groupby(['Gender'])
    for group_name, group_data in groups:
        print(group_name)
        print(group_data)

Breaking this down into its pieces the first part is

data = pd.read_csv('name_reports.csv')

This reads the data from the csv and places it into a dataframe Second we have

by_last_name = data[data['Last_name'] == L_name]

This filters the dataframe to only have results with Last_name equal to L_name Next we group the data.

groups = by_last_name.groupby(['Gender'])

this groups the filtered data frames by gender then we iterate over this. It returns a tuple with the group name and the dataframe associated with that group.

for group_name, group_data in groups:
    print(group_name)
    print(group_data)

This loop just prints out the data to access fields from it you can use the iterrows function

for index,row in group_data.iterrows():
    print(row['city']
    print(row['Phone']
    print(row['Name']

And then you can use those for whatever function you want. I would recommend reading on the documentation for pandas since depending on the function you plan on using there may be a better way to do it using the library. Here is the link to the library https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html

CodePudding user response:

Since you cannot use the pandas module then a method using only the csv module would look like this

import csv
def has_last_name(row,last_name):
    return row['Last_name'] == last_name
def has_gender(row,current_gender):
    return row['Gender'] == current_gender
if __name__ == '__main__':
    data = None
    genders = ['Male','Female','Other']
    with open('name_reports.csv') as csvfile:
        data = list(csv.DictReader(csvfile,delimiter=','))
    L_name = input('Enter the Last name: ')
    get_by_last_name = lambda row: has_last_name(row,L_name)
    filtered_by_last_name = list(filter(get_by_last_name,data))
    for gender in genders:
        get_by_gender = lambda row: has_gender(row,gender)
        filtered_by_gender = list(filter(get_by_gender,filtered_by_last_name))
        print(filtered_by_gender)

The important part is the filter built in function. This takes in a function that takes in an item from a list and returns a bool. filter takes this function and an iterable and returns a generator of items that return true for that function. The other important part is the csv.DictReader which returns your csv file as a dictionary which makes allows you to access attributes by key instead of by index.

  • Related