Home > Enterprise >  How can I filter a csv file based on its columns in python?
How can I filter a csv file based on its columns in python?

Time:02-22

I have a CSV file with over 5,000,000 rows of data that looks like this (except that it is in Farsi):

Contract Code,Contract Type,State,City,Property Type,Region,Usage Type,Area,Percentage,Price,Price per m2,Age,Frame Type,Contract Date,Postal Code
765720,Mobayee,East Azar,Kish,Apartment,,Residential,96,100,570000,5937.5,36,Metal,13890107,5169614658
766134,Mobayee,East Azar,Qeshm,Apartment,,Residential,144.5,100,1070000,7404.84,5,Concrete,13890108,5166884645
766140,Mobayee,East Azar,Tabriz,Apartment,,Residential,144.5,100,1050000,7266.44,5,Concrete,13890108,5166884645
766146,Mobayee,East Azar,Tabriz,Apartment,,Residential,144.5,100,700000,4844.29,5,Concrete,13890108,5166884645
766147,Mobayee,East Azar,Kish,Apartment,,Residential,144.5,100,1625000,11245.67,5,Concrete,13890108,5166884645
770822,Mobayee,East Azar,Tabriz,Apartment,,Residential,144.5,50,500000,1730.1,5,Concrete,13890114,5166884645

I would like to write a code to pass the first row as the header and then extract data from two specific cities (Kish and Qeshm) and save it into a new CSV file. Somthing like this one:

Contract Code,Contract Type,State,City,Property Type,Region,Usage Type,Area,Percentage,Price,Price per m2,Age,Frame Type,Contract Date,Postal Code
765720,Mobayee,East Azar,Kish,Apartment,,Residential,96,100,570000,5937.5,36,Metal,13890107,5169614658
766134,Mobayee,East Azar,Qeshm,Apartment,,Residential,144.5,100,1070000,7404.84,5,Concrete,13890108,5166884645
766147,Mobayee,East Azar,Kish,Apartment,,Residential,144.5,100,1625000,11245.67,5,Concrete,13890108,5166884645

It's worth mentioning that I'm very new to python. I've written the following block to define the headers, but this is the furthest I've gotten so far.

import pandas as pd

path = '/Users/Desktop/sample.csv'

df = pd.read_csv(path , header=[0])
df.head = ()

CodePudding user response:

You don't need to use header=... because the default is to treat the first row as the header, so

df = pd.read_csv(path)

Then, to keep rows on conditions:

df2 = df[df['City'].isin(['Kish', 'Qeshm'])]

And you can save it with

df2.to_csv(another_path)
  • Related