Home > OS >  Search multiple CSVs for one particular value
Search multiple CSVs for one particular value

Time:11-30

I have a directory of CSV files that list fruits and there quantities, and I want to search all those and create a new CSV that only has a particular fruit (e.g., "apple").

list1.csv

| Name     | Qty |
| -------- | --- |
| apple    |15   | 
| apple    |50   |
| mango    |20   | 
| grapes   |49   |

list2.csv

| Name     | Qty |
| -------- | --- |
| apple    |25   | 
| apple    |50   |
| Banana   |34   |
| mango    |20   | 
| grapes   |49   |

list3.csv

| Name     | Qty |
| -------- | --- |
| apple    |125   | 
| apple    |530   |
| mango    |20   | 
| grapes   |49   |

I want, for "apple":

new.csv

| Name     | Qty |
| -------- | --- |
| apple    |15   | 
| apple    |50   |
| apple    |25   | 
| apple    |50   |
| apple    |125  | 
| apple    |530  |
import pandas as pd
import glob, os
path = ("E:/Data/Fdata")
all_files = glob.glob(path   "/*.csv")
li=[]
for filename in all_files:
    df=pd.read_csv(filename, index_col=None, header=0)
    ndf = df[df["Name"].str.contains("Apple")]
    li.append(ndf)
    ndf.to_csv("E:/Data/Fdata/onlyapple.csv", index=True)

CodePudding user response:

  1. Read all your csv files to a master DataFrame
  2. Filter on the "Name" you want and write to_csv
import os
import pandas as pd

master = pd.DataFrame()
for file in [f for f in os.listdir(".") if f.endswith("csv")]:
    master = master.append(pd.read_csv(file), ignore_index=True)
    
master[master["Name"].eq("apple")].reset_index(drop=True).to_csv("onlyapple.csv")
onlyapple.csv:
,Name,quantity,price
0,apple,15,500
1,apple,50,400
2,apple,15,500
3,apple,50,400

CodePudding user response:

Here's how to do it without Pandas:

import csv
import glob

fruit = 'apple'

final = []
header = []

for file in glob.glob('./*.csv'):
    with open(file, newline='') as f:
        reader = csv.reader(f)
        header = next(reader)  # header should be same for each file

        for row in reader:
            if row[0] == fruit:
                final.append(row)

with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(header)    # use the last file's header
    writer.writerows(final)

And here's how to do it without code using GoCSV's commands, stack (to stack your files one on top of each other) and filter (to filter out only the rows you want):

gocsv stack *.csv | gocsv filter -c Name -eq apple
  • Related