I am trying to read more than 100 CSV files in Python and keep only the top 500 rows of each (they each have over 550,000 rows). So far I know how to do that, but I need to save each modified file in the loop under its own filename in CSV format. Normally I output the concatenated dataframe to one big CSV file, but this time I need to truncate each CSV file to its top 500 rows and save each one individually.
This is the code I have so far:
import pandas as pd
import glob
FolderName = str(input("What's the name of the folder are you comparing? "))
path = str(input('Enter full path of the folder: '))
#r'C:\Users\si\Documents\UST\AST' # use your path
all_files = glob.glob(path + "/*.csv")
#list1 = []
d = {}
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0, nrows=500)
    #list1.append(df)
    d[filename] = df.columns
#frame = pd.concat(list1, axis=0, ignore_index=True)
frame = pd.DataFrame.from_dict(d, orient='index')
output_path = r'C:\Users\si\Downloads\New\{}_header.xlsx'.format(FolderName)
frame.to_excel(output_path)
CodePudding user response:
DataFrames can write CSVs as well as read them, so just call to_csv on each
truncated frame, passing the same filename. Pass index=False so pandas does
not prepend an extra index column when it writes the file back out.
import pandas as pd
import glob
FolderName = str(input("What's the name of the folder are you comparing? "))
path = input('Enter full path of the folder: ')
all_files = glob.glob(path + "/*.csv")
for filename in all_files:
    pd.read_csv(filename, index_col=None, header=0, nrows=500).to_csv(filename, index=False)