Remove rows from csv based on given criteria, export updated csv with new file name-CodePudding

I am trying to figure out the code to remove the rows in a csv file where the Date column holds a date starting with 202110 (any day), so that all rows from October are removed. Then I want to save the csv under its original name plus 'updated'. I think both parts are incorrect: the part where I try to remove the rows and the part where I save the file. Could you help?

My current code is

import os
import glob
import pandas as pd
from pathlib import Path

sourcefiles = sorted(Path(r'/Users/path/path/path').glob('*.csv'))


for file in sourcefiles:
 df = pd.read_csv(file)
 df2 = df[~df.Date.str.contains('202110')] 
 df2.to_csv("Updated.csv")  # How to save with original file name + word "updated"?

CodePudding user response：

As you use pathlib, you can use file.parent and file.stem:

Replace:

df2.to_csv("Updated.csv")

By:

df2.to_csv(file.parent / f"{file.stem}_updated.csv")
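
Putting the filter and the save step together, a minimal sketch might look like the following. It assumes the Date column holds values such as 20211015; pandas may read those as integers, so the column is cast to str before matching. The helper names drop_rows_with_prefix and save_updated are made up for illustration, and index=False is an addition that is not in the code above.

```python
from pathlib import Path

import pandas as pd


def drop_rows_with_prefix(df, prefix="202110", column="Date"):
    """Return a copy of df without rows whose `column` starts with `prefix`."""
    # Cast to str first: read_csv may parse values like 20211015 as integers,
    # in which case the .str accessor would raise an AttributeError.
    starts_with_prefix = df[column].astype(str).str.startswith(prefix)
    return df.loc[~starts_with_prefix].copy()


def save_updated(df, source_file):
    """Write df next to source_file as '<stem>_updated.csv'; return the new path."""
    source_file = Path(source_file)
    target = source_file.parent / f"{source_file.stem}_updated.csv"
    # index=False keeps pandas from writing the row index as an extra column.
    df.to_csv(target, index=False)
    return target
```

Inside the loop from the question this could be called as save_updated(drop_rows_with_prefix(pd.read_csv(file)), file). Using startswith rather than contains avoids dropping rows where 202110 happens to appear later in the value.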

CodePudding user response：

You can do something like this:

for file in sourcefiles:
    df = pd.read_csv(file)
    df.Date = pd.to_datetime(df.Date)
    condition = ~((df.Date.dt.year == 2021) & (df.Date.dt.month == 10))
    df_new = df.loc[condition]

    name, ext = file.name.split('.')
    df_new.to_csv(f'{name}_updated.{ext}')

This assumes there is exactly one dot in each filename.
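
If some filenames contain more than one dot, splitting on '.' raises a ValueError when unpacking into two names. A possible alternative, sketched here with a hypothetical helper called updated_name, is to let pathlib separate the final extension. Unlike the snippet above, it also keeps the output in the same folder as the source file.

```python
from pathlib import Path


def updated_name(file):
    """Return file's path with '_updated' inserted before the final extension."""
    file = Path(file)
    # stem drops only the last suffix, so extra dots in the name are preserved.
    return file.with_name(f"{file.stem}_updated{file.suffix}")
```

For example, updated_name('report.2021.csv') gives a path named report.2021_updated.csv, and the result can be passed straight to to_csv.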