I'm trying to figure out how to remove the rows of a CSV file whose Date column holds a date starting with 202110 (any day), so that all rows from October 2021 are removed. Then I want to save the result as a CSV named after the original file plus the word "updated". I think both the part where I remove the rows and the part where I save the file are incorrect. Could you help?
My current code is:
import pandas as pd
from pathlib import Path
sourcefiles = sorted(Path(r'/Users/path/path/path').glob('*.csv'))
for file in sourcefiles:
    df = pd.read_csv(file)
    df2 = df[~df.Date.str.contains('202110')]
    df2.to_csv("Updated.csv")  # How do I save this under the original file name plus the word "updated"?
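A minimal sketch of the filtering step on its own, assuming the Date values look like 20211015 (YYYYMMDD). If pandas reads such a column as integers, the .str accessor raises an error, so the values are converted to strings first; str.startswith also anchors the match at the beginning of the value, which str.contains does not. The function name drop_october_2021 and the sample data are hypothetical.

```python
import pandas as pd


def drop_october_2021(df: pd.DataFrame) -> pd.DataFrame:
    """Return df without the rows whose Date starts with '202110'.

    Assumes Date holds YYYYMMDD values; pandas may parse them as integers,
    so they are converted to strings before the prefix test.
    """
    in_october = df["Date"].astype(str).str.startswith("202110")
    return df[~in_october]


# Hypothetical sample data for illustration
sample = pd.DataFrame({"Date": [20210930, 20211001, 20211031, 20211101],
                       "value": [1, 2, 3, 4]})
print(drop_october_2021(sample))
```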
Answer:
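Since you are already using pathlib, you can build the new name from file.parent (the folder containing the file) and file.stem (the file name without its extension):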
Replace:
df2.to_csv("Updated.csv")
By:
df2.to_csv(file.parent / f"{file.stem}_updated.csv")
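The naming part can be isolated into a small helper. This is a sketch: updated_path is a hypothetical name, and it reuses the original suffix instead of hard-coding .csv. Because Path.stem strips only the last suffix, it also handles file names that contain more than one dot.

```python
from pathlib import Path


def updated_path(file: Path) -> Path:
    """Build '<folder>/<stem>_updated<suffix>' next to the original file."""
    return file.parent / f"{file.stem}_updated{file.suffix}"


# Hypothetical file name containing two dots
print(updated_path(Path("data/archive.2021.csv")).name)  # archive.2021_updated.csv
```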
Answer:
You can do something like this:
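for file in sourcefiles:
    df = pd.read_csv(file)
    df.Date = pd.to_datetime(df.Date)
    # Keep every row whose date is not in October 2021
    condition = ~((df.Date.dt.year == 2021) & (df.Date.dt.month == 10))
    df_new = df.loc[condition]
    name, ext = file.name.split('.')
    # Save the filtered frame (df_new, not df); index=False avoids an extra index column
    df_new.to_csv(f'{name}_updated.{ext}', index=False)
This assumes your file names contain exactly one dot.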
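Putting the pieces together, here is one possible end-to-end version, assuming Date holds YYYYMMDD values (as integers or strings). It parses them with an explicit format, drops October 2021 with the year/month mask, writes <stem>_updated.csv next to each source file without the index column, and skips files whose names already end in _updated so that a second run does not reprocess its own output. The function name remove_october_2021 and the sample file are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List

import pandas as pd


def remove_october_2021(folder: Path) -> List[Path]:
    """Write <stem>_updated.csv next to each CSV in folder, minus October 2021 rows."""
    written = []
    for file in sorted(folder.glob("*.csv")):
        # Skip output files produced by an earlier run
        if file.stem.endswith("_updated"):
            continue
        df = pd.read_csv(file)
        # Assumes YYYYMMDD values; astype(str) also covers integer-typed columns
        dates = pd.to_datetime(df["Date"].astype(str), format="%Y%m%d")
        keep = ~((dates.dt.year == 2021) & (dates.dt.month == 10))
        out = file.parent / f"{file.stem}_updated{file.suffix}"
        # index=False stops pandas from writing the row index as an extra column
        df.loc[keep].to_csv(out, index=False)
        written.append(out)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        # Hypothetical input file for illustration
        pd.DataFrame({"Date": [20210930, 20211015, 20221015],
                      "value": [1, 2, 3]}).to_csv(folder / "sales.csv", index=False)
        for path in remove_october_2021(folder):
            print(path.name)
            print(pd.read_csv(path))
```

Because the parsed dates live in a separate variable instead of replacing df["Date"], the Date column is written back in its original form.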