Updating CSV Values across Multiple CSV Files

My code is below. Whenever I run my program, I receive an error stating "AttributeError: 'generator' object has no attribute 'loc'".

I'm trying to replace specific values in one column (Plan_Code) with different values, and to do this for every CSV file in a folder.

I'm not sure why this error is happening.

# Get CSV files list from a folder
csv_files = glob.glob(dest_dir + "/*.csv")

# Read each CSV file into DataFrame
# This creates a list of dataframes
df = (pd.read_csv(file) for file in csv_files)
df.loc[df['Plan_Code'].str.contains('NABVCI'), 'Plan_Code'] = 'CLEAR_BV'
df.loc[df['Plan_Code'].str.contains('NAMVCI'), 'Plan_Code'] = 'CLEAR_MV'
df.loc[df['Plan_Code'].str.contains('NA_NRF'), 'Plan_Code'] = 'FA_GUAR'

df.to_csv(csv_files, index=False)

Thanks!

CodePudding user response：

You wrote this:

df = (pd.read_csv(file) for file in csv_files)

Rather than that generator expression, you probably intended to write a list comprehension:

df = [pd.read_csv(file) for file in csv_files]

Additionally, you likely want to call pd.concat() on that list, so that the rows from all of the CSV files are combined into a single DataFrame.
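
A minimal sketch of the list-plus-concat approach, assuming all files share the same columns. The function name load_all_csvs and the extra source_file column are hypothetical, added here only for illustration; the column records which file each row came from, which you would need if you later want to write rows back to their original files.

```python
import glob
import os

import pandas as pd


def load_all_csvs(dest_dir):
    """Read every CSV in dest_dir into one combined DataFrame."""
    csv_files = sorted(glob.glob(os.path.join(dest_dir, "*.csv")))
    # Square brackets build a real list of DataFrames (not a generator).
    # source_file is a hypothetical helper column tagging each row's origin.
    frames = [pd.read_csv(file).assign(source_file=file) for file in csv_files]
    # pd.concat stacks the frames row-wise; ignore_index renumbers the rows.
    return pd.concat(frames, ignore_index=True)
```

Note that pd.concat raises a ValueError when given an empty list, so this sketch assumes the folder contains at least one CSV file.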

Alternatively, you might prefer to build up a list of dicts pulled from csv.DictReader, and then call pd.DataFrame() on that list. Each row becomes one dict, and rows from multiple .csv files can all be appended to the same list, without regard to which file a row came from.
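
A rough sketch of the csv.DictReader route, again assuming the files share a header row. The function name load_rows_with_dictreader is hypothetical. One difference from pd.read_csv worth knowing: DictReader returns every value as a string, so numeric columns are not converted automatically.

```python
import csv
import glob
import os

import pandas as pd


def load_rows_with_dictreader(dest_dir):
    """Collect one dict per CSV row across all files, then build a DataFrame."""
    rows = []
    for file in sorted(glob.glob(os.path.join(dest_dir, "*.csv"))):
        # newline="" is what the csv module documentation recommends for files.
        with open(file, newline="") as handle:
            # Each row is a dict keyed by the file's header names.
            rows.extend(csv.DictReader(handle))
    return pd.DataFrame(rows)
```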

CodePudding user response：

Because you use round brackets and not square brackets when creating df, df becomes a generator object and not a list of dataframes. But even if you switch to square brackets you will still have a problem: df will now be a list, but lists don't have a loc attribute either, only dataframes -- individual elements of that list -- have it. So df.loc still wouldn't work.
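
To make the distinction concrete, here is a small check you can run. It uses in-memory CSV text (the sources list is made up for illustration) instead of real files:

```python
import io

import pandas as pd

# Two tiny CSV documents held in memory; hypothetical sample data.
sources = ["Plan_Code\nNABVCI\n", "Plan_Code\nNAMVCI\n"]

# Round brackets: a generator object; nothing has been read yet.
as_generator = (pd.read_csv(io.StringIO(text)) for text in sources)
# Square brackets: a list whose elements are DataFrames.
as_list = [pd.read_csv(io.StringIO(text)) for text in sources]

print(hasattr(as_generator, "loc"))  # False
print(hasattr(as_list, "loc"))       # False
print(hasattr(as_list[0], "loc"))    # True
```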

If I understand your intent correctly, you want something like this instead:

csv_files = glob.glob(dest_dir + "/*.csv")
for file in csv_files:
    df = pd.read_csv(file) #now df is a dataframe, so df.loc makes sense
    #do your df.loc manipulations, then save each df to its own file
    df.to_csv(file, index=False)
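
Filled in with the replacements from the question, the loop might look like the sketch below. The function name update_plan_codes and the REPLACEMENTS dict are hypothetical conveniences, and the code assumes Plan_Code holds text. Passing regex=False treats each search term as a literal substring, and na=False treats missing values as non-matching so they cannot break the boolean mask.

```python
import glob
import os

import pandas as pd

# Substring -> replacement pairs taken from the question.
REPLACEMENTS = {
    "NABVCI": "CLEAR_BV",
    "NAMVCI": "CLEAR_MV",
    "NA_NRF": "FA_GUAR",
}


def update_plan_codes(dest_dir, column="Plan_Code"):
    """Rewrite matching values in `column` for every CSV file in dest_dir."""
    for file in glob.glob(os.path.join(dest_dir, "*.csv")):
        df = pd.read_csv(file)  # df is a single DataFrame, so df.loc works
        for needle, new_value in REPLACEMENTS.items():
            # Boolean mask of rows whose value contains the literal substring.
            mask = df[column].str.contains(needle, regex=False, na=False)
            df.loc[mask, column] = new_value
        # Overwrite the same file the data was read from.
        df.to_csv(file, index=False)
```

Because each file is overwritten in place, it may be worth trying this on a copy of the folder first.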