Home > Back-end >  Delete CSV file if missing specific column using python
Delete CSV file if missing specific column using python

Time:09-03

Currently my code looks into CSV files in a folder and replaces strings based on if the file has column 'PROD_NAME' in the data. If it doesnt have column 'PROD_NAME', I'm trying to delete those files in the folder. I can get my code to print which csv files do not have the column with a little debugging, but I cant figure out how to actually delete or remove them from the folder they are in. I have tried an if statement that calls os.remove() and still nothing happens. No errors or anything.. it just finishes the script with all the files still in the folder. Here is my code. Any help is appreciated. Thanks!

def worker():
    filenames = glob.glob(dest_dir   '\\*.csv')
    print("Finding all files with column PROD_NAME")
    time.sleep(3)
    print("Changing names of products in these tables...")
    for filename in filenames:
        
        my_file = Path(os.path.join(dest_dir, filename))
        
        try:
            with open(filename):
            # read data
                df1 = pd.read_csv(filename, skiprows=1, encoding='ISO-8859-1') # read column header only - to get the list of columns
                dtypes = {}
                for col in df1.columns:# make all columns text, to avoid formatting errors
                    dtypes[col] = 'str'
                df1 = pd.read_csv(filename, dtype=dtypes, skiprows=1, encoding='ISO-8859-1')

                if 'PROD_NAME' not in df1.columns:
                os.remove(filename)
                    
                #Replaces text in files
                if 'PROD_NAME' in df1.columns: 
                    df1 = df1.replace("NABVCI", "CLEAR_BV")
                    df1 = df1.replace("NAMVCI", "CLEAR_MV")
                    df1 = df1.replace("NA_NRF", "FA_GUAR")
                    df1 = df1.replace("N_FPFA", "FA_FLEX")
                    df1 = df1.replace("NAMRFT", "FA_SECURE_MVA")
                    df1 = df1.replace("NA_RFT", "FA_SECURE")
                    df1 = df1.replace("NSPFA7", "FA_PREFERRED")
                    df1 = df1.replace("N_ENHA", "FA_ENHANCE")
                    df1 = df1.replace("N_FPRA", "FA_FLEX_RETIRE")
                    df1 = df1.replace("N_SELF", "FA_SELECT")
                    df1 = df1.replace("N_SFAA", "FA_ADVANTAGE")
                    df1 = df1.replace("N_SPD1", "FA_SPD1")
                    df1 = df1.replace("N_SPD2", "FA_SPD2")
                    df1 = df1.replace("N_SPFA", "FA_LIFESTAGES")
                    df1 = df1.replace("N_SPPF", "FA_PLUS")
                    df1 = df1.replace("N__CFA", "FA_CHOICE")
                    df1 = df1.replace("N__OFA", "FA_OPTIMAL")
                    df1 = df1.replace("N_SCNI", "FA_SCNI")
                    df1 = df1.replace("NASCI_", "FA_SCI")
                    df1 = df1.replace("NASSCA", "FA_SSC")
                    df1.to_csv(filename, index=False, quotechar="'")            
                
        except:
            if 'PROD_NAME' in df1.columns:
                print("Could not find string to replace in this file: "   filename)
                    
worker()

CodePudding user response:

Written below is a block of code that reads the raw csv data. It extracts the first row of data (containing the column names) and looks for the column name PROD_NAME. If it finds it, it sets found to True. Else, it sets found to False. To prevent trying to delete the files whilst open, the removal is done outside of the open().


import os

filename = "test.csv"

with open(filename) as f:
    if "PROD_NAME" in f.readlines()[0].split(","):
        print("found")
        found = True
    else:
        print("not found")
        found = False
if not found:
    os.remove(filename)
else:
    pass#Carry out replacements here/load it in pandas
  • Related