Home > Net >  Value error while writing from one csv file to another using pandas
Value error while writing from one csv file to another using pandas

Time:05-09

I am writing a code that goes through many csv files in a folder(using a for loop), removes bad data from each csv file(where row values are more than number of columns or sometimes lesser than number of columns). After removing, I rearrange the columns and then I write the useful data into a new csv file.
Here in the code below the for loop is for cycling between different files present in a folder. You can assume the df=pd.read_csv line as the beginning and assume correct indentation.

import pandas as pd
import os

for filename in os.listdir("csv files copy"):
    filenames=os.path.join("csv files copy",filename)
    print(filename)
   
    df=pd.read_csv(filenames, error_bad_lines=False)

    for row in df:

        df.columns=["id","FirstName","LastName","UserName","Phone","IsContact","RestrictionReason","Status","IsScam","Date"]
        df = df.drop(labels="Status", axis=1)
        df = df.reindex(columns=['id', 'Phone', 'FirstName', 'LastName', 'UserName',"IsContact","IsScam","Date","RestrictionReason"])
        df.to_csv(filenames,index=False)

While doing so this is the error I recieve.
ValueError: Length mismatch: Expected axis has 9 elements, new values have 10 elements

This is the first 4 values and the header of the dataframe that I am using:

id                      Phone   FirstName   LastName   UserName     IsContact  IsScam Date                       RestrictionReason        Status             
Forex Pips Fire Free    NaN     Goldenboy      NaN     Goldenboyys      False   False  5/7/2022 8:34:07 AM                NaN             NaN
Forex Pips Fire Free    NaN     Abu 3odeh      NaN         oudah12      False   False  5/7/2022 8:38:03 AM                NaN             NaN
Forex Pips Fire Free    NaN        Rahman     Azar     Rahman_Azar      False   False  5/7/2022 8:41:22 AM                NaN             NaN
Forex Pips Fire Free    NaN         HUDLE      NaN       Hudle1051      False   False  5/7/2022 8:41:11 AM                NaN             NaN

And given below is the header of the destination csv file that the above data needs to be entered into

id Phone FirstName LastName UserName IsContact IsScam Date RestrictionReason

CodePudding user response:

Because you give only 9 columns in this line you missed the 'Status' column

df = df.reindex(columns=['id', 'Phone', 'FirstName', 'LastName', 'UserName', 'IsContact', 'IsScam', 'Date', 'RestrictionReason'])
df.to_csv(filenames, index=False)

CodePudding user response:

You need to remove the for loop as follows:

import pandas as pd
import os

for filename in os.listdir("csv files copy"):
    filenames = os.path.join("csv files copy", filename)
    print(filename)
   
    df = pd.read_csv(filenames, error_bad_lines=False)
    df.columns = ["id", "FirstName", "LastName", "UserName", "Phone", "IsContact", "RestrictionReason", "Status", "IsScam", "Date"]
    df = df.drop(labels="Status", axis=1)
    df = df.reindex(columns=["id", "Phone", "FirstName", "LastName", "UserName","IsContact","IsScam","Date","RestrictionReason"])
    df.to_csv(filenames, index=False)

This was causing the error and is not needed. The first time through the loop it correctly removes Status column and saves the CSV file. The second time through the loop (on the same dataframe) it attempts to do df.columns again but now there is no Status column, so an incorrect number of columns are given.

The code for row in df: would actually iterate over the column names in the dataframe,
e.g. id then FirstName etc.

  • Related