Home > Back-end >  Omitting rows that are larger than the specified length in a csv file using pandas
Omitting rows that are larger than the specified length in a csv file using pandas

Time:05-07

My objective is to read from one csv file and write it to another csv file. I have a csv file whose header contains x number of columns. Some of the rows have x 1, x 2, x-1 number of values in them(some of them are empty). I want to iterate through each row, find out the number of values, and if the number of values is not x, then I want to skip the row, and move onto the next one.

#this for loop iterates through multiple csv files in a folder and applies the specified changes
for filename in os.listdir("csv files"):
    
    filenames=os.path.join("csv files",filename)
    print(filenames)
   
    df=pd.read_csv(filenames)
        
    for row in df:

        #used the below line to rename the column names
        df.columns=["id","FirstName","LastName","UserName","Phone","IsContact","RestrictionReason","Status","IsScam","Date"]

        #removed the Status column
        df.pop("Status")

        #used the line below to reorder the arrangement of columns in the dataframe
        df = df.reindex(columns=['id', 'Phone', 'FirstName', 'LastName', 'UserName',"IsContact","IsScam","Date","RestrictionReason"])

        df.to_csv(filenames,index=False)

As you can see there are 10 columns at the end, after I remove the Status column and it's values/ But some of the data has 11 or 12 values. I don't want them in my new csv file, so I want to skip those specific rows but I don't know how to do that.
Here are the first 5 values including the header of the dataframe:

                       id         Phone        FirstName            LastName         UserName  IsContact  IsScam                  Date  RestrictionReason
0     MT103 WIRE TRANSFER  9.477897e 10           Rooban                Naan              NaN      False   False  5/5/2022 11:51:37 PM                NaN
1     MT103 WIRE TRANSFER  9.199007e 11           Vbanna                Corp       Vbannacorp       True   False  5/5/2022 11:51:14 PM                NaN
2     MT103 WIRE TRANSFER  9.197899e 11       Chennail B               Party  RamaRaoTadimeti       True   False  5/5/2022 11:51:14 PM                NaN
3     MT103 WIRE TRANSFER  9.196008e 11            Sahai                 NaN          JAS2777       True   False  5/5/2022 11:51:14 PM                NaN
4     MT103 WIRE TRANSFER  8.801818e 12     Md Shah Alam                 NaN  shahalamtrading       True   False  5/5/2022 11:51:14 PM                NaN

Ignore the left most column without a header, that's a result of me not taking care while writing the csv file.

CodePudding user response:

With Pandas 1.3.0 you can do:

df = pd.read_csv('your.csv', on_bad_lines='skip') 

With Pandas 1.4.0 you can do more with a callable function see Pandas dataframe read_csv on bad data (New with 1.4.0) to get a deeper understanding of what you can do using a callable function along with on_bad_lines.

  • Related