My objective is to read from one csv file and write it to another csv file. I have a csv file whose header contains x number of columns. Some of the rows have x 1, x 2, x-1 number of values in them(some of them are empty). I want to iterate through each row, find out the number of values, and if the number of values is not x, then I want to skip the row, and move onto the next one.
#this for loop iterates through multiple csv files in a folder and applies the specified changes
for filename in os.listdir("csv files"):
filenames=os.path.join("csv files",filename)
print(filenames)
df=pd.read_csv(filenames)
for row in df:
#used the below line to rename the column names
df.columns=["id","FirstName","LastName","UserName","Phone","IsContact","RestrictionReason","Status","IsScam","Date"]
#removed the Status column
df.pop("Status")
#used the line below to reorder the arrangement of columns in the dataframe
df = df.reindex(columns=['id', 'Phone', 'FirstName', 'LastName', 'UserName',"IsContact","IsScam","Date","RestrictionReason"])
df.to_csv(filenames,index=False)
As you can see there are 10 columns at the end, after I remove the Status
column and it's values/ But some of the data has 11 or 12 values. I don't want them in my new csv file, so I want to skip those specific rows but I don't know how to do that.
Here are the first 5 values including the header of the dataframe:
id Phone FirstName LastName UserName IsContact IsScam Date RestrictionReason
0 MT103 WIRE TRANSFER 9.477897e 10 Rooban Naan NaN False False 5/5/2022 11:51:37 PM NaN
1 MT103 WIRE TRANSFER 9.199007e 11 Vbanna Corp Vbannacorp True False 5/5/2022 11:51:14 PM NaN
2 MT103 WIRE TRANSFER 9.197899e 11 Chennail B Party RamaRaoTadimeti True False 5/5/2022 11:51:14 PM NaN
3 MT103 WIRE TRANSFER 9.196008e 11 Sahai NaN JAS2777 True False 5/5/2022 11:51:14 PM NaN
4 MT103 WIRE TRANSFER 8.801818e 12 Md Shah Alam NaN shahalamtrading True False 5/5/2022 11:51:14 PM NaN
Ignore the left most column without a header, that's a result of me not taking care while writing the csv file.
CodePudding user response:
With Pandas 1.3.0 you can do:
df = pd.read_csv('your.csv', on_bad_lines='skip')
With Pandas 1.4.0 you can do more with a callable function see Pandas dataframe read_csv on bad data (New with 1.4.0) to get a deeper understanding of what you can do using a callable function along with on_bad_lines
.