Home > OS >  How to delete rows that exceed a certain amount of columns
How to delete rows that exceed a certain amount of columns

Time:03-08

Working on putting csv files into a dataframe and loading it into a database, but there are a few files that have rows with data in extra columns. I want to be able to drop the entire row that exceeds 'X' amount of columns

Example, want to remove rows that have more than 4 columns

id, header1, header2, header3
1, desc1, desc2, desc3
2, desc1, desc2, desc3
3, desc1, desc2, desc3, desc4
4, desc1, desc2, desc3

I would like to skip row with id = 3, output:

id, header1, header2, header3
1, desc1, desc2, desc3
2, desc1, desc2, desc3
4, desc1, desc2, desc3

CodePudding user response:

Pandas can do it for you, use on_bad_lines='skip' as parameter of read_csv:

df = pd.read_csv('data.csv', on_bad_lines='skip')
print(df)

# Output
   id header1 header2 header3
0   1   desc1   desc2   desc3
1   2   desc1   desc2   desc3
2   4   desc1   desc2   desc3

Update

For older versions of Pandas, use:

df = pd.read_csv('data.csv', error_bad_lines=False)
print(df)

# Output
   id header1 header2 header3
0   1   desc1   desc2   desc3
1   2   desc1   desc2   desc3
2   4   desc1   desc2   desc3
  • Related