import pandas as pd
sea_level_df = pd.read_csv(r"C:\Users\slaye\OneDrive\Desktop\SeaLevel.csv")
display(sea_level_df)
I'm trying to skip the first 3 rows of this file without literally highlighting the unwanted text in the actual file and pressing backspace. Is there a way I can do this in Python?
This is the top of the CSV file:
#title = mean sea level anomaly global ocean (66S to 66N) (Annual signals retained)
#institution = NOAA/Laboratory for Satellite Altimetry
#references = NOAA Sea Level Rise
year,TOPEX/Poseidon,Jason-1,Jason-2,Jason-3
1992.9614,-16.27000,
1992.9865,-17.97000,
1993.0123,-14.87000,
1993.0407,-19.87000,
1993.0660,-25.27000,
1993.0974,-29.37000,
I want to drop the first 3 rows (the ones starting with `#`) so I can parse the rest into a table in pandas. I'm getting the following error:
ParserError Traceback (most recent call last)
Input In [14], in <cell line: 2>()
1 import pandas as pd
----> 2 sea_level_df = pd.read_csv(r"C:\Users\slaye\OneDrive\Desktop\SeaLevel.csv")
3 display(sea_level_df)
File ~\anaconda3\lib\site-packages\pandas\util\_decorators.py:311, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
305 if len(args) > num_allow_args:
306 warnings.warn(
307 msg.format(arguments=arguments),
308 FutureWarning,
309 stacklevel=stacklevel,
310 )
--> 311 return func(*args, **kwargs)
File ~\anaconda3\lib\site-packages\pandas\io\parsers\readers.py:680, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
665 kwds_defaults = _refine_defaults_read(
666 dialect,
667 delimiter,
(...)
676 defaults={"delimiter": ","},
677 )
678 kwds.update(kwds_defaults)
--> 680 return _read(filepath_or_buffer, kwds)
File ~\anaconda3\lib\site-packages\pandas\io\parsers\readers.py:581, in _read(filepath_or_buffer, kwds)
578 return parser
580 with parser:
--> 581 return parser.read(nrows)
File ~\anaconda3\lib\site-packages\pandas\io\parsers\readers.py:1254, in TextFileReader.read(self, nrows)
1252 nrows = validate_integer("nrows", nrows)
1253 try:
-> 1254 index, columns, col_dict = self._engine.read(nrows)
1255 except Exception:
1256 self.close()
File ~\anaconda3\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py:225, in CParserWrapper.read(self, nrows)
223 try:
224 if self.low_memory:
--> 225 chunks = self._reader.read_low_memory(nrows)
226 # destructive to chunks
227 data = _concatenate_chunks(chunks)
File ~\anaconda3\lib\site-packages\pandas\_libs\parsers.pyx:805, in pandas._libs.parsers.TextReader.read_low_memory()
File ~\anaconda3\lib\site-packages\pandas\_libs\parsers.pyx:861, in pandas._libs.parsers.TextReader._read_rows()
File ~\anaconda3\lib\site-packages\pandas\_libs\parsers.pyx:847, in pandas._libs.parsers.TextReader._tokenize_rows()
File ~\anaconda3\lib\site-packages\pandas\_libs\parsers.pyx:1960, in pandas._libs.parsers.raise_parser_error()
ParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 5
Answer:
From the read_csv documentation: you can pass skiprows=3 to ignore the first 3 rows of the file.
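A minimal sketch of that fix, using an in-memory copy of the sample lines from the question in place of the real file path:

```python
import io
import pandas as pd

# The top of SeaLevel.csv as shown in the question
raw = """#title = mean sea level anomaly global ocean (66S to 66N) (Annual signals retained)
#institution = NOAA/Laboratory for Satellite Altimetry
#references = NOAA Sea Level Rise
year,TOPEX/Poseidon,Jason-1,Jason-2,Jason-3
1992.9614,-16.27000,
1992.9865,-17.97000,
"""

# skiprows=3 skips the three '#' metadata lines, so the 4th line
# ("year,TOPEX/Poseidon,...") is parsed as the header row
sea_level_df = pd.read_csv(io.StringIO(raw), skiprows=3)
print(sea_level_df.columns.tolist())
# → ['year', 'TOPEX/Poseidon', 'Jason-1', 'Jason-2', 'Jason-3']
```

With your actual file you would keep `pd.read_csv(r"C:\Users\slaye\OneDrive\Desktop\SeaLevel.csv", skiprows=3)`. Note that data rows with fewer fields than the header (the trailing comma with missing Jason columns) are padded with NaN rather than raising an error.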
Otherwise, pandas just reads your CSV from the top down and assumes that all rows follow the pattern of the first row. It doesn't see any delimiters (comma, tab, etc.) in the first row, so it assumes your data has only one column. The next few rows follow the same pattern (no delimiters = one column), then all of a sudden there are commas in the 4th row. pandas sees them as delimiters (which would indicate more than one column), but since there weren't any in the first rows, it expects a single column for the whole CSV, so it throws the ParserError ("Expected 1 fields in line 4, saw 5").
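Since the unwanted lines all start with `#`, an alternative worth knowing is read_csv's `comment` parameter, which drops everything from the comment character to the end of the line; the resulting blank lines are then skipped. This doesn't depend on counting the metadata lines — a sketch under the same assumptions as above:

```python
import io
import pandas as pd

raw = """#title = mean sea level anomaly global ocean (66S to 66N) (Annual signals retained)
#institution = NOAA/Laboratory for Satellite Altimetry
#references = NOAA Sea Level Rise
year,TOPEX/Poseidon,Jason-1,Jason-2,Jason-3
1992.9614,-16.27000,
1992.9865,-17.97000,
"""

# comment='#' strips the metadata lines no matter how many there are,
# so the header row is found automatically
sea_level_df = pd.read_csv(io.StringIO(raw), comment="#")
print(sea_level_df.shape)
# → (2, 5)
```

This is safer if NOAA ever adds or removes a metadata line, though it assumes `#` never appears inside the data itself.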