Goal: convert a text file into a one-column .csv.
I was following this tutorial. However, my text file contains commas.
Each entry is on its own line, and I want each line to become one record in the output file, ESG_BENEFITS.csv.
How can I instruct my code to read each line in the .txt as a new record, without splitting on the commas?
Code:
import pandas as pd
# read the given text file
# and create a dataframe
esg_benefits = pd.read_csv("ESG_BENEFITS.txt") # delim = New Line
esg_benefits.to_csv('ESG_BENEFITS.csv', index=None) # delim = New Line
ParserError:
---------------------------------------------------------------------------
ParserError Traceback (most recent call last)
<ipython-input-1-f367126381e7> in <module>
3 # readinag given csv file
4 # and creating dataframe
----> 5 dataframe1 = pd.read_csv("ESG BENEFITS.txt")
6
7 # storing this dataframe in a csv file
~\Anaconda3\lib\site-packages\pandas\io\parsers.py in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
608 kwds.update(kwds_defaults)
609
--> 610 return _read(filepath_or_buffer, kwds)
611
612
~\Anaconda3\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
466
467 with parser:
--> 468 return parser.read(nrows)
469
470
~\Anaconda3\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
1055 def read(self, nrows=None):
1056 nrows = validate_integer("nrows", nrows)
-> 1057 index, columns, col_dict = self._engine.read(nrows)
1058
1059 if index is None:
~\Anaconda3\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
2059 def read(self, nrows=None):
2060 try:
-> 2061 data = self._reader.read(nrows)
2062 except StopIteration:
2063 if self._first_chunk:
pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader.read()
pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()
pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._read_rows()
pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows()
pandas\_libs\parsers.pyx in pandas._libs.parsers.raise_parser_error()
ParserError: Error tokenizing data. C error: Expected 1 fields in line 9, saw 4
ESG_BENEFITS.txt:
Life insurance
Accident insurance
Adoption or fertility assistance programs
Disability/invalidity insurance
Mortgages and loans
Pension plans/retirement provision
Maternity and/or paternity leave
Child care
Job security initiatives for redeployment, including retraining, relocation, work-sharing and outplacement services
Flexible workschemes and work-sharing
Recall rights for laid-off employees
Stock ownership
Vacation
Paid sick days
PTO (including any of the following: unspecified, vacation and/or sick days)
Insurance: Healthcare Employee
Insurance: Healthcare Family
Insurance: Healthcare Domestic Partner
Insurance: Dental
Insurance: Vision
Insurance: AD&D
Insurance: Short Term Disability
Insurance: Long Term Disability
Employee Assistance Program
Education Benefits: Employee
Education Benefits: Family
Sabbatical Program
Relocation Assistance
Work/Life Support Program
Wellness/Fitness Program
Onsite Fitness Facilities
Onsite Recreation Facilities
Stock Options
Stock Purchase Plan
Employee Profit Sharing
Retirement: Defined Benefit Plan (including pension plans)
Childcare: Other
Bereavement Leave
Tuition reimbursement (other than career training)
Gym facilities or gym fee reimbursement programs
Higher education scholarship programs, for either employees or their relatives
Preventative healthcare programs
Flex scheduling
Telecommuting options
Public transportation subsidy
Carpooling support programs
Employee recognition programs
Paid time off for employee volunteers
Workforce training, skills, and leadership development programs
Matching gift program
Mentoring Program
Others
No additional benefits offered
Please let me know if there is anything else I should add to the post.
CodePudding user response:
By default, read_csv splits each line on commas, which is why line 9 is seen as 4 fields. If you set the separator to something other than a comma, a character that isn't contained in the file, this should parse.
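import pandas as pd
# '§' does not occur in the file, so each line is read as a single field;
# header=None keeps the first line as data instead of using it as the column name
esg_benefits = pd.read_csv("ESG_BENEFITS.txt", sep='§', header=None)
esg_benefits.to_csv('ESG_BENEFITS.csv', index=False, header=False)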
https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
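If you would rather not depend on picking a separator that never shows up in the data, another option is to skip read_csv and treat the file as plain lines. The following is only a minimal sketch using the standard library csv module. The function name lines_to_single_column_csv and the optional header argument are made up for illustration, and it assumes the file is UTF-8 and that blank lines should be dropped.

```python
import csv
from pathlib import Path


def lines_to_single_column_csv(txt_path, csv_path, header=None):
    """Write each non-empty line of txt_path as one record in csv_path.

    Hypothetical helper for illustration; assumes UTF-8 input.
    Returns the list of records that were written.
    """
    lines = Path(txt_path).read_text(encoding="utf-8").splitlines()
    # Drop blank lines and surrounding whitespace (an assumption about the data).
    records = [line.strip() for line in lines if line.strip()]

    # newline="" lets the csv module control line endings itself.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if header is not None:
            writer.writerow([header])
        # Each record is wrapped in a one-element list, so it becomes one column.
        # csv.writer quotes any field that contains a comma.
        writer.writerows([record] for record in records)
    return records
```

Calling lines_to_single_column_csv("ESG_BENEFITS.txt", "ESG_BENEFITS.csv") should then write one record per line. Entries that contain commas, such as the "Job security initiatives ..." line, end up wrapped in double quotes in the output, which is how CSV keeps them in a single field.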