Convert Text file w/ commas into 1 Column CSV

Time:11-20

Goal: convert a text file into a one-column .csv.

I was following this tutorial. However, my text file contains commas.

Each entry is on its own line, and I want each line to become one record in the output, ESG_BENEFITS.csv.

How can I instruct my code to read each line of the .txt as a new record, without splitting on the commas?


Code:

import pandas as pd

# reading given csv file
# and creating dataframe
esg_benefits = pd.read_csv("ESG_BENEFITS.txt")  # delim = New Line

esg_benefits.to_csv('ESG_BENEFITS.csv', index=None)  # delim = New Line

ParserError:

---------------------------------------------------------------------------
ParserError                               Traceback (most recent call last)
<ipython-input-1-f367126381e7> in <module>
      3 # readinag given csv file
      4 # and creating dataframe
----> 5 dataframe1 = pd.read_csv("ESG BENEFITS.txt")
      6 
      7 # storing this dataframe in a csv file

~\Anaconda3\lib\site-packages\pandas\io\parsers.py in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
    608     kwds.update(kwds_defaults)
    609 
--> 610     return _read(filepath_or_buffer, kwds)
    611 
    612 

~\Anaconda3\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
    466 
    467     with parser:
--> 468         return parser.read(nrows)
    469 
    470 

~\Anaconda3\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
   1055     def read(self, nrows=None):
   1056         nrows = validate_integer("nrows", nrows)
-> 1057         index, columns, col_dict = self._engine.read(nrows)
   1058 
   1059         if index is None:

~\Anaconda3\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
   2059     def read(self, nrows=None):
   2060         try:
-> 2061             data = self._reader.read(nrows)
   2062         except StopIteration:
   2063             if self._first_chunk:

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader.read()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._read_rows()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows()

pandas\_libs\parsers.pyx in pandas._libs.parsers.raise_parser_error()

ParserError: Error tokenizing data. C error: Expected 1 fields in line 9, saw 4

ESG_BENEFITS.txt:

Life insurance
Accident insurance
Adoption or fertility assistance programs
Disability/invalidity insurance
Mortgages and loans
Pension plans/retirement provision
Maternity and/or paternity leave
Child care
Job security initiatives for redeployment, including retraining, relocation, work-sharing and outplacement services
Flexible workschemes and work-sharing
Recall rights for laid-off employees
Stock ownership
Vacation
Paid sick days
PTO (including any of the following: unspecified, vacation and/or sick days)
Insurance: Healthcare Employee
Insurance: Healthcare Family
Insurance: Healthcare Domestic Partner
Insurance: Dental
Insurance: Vision
Insurance: AD&D
Insurance: Short Term Disability
Insurance: Long Term Disability
Employee Assistance Program
Education Benefits: Employee
Education Benefits: Family
Sabbatical Program
Relocation Assistance
Work/Life Support Program
Wellness/Fitness Program
Onsite Fitness Facilities
Onsite Recreation Facilities
Stock Options
Stock Purchase Plan
Employee Profit Sharing
Retirement: Defined Benefit Plan (including pension plans)
Childcare: Other
Bereavement Leave
Tuition reimbursement (other than career training)
Gym facilities or gym fee reimbursement programs
Higher education scholarship programs, for either employees or their relatives
Preventative healthcare programs
Flex scheduling
Telecommuting options
Public transportation subsidy
Carpooling support programs
Employee recognition programs
Paid time off for employee volunteers
Workforce training, skills, and leadership development programs
Matching gift program
Mentoring Program
Others
No additional benefits offered

Please let me know if there is anything else I can add to post.

Answer:

If you set the separator to a character that does not appear anywhere in the file (anything other than a comma), each line is read as a single field and the file should parse.

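import pandas as pd

# '§' does not occur in the file, so each line is read as a single field.
# The first line is taken as the column header and written back as the first row.
esg_benefits = pd.read_csv("ESG_BENEFITS.txt", sep='§')
esg_benefits.to_csv('ESG_BENEFITS.csv', index=False)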
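
For anyone wondering where "Expected 1 fields in line 9, saw 4" comes from: the first line of the file has no comma, so the parser expects one field per row, but line 9 contains three commas and therefore splits into four fields. Here is a small sketch of that, using the standard library's csv module as a stand-in for the pandas tokenizer (that substitution is my assumption for illustration; the two are different implementations, but both split on commas by default). The sample lines are copied from ESG_BENEFITS.txt.

```python
import csv
import io

# Three lines copied from ESG_BENEFITS.txt; the last one is line 9 of the file.
sample = (
    "Life insurance\n"
    "Child care\n"
    "Job security initiatives for redeployment, including retraining, "
    "relocation, work-sharing and outplacement services\n"
)

# csv.reader splits on "," by default, so count the fields it finds per line.
field_counts = [len(row) for row in csv.reader(io.StringIO(sample))]
print(field_counts)  # [1, 1, 4]
```

The jump from 1 field to 4 fields is what the ParserError is reporting.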

https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
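
If you would rather not depend on picking a separator that never shows up in the data, another option is to skip pandas parsing entirely and write each line as a one-field row with the standard library. This is only a sketch under my own assumptions: the function name lines_to_single_column_csv is made up for this example, the file is assumed to be UTF-8, and blank lines are skipped.

```python
import csv
from pathlib import Path


def lines_to_single_column_csv(src, dst):
    """Write each non-empty line of src as a one-field record in dst.

    Returns the number of records written.
    """
    lines = Path(src).read_text(encoding="utf-8").splitlines()
    count = 0
    # newline="" lets the csv module control line endings itself.
    with open(dst, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for line in lines:
            if line.strip():  # skip blank lines (an assumption of this sketch)
                writer.writerow([line])  # one field per row; commas get quoted
                count += 1
    return count


# Only runs the conversion if the input file is actually present.
if Path("ESG_BENEFITS.txt").exists():
    lines_to_single_column_csv("ESG_BENEFITS.txt", "ESG_BENEFITS.csv")
```

Lines that contain commas come out wrapped in double quotes, which is how a CSV reader knows they are a single field.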
