Home > OS >  How to convert text file into csv file with first brackets as column followed by columns for each co
How to convert text file into csv file with first brackets as column followed by columns for each co

Time:05-16

Each line in my text file is displayed as such:

['MC1006-4', '21374850', '36.12', '15.50', '', '', '', '', '', '', '', '17916'], Timecard-MC1006-4-20220509090149-Reported.csv, OlderVersion, 183

Since the values in the bracket list of terms can vary in the file I am trying to store all the values in the brackets as one column so it would be something like this:

First Column: ['MC1006-4', '21374850', '36.12', '15.50', '', '', '', '', '', '', '', '17916']
Second Column: Timecard-MC1006-4-20220509090149-Reported.csv
Third Column: OlderVersion
Fourth Column: 183

Right now when I convert the text file into a csv using this code

import pandas as pd
read_file = pd.read_csv(r"C:\\Users\\Bilal\\Report2.txt")
read_file.to_csv(r"C:\\Users\\Bilal\\ReportInCsv.csv", index= None)

it gives the following error for some reason. The number of comma separated values even those including in the bracket are the same (18 values) for the first 11 rows so I don't know where the error is coming from. Therefore I am trying to read the file such that it treats the first bracket value as one value and not as separate values. Any help would be appreciated!

Error tokenizing data. C error: Expected 15 fields in line 11, saw 18

CodePudding user response:

This is not correct CSV file so you will have to write own code to read it.

I would first split on ], (and eventually use json to convert left part to real list) and later I would split right part using ,

In example I use io.StringIO to simulate file in memory (so everyone can copy and test it) but you should use open()

text = '''['MC1006-4', '21374850', '36.12', '15.50', '', '', '', '', '', '', '', '17916'], Timecard-MC1006-4-20220509090149-Reported.csv, OlderVersion, 183
['MC1006-4', '21374850', '36.12', '15.50', '', '', '', '', '', '', '', '17916'], Timecard-MC1006-4-20220509090149-Reported.csv, OlderVersion, 183
['MC1006-4', '21374850', '36.12', '15.50', '', '', '', '', '', '', '', '17916'], Timecard-MC1006-4-20220509090149-Reported.csv, OlderVersion, 183'''

import io
import json
import pandas as pd

#f = open('Report2.txt')
f = io.StringIO(text)

rows = []

for line in f:
    line = line.rstrip('\n')

    parts = line.split('],')

    first = parts[0]   ']'
    first = json.loads(first.replace("'", '"'))

    rest = parts[1].split(',')

    rows.append( [first]   rest )

print(rows)

df = pd.DataFrame(rows, columns=['Record', 'Filename', 'Folder', 'Row'])

print(df.to_string())

CodePudding user response:

I was able to get around the issue using the following code taken from another post online:

import pandas as pd
import io
import re

# regular expression to capture contents of balanced brackets
location_regex = re.compile(r'\[([^\[\]] )\]')

with open(r"C:\\Users\\Bilal\\Report2.txt", 'r') as fi:
    # replaced brackets with quotes, pipe into file-like object
    fo = io.StringIO()
    fo.writelines(str(re.sub(location_regex, r'"\1"', line)) for line in fi)

    # rewind file to the beginning
    fo.seek(0)

# read transformed CSV into data frame
df = pd.read_csv(fo)

df.columns = ['Record', 'Filename', 'Folder', "Row"]
# print(df)
df.to_csv('Records2Arranged.csv')

Thank you for the replies!

  • Related