How to convert text file into csv file with first brackets as column followed by columns for each co-CodePudding

Each line in my text file is displayed as such:

['MC1006-4', '21374850', '36.12', '15.50', '', '', '', '', '', '', '', '17916'], Timecard-MC1006-4-20220509090149-Reported.csv, OlderVersion, 183

Since the values in the bracket list of terms can vary in the file I am trying to store all the values in the brackets as one column so it would be something like this:

First Column: ['MC1006-4', '21374850', '36.12', '15.50', '', '', '', '', '', '', '', '17916']
Second Column: Timecard-MC1006-4-20220509090149-Reported.csv
Third Column: OlderVersion
Fourth Column: 183

Right now when I convert the text file into a csv using this code

import pandas as pd
read_file = pd.read_csv(r"C:\\Users\\Bilal\\Report2.txt")
read_file.to_csv(r"C:\\Users\\Bilal\\ReportInCsv.csv", index= None)

it gives the following error for some reason. The number of comma separated values even those including in the bracket are the same (18 values) for the first 11 rows so I don't know where the error is coming from. Therefore I am trying to read the file such that it treats the first bracket value as one value and not as separate values. Any help would be appreciated!

Error tokenizing data. C error: Expected 15 fields in line 11, saw 18

CodePudding user response：

This is not correct CSV file so you will have to write own code to read it.

I would first split on ], (and eventually use json to convert left part to real list) and later I would split right part using ,

In example I use io.StringIO to simulate file in memory (so everyone can copy and test it) but you should use open()

text = '''['MC1006-4', '21374850', '36.12', '15.50', '', '', '', '', '', '', '', '17916'], Timecard-MC1006-4-20220509090149-Reported.csv, OlderVersion, 183
['MC1006-4', '21374850', '36.12', '15.50', '', '', '', '', '', '', '', '17916'], Timecard-MC1006-4-20220509090149-Reported.csv, OlderVersion, 183
['MC1006-4', '21374850', '36.12', '15.50', '', '', '', '', '', '', '', '17916'], Timecard-MC1006-4-20220509090149-Reported.csv, OlderVersion, 183'''

import io
import json
import pandas as pd

#f = open('Report2.txt')
f = io.StringIO(text)

rows = []

for line in f:
    line = line.rstrip('\n')

    parts = line.split('],')

    first = parts[0]   ']'
    first = json.loads(first.replace("'", '"'))

    rest = parts[1].split(',')

    rows.append( [first]   rest )

print(rows)

df = pd.DataFrame(rows, columns=['Record', 'Filename', 'Folder', 'Row'])

print(df.to_string())

CodePudding user response：

I was able to get around the issue using the following code taken from another post online:

import pandas as pd
import io
import re

# regular expression to capture contents of balanced brackets
location_regex = re.compile(r'\[([^\[\]] )\]')

with open(r"C:\\Users\\Bilal\\Report2.txt", 'r') as fi:
    # replaced brackets with quotes, pipe into file-like object
    fo = io.StringIO()
    fo.writelines(str(re.sub(location_regex, r'"\1"', line)) for line in fi)

    # rewind file to the beginning
    fo.seek(0)

# read transformed CSV into data frame
df = pd.read_csv(fo)

df.columns = ['Record', 'Filename', 'Folder', "Row"]
# print(df)
df.to_csv('Records2Arranged.csv')

Thank you for the replies!