Each line in my text file is displayed as such:
['MC1006-4', '21374850', '36.12', '15.50', '', '', '', '', '', '', '', '17916'], Timecard-MC1006-4-20220509090149-Reported.csv, OlderVersion, 183
Since the values in the bracket list of terms can vary in the file I am trying to store all the values in the brackets as one column so it would be something like this:
First Column: ['MC1006-4', '21374850', '36.12', '15.50', '', '', '', '', '', '', '', '17916']
Second Column: Timecard-MC1006-4-20220509090149-Reported.csv
Third Column: OlderVersion
Fourth Column: 183
Right now when I convert the text file into a csv using this code
import pandas as pd
read_file = pd.read_csv(r"C:\\Users\\Bilal\\Report2.txt")
read_file.to_csv(r"C:\\Users\\Bilal\\ReportInCsv.csv", index= None)
it gives the following error for some reason. The number of comma separated values even those including in the bracket are the same (18 values) for the first 11 rows so I don't know where the error is coming from. Therefore I am trying to read the file such that it treats the first bracket value as one value and not as separate values. Any help would be appreciated!
Error tokenizing data. C error: Expected 15 fields in line 11, saw 18
CodePudding user response:
This is not correct CSV file so you will have to write own code to read it.
I would first split on ],
(and eventually use json
to convert left part to real list) and later I would split right part using ,
In example I use io.StringIO
to simulate file in memory (so everyone can copy and test it) but you should use open()
text = '''['MC1006-4', '21374850', '36.12', '15.50', '', '', '', '', '', '', '', '17916'], Timecard-MC1006-4-20220509090149-Reported.csv, OlderVersion, 183
['MC1006-4', '21374850', '36.12', '15.50', '', '', '', '', '', '', '', '17916'], Timecard-MC1006-4-20220509090149-Reported.csv, OlderVersion, 183
['MC1006-4', '21374850', '36.12', '15.50', '', '', '', '', '', '', '', '17916'], Timecard-MC1006-4-20220509090149-Reported.csv, OlderVersion, 183'''
import io
import json
import pandas as pd
#f = open('Report2.txt')
f = io.StringIO(text)
rows = []
for line in f:
line = line.rstrip('\n')
parts = line.split('],')
first = parts[0] ']'
first = json.loads(first.replace("'", '"'))
rest = parts[1].split(',')
rows.append( [first] rest )
print(rows)
df = pd.DataFrame(rows, columns=['Record', 'Filename', 'Folder', 'Row'])
print(df.to_string())
CodePudding user response:
I was able to get around the issue using the following code taken from another post online:
import pandas as pd
import io
import re
# regular expression to capture contents of balanced brackets
location_regex = re.compile(r'\[([^\[\]] )\]')
with open(r"C:\\Users\\Bilal\\Report2.txt", 'r') as fi:
# replaced brackets with quotes, pipe into file-like object
fo = io.StringIO()
fo.writelines(str(re.sub(location_regex, r'"\1"', line)) for line in fi)
# rewind file to the beginning
fo.seek(0)
# read transformed CSV into data frame
df = pd.read_csv(fo)
df.columns = ['Record', 'Filename', 'Folder', "Row"]
# print(df)
df.to_csv('Records2Arranged.csv')
Thank you for the replies!