Pandas read_csv is retrieving different data than what is in the text file

Time:08-01

I have a .txt (notepad) file called Log1. It has the following saved in it: [1, 1, 1, 0]

When I write a program to retrieve the data:

import pandas as pd

Log1 = pd.read_csv('Path...\\Log1.txt')
Log1 = list(Log1)
print(Log1)

It prints: ['[1', ' 1', ' 1.1', ' 0]']

I don't understand where the ".1" on the third number is coming from. It's not in the text file; pandas just adds it.

Funnily enough, if I change the numbers in the text file to [1, 0, 1, 1], it does not add the ".1". It prints ['[1', ' 0', ' 1', ' 1]']

It's very odd that it acts this way. Does anyone have an idea why?

CodePudding user response:

This should work. Can you please try this:

log2 = log1.values.tolist()

Output:

[['1'], ['1'], ['1'], ['0']]

CodePudding user response:

Your data is not in CSV format. A CSV file would instead contain

1;1;0;1

or something similar.

If you have multiple lines like this, it might make sense to parse this as CSV, otherwise I'd rather parse it using a regexp and .split on the result.
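A minimal sketch of the regexp-plus-split idea, assuming the file holds a single bracketed list of integers. The helper name parse_bracketed_ints is made up for illustration, and the text is held in a string rather than read from the file:

```python
import re


def parse_bracketed_ints(text):
    """Pull the contents out of the first [...] pair and split them on commas."""
    match = re.search(r"\[(.*?)\]", text)
    if match is None:
        raise ValueError("no bracketed list found")
    inner = match.group(1)
    # int() tolerates surrounding spaces, so ' 1' converts cleanly.
    return [int(part) for part in inner.split(",") if part.strip()]


numbers = parse_bracketed_ints("[1, 1, 1, 0]\n")
print(numbers)
```

For a file with many such lines, the same helper could be called once per line.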

Proposal: Add a bigger input example and your expected output.

CodePudding user response:

Solved it with input from above. It was just pandas' interpretation of the data that was messing up the output:

Log4 = []
with open('path...\\Log4.txt') as f:
    Log4 = f.readlines()
print(Log4)

prints ['[1, 1, 1, 0]']
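For anyone wondering where the ".1" actually came from: by default read_csv treats the first line as the header row, and when two column names are identical it renames the later one by appending ".1". In [1, 1, 1, 0] the second and third pieces are both ' 1', so the third becomes ' 1.1'. In [1, 0, 1, 1] no two pieces match exactly (the last one is ' 1]'), so nothing is renamed. A small sketch that reproduces this, assuming default read_csv settings and using io.StringIO as a stand-in for the real file:

```python
import io

import pandas as pd

# The single line from Log1.txt, held in memory instead of on disk.
text = "[1, 1, 1, 0]\n"

# The first line is consumed as the header, so the four comma-separated
# pieces become column names and the frame has no data rows.
df = pd.read_csv(io.StringIO(text))
header_names = list(df.columns)
print(header_names)  # the second ' 1' is renamed to keep the names unique

# With the other example from the question, no two names collide.
other_names = list(pd.read_csv(io.StringIO("[1, 0, 1, 1]\n")).columns)
print(other_names)

# header=None keeps the line as data instead of consuming it as a header.
as_data = pd.read_csv(io.StringIO(text), header=None)
print(as_data.shape)  # one row, four columns
```

Even with header=None the brackets stay attached to the first and last values, so read_csv is still not a great fit for this file.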

CodePudding user response:

Well, I worked out some other options as well, just for the record:

Solution 1 (plain read - this one gets a list of strings, one per line)

log4 = []
with open('log4.txt') as f:
    log4 = f.readlines()
print(log4)

Solution 2 (convert to list of ints)

import ast
with open('log4.txt', 'r') as f:
    inp = ast.literal_eval(f.read())
print(inp)

Solution 3 (old school string parsing - split into a list of strings, put it in a dataframe, then convert the column to ints)

import pandas as pd

with open('log4.txt', 'r') as f:
    mylist = f.read()

# Strip the brackets and spaces, then split on the commas.
mylist = mylist.replace('[','').replace(']','').replace(' ','')
mylist = mylist.split(',')

df = pd.DataFrame({'Col1': mylist})
df['Col1'] = df['Col1'].astype(int)
print(df)

Other ideas here as well:

https://docs.python-guide.org/scenarios/serialization/

In general, reading from a text file (deserializing) is easier if the file was written in a well-structured format in the first place: a csv file, pickle file, json file, etc. In this case, ast.literal_eval() worked well because the data was written out as a list in its __repr__ format, though honestly I've never done that before, so it was an interesting solution to me as well :)
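As a sketch of the structured-format idea: this particular text happens to be valid JSON too, so the json module can both write and read it. The file name log.json is made up for illustration, and a temporary directory is used so nothing is left on disk:

```python
import json
import tempfile
from pathlib import Path

values = [1, 1, 1, 0]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "log.json"  # hypothetical file name
    # Serialize: write the list out as JSON text.
    path.write_text(json.dumps(values))
    # Deserialize: read the text back and parse it into a list of ints.
    saved_text = path.read_text()
    restored = json.loads(saved_text)

print(saved_text)
print(restored)
```

Unlike a __repr__ dump, JSON can be read back by other languages and tools as well.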
