I have lists of integers saved in a CSV file, and the rows are not all the same length, as in the following example:
22,-14,-24,2,-26,18,20,-4,12,16,8,-6,-10
20,12,-16,18,28,24,4,-22,26,8,-10,-14,2,6
10,-26,-20,30,24,-22,18,-28,12,14,-6,-2,8,-16,-4
16, 22, 30, -18, -26, -28, 24, -8, 32, -14, 12, 4, 20, -10, 2, 6
32, 10, -14, 20, -22, 24, -4, -26, 34, 28, -30, 2, 12, 18, 6, -8, 16
8, -20, 34, 18, 30, 24, -4, 6, 28, -32, -12, -36, 10, 16, -38, 2, 14, -22, -26
I need to call a function whose input is an array containing one such row, so I need exactly the following:
input = [22,-14,-24,2,-26,18,20,-4,12,16,8,-6,-10]
Using the standard approach:
import csv

with open('file.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    for line in reader:
        print(line)
yields the output
['22', '-14', '-24', '2', '-26', '18', '20', '-4', '12', '16', '8', '-6', '-10']
which I can't use, since the elements are not integers. I have tried different formatting parameters, such as csv.QUOTE_NONE, but nothing works. As far as I know this makes sense, since CSV files carry no data types: the csv module returns every field as a string.
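For reference, csv.QUOTE_NONNUMERIC is the quoting mode that does change types on read: the csv documentation says it makes reader objects convert every unquoted field to float. The values still need a cast to int, as in this minimal sketch, which uses a small in-memory sample (io.StringIO) in place of the real file:

```python
import csv
import io

# In-memory stand-in for file.csv; rows of different lengths are fine
sample = io.StringIO("22,-14,-24,2\n20,12,-16,18,28\n")

# QUOTE_NONNUMERIC: the reader turns each unquoted field into a float
reader = csv.reader(sample, quoting=csv.QUOTE_NONNUMERIC)

# Cast the floats back to ints, keeping one list per row
rows = [[int(x) for x in row] for row in reader]
print(rows)  # [[22, -14, -24, 2], [20, 12, -16, 18, 28]]
```

Because the values pass through float, this is only exact for integers up to 2**53, which is far beyond the values shown here.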
My files have between 100,000 and 1,000,000 rows, so any solution must be efficient. Since the number of columns is not fixed, I also wasn't able to cast the values manually, because I couldn't figure out how to loop over the columns of a row. Does anyone have an idea how I could solve this? In case it helps, I am not bound to CSV files; I could probably use another format.
Answer:
You can just convert them to int:
elems = ['22', '-14', '-24', '2', '-26', '18', '20', '-4', '12', '16', '8', '-6', '-10']
elems = [int(i) for i in elems]
Output:
[22, -14, -24, 2, -26, 18, 20, -4, 12, 16, 8, -6, -10]
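Applied to the whole file, the same conversion can run on every row as csv.reader produces it, and it works for rows of any length. A minimal sketch; the generator name int_rows is illustrative, and io.StringIO stands in for the opened file:

```python
import csv
import io
from typing import Iterable, Iterator, List


def int_rows(lines: Iterable[str]) -> Iterator[List[int]]:
    """Yield each CSV row as a list of ints, whatever its length."""
    for row in csv.reader(lines):
        if row:  # csv.reader yields [] for blank lines; skip them
            # int() ignores surrounding whitespace such as ' 22'
            yield [int(field) for field in row]


sample = io.StringIO("22,-14,-24\n16, 22, 30, -18\n")
rows = list(int_rows(sample))
print(rows)  # [[22, -14, -24], [16, 22, 30, -18]]
```

With the real file, open it with open('file.csv', newline='') and iterate over int_rows(f) inside the with block; since int_rows is a generator, only one row is held in memory at a time.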
To handle the CSV more robustly, you could also use pandas:
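import pandas as pd
# Width of the longest row: without `names`, pandas takes the width from the
# first row and raises a ParserError on longer rows. Short rows get NaN padding.
with open('file.csv') as f:
    width = max(line.count(',') for line in f) + 1
df = pd.read_csv('file.csv', header=None, names=list(range(width)), skipinitialspace=True)
for _, row in df.iterrows():  # DataFrame.iteritems() was removed in pandas 2.0
    line = row.dropna().tolist()  # floats because of the NaN padding
    print(line)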
and the output is:
[22.0, -14.0, -24.0, 2.0, -26.0, 18.0, 20.0, -4.0, 12.0, 16.0, 8.0, -6.0, -10.0]
[20.0, 12.0, -16.0, 18.0, 28.0, 24.0, 4.0, -22.0, 26.0, 8.0, -10.0, -14.0, 2.0, 6.0]
[10.0, -26.0, -20.0, 30.0, 24.0, -22.0, 18.0, -28.0, 12.0, 14.0, -6.0, -2.0, 8.0, -16.0, -4.0]
[16.0, 22.0, 30.0, -18.0, -26.0, -28.0, 24.0, -8.0, 32.0, -14.0, 12.0, 4.0, 20.0, -10.0, 2.0, 6.0]
[32.0, 10.0, -14.0, 20.0, -22.0, 24.0, -4.0, -26.0, 34.0, 28.0, -30.0, 2.0, 12.0, 18.0, 6.0, -8.0, 16.0]
[8.0, -20.0, 34.0, 18.0, 30.0, 24.0, -4.0, 6.0, 28.0, -32.0, -12.0, -36.0, 10.0, 16.0, -38.0, 2.0, 14.0, -22.0, -26.0]
Answer:
As your CSV has no column names (and no quoted fields), you don't really need the csv module (let alone pandas). You could just do this:
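FILENAME = 'file.csv'

def parse(filename):
    # Lazily yield one list of ints per line; int() ignores the trailing newline
    with open(filename) as data:
        for line in data:
            if line.strip():  # skip blank lines, e.g. a trailing empty line
                yield list(map(int, line.split(',')))

for line in parse(FILENAME):
    print(line)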
Output:
[22, -14, -24, 2, -26, 18, 20, -4, 12, 16, 8, -6, -10]
[20, 12, -16, 18, 28, 24, 4, -22, 26, 8, -10, -14, 2, 6]
[10, -26, -20, 30, 24, -22, 18, -28, 12, 14, -6, -2, 8, -16, -4]
[16, 22, 30, -18, -26, -28, 24, -8, 32, -14, 12, 4, 20, -10, 2, 6]
[32, 10, -14, 20, -22, 24, -4, -26, 34, 28, -30, 2, 12, 18, 6, -8, 16]
[8, -20, 34, 18, 30, 24, -4, 6, 28, -32, -12, -36, 10, 16, -38, 2, 14, -22, -26]
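Since the goal is to call a function on one row at a time, each parsed row can be passed to it directly, without first building a list of all rows. A sketch under stated assumptions: process_row and apply_per_row are hypothetical names, process_row is only a placeholder for the function from the question, and io.StringIO again replaces the file:

```python
import io
from typing import Callable, Iterable, List


def process_row(row: List[int]) -> int:
    """Hypothetical stand-in for the function that takes one row."""
    return sum(row)


def apply_per_row(lines: Iterable[str],
                  func: Callable[[List[int]], int]) -> List[int]:
    """Parse each non-blank line into ints and pass it to func."""
    results = []
    for line in lines:
        if line.strip():  # skip blank lines
            row = [int(x) for x in line.split(',')]
            results.append(func(row))
    return results


sample = io.StringIO("22,-14,-24\n\n16, 22, 30\n")
results = apply_per_row(sample, process_row)
print(results)  # [-16, 68]
```

If the function's return values are not needed, dropping the results list keeps memory use constant regardless of how many rows the file has.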