Hi(sorry if this is a dump question)..i have a data set as CSV file ...every row contains 44 column and every cell containes 44 float number separated by two spaces like this(look at the screenshot) ...i tried CSV readline/s numpy and non of them worked i want to take every row as a list with[1936] variable (44*44) and then combine the whole data set into 2d array ...my_data[n_of_samples][1936]
CodePudding user response:
so as stated by user ybl, this is not a CSV. It's not even close to being a CSV.
This means that you have to implement some processing to turn this into something useable. I put the screenshot through an OCR to extract the actual text values, but next time provide the input file. Screenshots of data are annoying to work with.
The processing you need to to is to find the start and end of the rows, using the [
and ]
characters respectively. Then you split this data with the basic string.split()
which doesn't care about the number of spaces.
Try the code below and see if that works for the input file.
rows = []
current_row = ""
with open("somefile.txt") as infile:
for line in infile.readlines():
cleaned = line.replace('"', '').replace("\n", " ")
if "]" in cleaned:
current_row = f"{current_row} {cleaned.split(']')[0]}"
rows.append(current_row.split())
current_row = ""
cleaned = cleaned.split(']')[1]
if "[" in cleaned:
cleaned = cleaned.split("[")[1]
current_row = f"{current_row} {cleaned}"
for row in rows:
print(len(row))
output
44
44
44
input file:
"[ 1.79619717e 04 1.09988207e 02 4.13270009e 01 1.72227906e 01
1.06178751e 01 5.20957856e 00 7.50891645e 00 4.57943370e 00
2.65572713e 00 2.96725867e-01 2.43040664e 00 1.32822091e 00
4.09853169e-01 1.18412873e 00 6.43398990e-01 1.23796528e 00
9.63975374e-02 2.95295579e-01 7.68998970e-01 4.98040980e-01
2.84036936e-01 1.76004564e-01 1.43527613e-01 1.64765236e-01
1.51171075e-01 1.02586637e-01 3.27835810e-02 1.21872869e-02
-7.59824907e-02 8.48217334e-02 7.29953754e-02 4.89750588e-02
5.89426950e-02 5.05485266e-02 2.34761263e-02 -2.41095452e-02
5.15952510e-02 1.39933210e-02 2.12354074e-02 3.40820680e-03
-2.57466949e-03 -1.06481222e-02 -8.35155410e-03 1.21653512e-12]","[-6.12189619e 02 1.03584744e 04 2.34417495e 02 7.01761526e 01
3.92495170e 01 1.81609738e 01 2.58114624e 01 1.52275550e 01
8.59676934e 00 9.45036161e-01 7.71943506e 00 4.17516432e 00
1.27920413e 00 3.68862368e 00 1.99582544e 00 3.82999035e 00
2.96068511e-01 9.06341796e-01 2.35621065e 00 1.52094079e 00
8.64565916e-01 5.34605108e-01 4.35456793e-01 4.99450615e-01
4.57778770e-01 3.10324997e-01 9.90860520e-02 3.68281889e-02
-2.29532895e-01 2.56108491e-01 2.20284123e-01 1.47727878e-01
1.77724506e-01 1.52350751e-01 7.07318164e-02 -7.26252404e-02
1.55364050e-01 4.21222079e-02 6.39113311e-02 1.02558665e-02
-7.74736016e-03 -3.20368093e-02 -2.51241082e-02 1.21653512e-12]","[-5.03959282e 02 -5.64452044e 02 7.90433958e 03 1.94146598e 02
1.06178751e 01 5.20957856e 00 7.50891645e 00 4.57943370e 00
2.65572713e 00 2.96725867e-01 2.43040664e 00 1.32822091e 00
4.09853169e-01 1.18412873e 00 6.43398990e-01 1.23796528e 00
9.63975374e-02 2.95295579e-01 7.68998970e-01 4.98040980e-01
2.84036936e-01 1.76004564e-01 1.43527613e-01 1.64765236e-01
1.51171075e-01 1.02586637e-01 3.27835810e-02 1.21872869e-02
-7.59824907e-02 8.48217334e-02 7.29953754e-02 4.89750588e-02
5.89426950e-02 5.05485266e-02 2.34761263e-02 -2.41095452e-02
5.15952510e-02 1.39933210e-02 2.12354074e-02 3.40820680e-03
-2.57466949e-03 -1.06481222e-02 -8.35155410e-03 1.21653512e-12]"
CodePudding user response:
The option is this:
import numpy as np
import csv
c = np.array([n_of_samples])
with open('cocacola_sick.csv') as f:
p = csv.reader(f) # read file as csv
for s in p:
a = ','.join(s) # concatenate all lines into one line
a = a.replace("\n", "") # remove line breaks
b = np.array(np.mat(a))
my_data = np.vstack((c,b))
print(my_data)