Pandas: how to read data as np.array

Time:11-12

I have a .tsv file like this:

sequences label
[[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0],[0.0, 0.0, 1.0, 0.0],[0.0, 0.0, 1.0, 0.0],[0.0, 0.0, 1.0, 0.0],[0.0, 0.0, 1.0, 0.0]] 1

I want to import the sequences column into a pd.DataFrame as np.float64.

But reading it like this fails:

df = pd.read_csv('AARS.tsv', sep='\t', dtype = np.float64)

ValueError: could not convert string to float 

I would be grateful if you could give me any suggestions!

Many thanks!

Answer:

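Your first column does not contain float64 values: each cell is the text of a nested list, so pandas reads it as a string and cannot convert it to a float.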
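To see why, here is a minimal sketch that reproduces the error, assuming a tab-separated file with a header row; the DATA string and io.StringIO stand in for AARS.tsv and are illustrative only. dtype=np.float64 is applied to every column, including the text column, while a dict such as {"label": np.float64} converts only the column it names.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical in-memory stand-in for AARS.tsv (tab-separated, with a header row)
DATA = (
    "sequences\tlabel\n"
    "[[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]\t1\n"
)

# dtype=np.float64 applies to every column, so the list-like text cannot be parsed
error_message = None
try:
    pd.read_csv(io.StringIO(DATA), sep="\t", dtype=np.float64)
except ValueError as exc:  # pandas' ParserError is also a ValueError subclass
    error_message = str(exc)
print("read_csv failed:", error_message)

# A per-column mapping converts only 'label' and leaves 'sequences' as text
df = pd.read_csv(io.StringIO(DATA), sep="\t", dtype={"label": np.float64})
print(df.dtypes)
```

The per-column form is useful when only some columns are numeric; the sequences column still needs to be parsed separately.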

You could leave out the dtype=... argument and check the type of the data:

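import pandas as pd
import numpy as np

# Read without forcing a dtype; the column in the file is called 'sequences'
df = pd.read_csv('AARS.tsv', sep='\t', usecols=['sequences', 'label'])

# Print the Python type of every cell, one table row per output line
for item in df.values:
    for i in range(item.size):
        print(type(item[i]), end=" ")
    print()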

This will output something like the following (when I recreated your input, I added a line with column titles):

<class 'str'> <class 'int'>
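Since each cell of sequences is the text of a nested list, one option (a different technique from both answers here, not their code) is to parse it while reading, using read_csv's converters parameter with json.loads, and then stack the rows into one float64 array. This is a sketch that assumes every row holds the same number of 4-element lists; the two-row DATA string is a hypothetical stand-in for AARS.tsv.

```python
import io
import json

import numpy as np
import pandas as pd

# Hypothetical stand-in for AARS.tsv: two rows, each a 3x4 nested list plus a label
DATA = (
    "sequences\tlabel\n"
    "[[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 1.0, 0.0]]\t1\n"
    "[[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.0, 0.0]]\t0\n"
)

# json.loads turns each "[[...], ...]" string into a Python list of lists
df = pd.read_csv(io.StringIO(DATA), sep="\t", converters={"sequences": json.loads})

# Stack the per-row lists into one array of shape (rows, inner lists, 4)
X = np.array(df["sequences"].tolist(), dtype=np.float64)
y = df["label"].to_numpy()

print(X.shape)  # (2, 3, 4)
print(X.dtype)  # float64
```

If the rows have different numbers of inner lists, np.array cannot build a regular 3D array, so you would have to pad or keep a list of per-row arrays instead.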

Answer:

Here is a suggestion using some of the pandas string methods (Series.str) and pandas.DataFrame.explode:

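import pandas as pd

out = (
    pd.read_csv("AARS.tsv", sep="\t", usecols=["sequences"])
    # Overwrite 'sequences' with a list of number strings per row
    .assign(sequences=lambda x: x["sequences"].str.strip("[]")
                                 .str.replace(r"\]\s*,\s*\[", ", ", regex=True)
                                 .str.split(","))
    # One row per number, then convert the number strings to floats
    .explode("sequences")
    .astype(float)
    .values
)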
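To see what each step of that chain does, here is a sketch applying the same strip, regex replace and split to a single cell with plain Python str methods and re.sub, which the pandas .str methods mirror element-wise. The cell value is shortened from the question's sample for readability.

```python
import re

cell = "[[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0],[0.0, 0.0, 1.0, 0.0]]"

# 1. strip("[]") removes any '[' or ']' characters at both ends of the string
step1 = cell.strip("[]")
# -> "0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0],[0.0, 0.0, 1.0, 0.0"

# 2. Replace each inner "], [" boundary (with optional spaces) by ", "
step2 = re.sub(r"\]\s*,\s*\[", ", ", step1)
# -> "0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0"

# 3. Split on commas; float() tolerates the leading spaces left behind
values = [float(v) for v in step2.split(",")]
print(values)
```

The raw string r"..." keeps the backslashes in the pattern intact and avoids Python's invalid-escape warning.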

# Output:

print(out)

[[0.]
 [1.]
 [0.]
 [0.]
 [0.]
 [0.]
 [1.]
 [0.]
 [0.]
 [0.]
 [1.]
 [0.]
 [0.]
 [0.]
 [1.]
 [0.]
 [0.]
 [0.]
 [1.]
 [0.]
 [0.]
 [0.]
 [1.]
 [0.]]

print(type(out))

<class 'numpy.ndarray'>

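If you need to reshape your array, use numpy.reshape. With (-1, 2) you get pairs of values, as shown here; (-1, 4) would give back the original 4-element rows:

import numpy as np

print(np.reshape(out, (-1, 2)))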

[[0. 1.]
 [0. 0.]
 [0. 0.]
 [1. 0.]
 [0. 0.]
 [1. 0.]
 [0. 0.]
 [1. 0.]
 [0. 0.]
 [1. 0.]
 [0. 0.]
 [1. 0.]]
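When the file has several rows, explode repeats each original row's index for every element, so the flat result can be reshaped back per row. A sketch, assuming every row contains the same number of 4-element inner lists; the two-row DATA string is a hypothetical stand-in for AARS.tsv, and the chain is applied to the Series rather than the whole DataFrame.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for AARS.tsv with two rows of two 4-element lists each
DATA = (
    "sequences\tlabel\n"
    "[[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]\t1\n"
    "[[1.0, 0.0, 0.0, 0.0],[0.0, 0.0, 0.0, 1.0]]\t0\n"
)
df = pd.read_csv(io.StringIO(DATA), sep="\t")

# Same string chain as above, kept as a Series so the row index survives explode
flat = (
    df["sequences"].str.strip("[]")
    .str.replace(r"\]\s*,\s*\[", ", ", regex=True)
    .str.split(",")
    .explode()
    .astype(float)
)

# Assumes each row has the same number of inner lists of length 4
X = flat.to_numpy().reshape(len(df), -1, 4)
print(X.shape)  # (2, 2, 4)
```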