reading dataframe from csv and array problems

Time:11-18

The application I use generates data in a dataframe which I need to use upon request.

It looks similar to this.

<class 'pandas.core.frame.DataFrame'>
             E         Gg        gnx2    J chs lwave J_ID
0    27.572025  82.308581    7.078391  3.0   1   [0]    1
1    46.387728  77.029548   58.112338  3.0   1   [0]    1
2    75.007554  82.087407    0.535442  3.0   1   [0]    1

Everything worked fine until I tried to use dataframes that had been saved to separate files. When I use the data after loading it, I get data type errors for the columns that contain arrays. For example, lwave holds an array, and when the dataframe is saved to csv the information about its data type is lost.

#saving the data to csv
csv_filename = "ladder.csv"
ladder.to_csv(csv_filename)

So the next time I load the dataframe, I can't access the array elements the way I should be able to.

As I understand it, the data in this column is loaded as a string. After loading the data with read_csv I get these data types:

Unnamed: 0      int64
E             float64
Gg            float64
gnx2          float64
J             float64
chs             int64
lwave          object
J_ID            int64
dtype: object

How can I resolve this issue? How can I correctly load the data with the correct data type or maybe explicitly assign a data type to a column after loading?

CodePudding user response:

In the read_csv function, you can manually assign data types to your columns through the dtype parameter. Pass in a dictionary of column name --> preferred data type.

import numpy as np
import pandas as pd
data_type_mapping = {'a': np.float64, 'b': np.int32, 'c': 'Int64'}
my_df = pd.read_csv('myfile.csv', dtype=data_type_mapping)

From pandas documentation:

Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32, 'c': 'Int64'}. Use str or object together with suitable na_values settings to preserve and not interpret dtype. If converters are specified, they will be applied INSTEAD of dtype conversion.
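Since the documentation mentions converters, here is a minimal sketch of how one might be used for a column like lwave. It assumes the cells were written as bracketed text such as [0], [1.0, 1.0] or [1. 1.]; the helper name parse_array and the inline CSV text are made up for illustration and stand in for the real file.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for the saved file (assumption); values mimic the question's data.
csv_text = '''E,Gg,gnx2,J,chs,lwave,J_ID
17.795484,74.473014,16.796825,3.0,1,[0],1
918.513202,57.765912,0.380853,-4.0,2,"[1.0, 1.0]",3
'''


def parse_array(cell):
    """Hypothetical helper: turn '[0]', '[1.0, 1.0]' or '[1. 1.]' into a float array."""
    # Drop the brackets, treat commas as whitespace, then split into tokens.
    tokens = cell.strip().strip("[]").replace(",", " ").split()
    return np.array([float(t) for t in tokens])


# The converter runs on every raw text cell of the lwave column;
# the other columns are still inferred by pandas as usual.
ladder = pd.read_csv(io.StringIO(csv_text), converters={"lwave": parse_array})

print(ladder["lwave"].iloc[1])
print(ladder.dtypes)
```

The column dtype still shows as object, because pandas has no array-per-cell dtype, but each cell is now a NumPy array that can be indexed. Note that this sketch always produces floats, so an integer array saved as [0] comes back as a float array.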

CodePudding user response:

I tried this, @StonedTensor:

data_type_mapping = {
    'E': np.float64,
    'Gg': np.float64,
    'gnx2': np.float64,
    'J': np.float64,
    'chs': np.int64,
    'lwave': np.ndarray,
    'J_ID': np.int64
    }

filename = isotope_name + '_Emin_10_Emax_1000_2022.10.30_ladder.csv'
resonance_ladder = pd.read_csv(filename, dtype=data_type_mapping)

But I get this error:

TypeError: dtype '<class 'numpy.ndarray'>' not understood

The data in the dataframe looks like this:

<class 'pandas.core.frame.DataFrame'>
              E         Gg       gnx2    J chs       lwave J_ID
0     17.795484  74.473014  16.796825  3.0   1         [0]    1
1     30.278961  81.597985   0.000036  3.0   1         [0]    1
2     38.730859  75.673462   0.732475  3.0   1         [0]    1
3     51.851538   75.30383  13.193803  3.0   1         [0]    1
4     65.971508   81.94329  11.063129  3.0   1         [0]    1
..          ...        ...        ...  ...  ..         ...  ...
165  918.513202  57.765912   0.380853 -4.0   2  [1.0, 1.0]    3

What am I doing wrong?
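A likely cause of the TypeError is that the dtype argument expects scalar types (NumPy types such as np.float64, or pandas types such as 'Int64'), and np.ndarray is a container class rather than a dtype. One possible workaround, sketched below, is to give dtypes only for the scalar columns and convert lwave after loading. It assumes the cells were Python lists when saved, so the text looks like [1.0, 1.0] and ast.literal_eval can parse it; cells saved as NumPy arrays are written without commas ([1. 1.]) and would need a different parser. The small dataframe is a stand-in for the real one.

```python
import ast
import io

import numpy as np
import pandas as pd

# Small stand-in for the ladder dataframe (assumption); lwave holds Python lists.
ladder = pd.DataFrame({
    "E": [17.795484, 918.513202],
    "chs": [1, 2],
    "lwave": [[0], [1.0, 1.0]],
})

# Write to an in-memory buffer instead of a file to keep the sketch self-contained.
buffer = io.StringIO()
ladder.to_csv(buffer, index=False)  # index=False avoids the extra "Unnamed: 0" column
buffer.seek(0)

# Give dtypes only for the scalar columns; lwave is read back as text.
loaded = pd.read_csv(buffer, dtype={"E": np.float64, "chs": np.int64})

# Convert each text cell back into an array after loading.
loaded["lwave"] = loaded["lwave"].apply(lambda s: np.asarray(ast.literal_eval(s)))

print(loaded["lwave"].iloc[1][0])
```

After the conversion, the array elements can be accessed by index again, although the column dtype is still reported as object.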
