The application I use generates data in a dataframe, which I then need to reuse on request.
It looks like this:
<class 'pandas.core.frame.DataFrame'>
           E         Gg       gnx2    J  chs lwave  J_ID
0  27.572025  82.308581   7.078391  3.0    1   [0]     1
1  46.387728  77.029548  58.112338  3.0    1   [0]     1
2  75.007554  82.087407   0.535442  3.0    1   [0]     1
Everything worked perfectly until I tried to use dataframes that had been saved to separate files. When I use the data after loading it, I get data type errors for the columns that contain arrays. lwave, for example, is an array, and when it is saved to CSV the information about its data type is lost.
#saving the data to csv
csv_filename = "ladder.csv"
ladder.to_csv(csv_filename)
So the next time I load the dataframe I can't access the array elements as I should, because, as I understand it, the data in this column is loaded as a string. After loading the data with read_csv I get these data types:
Unnamed: 0 int64
E float64
Gg float64
gnx2 float64
J float64
chs int64
lwave object
J_ID int64
dtype: object
How can I resolve this issue? How can I correctly load the data with the correct data type or maybe explicitly assign a data type to a column after loading?
CodePudding user response:
In the read_csv function, you can manually assign data types to your columns. Pass in a dictionary of column name --> preferred data type.
import numpy as np
import pandas as pd
data_type_mapping = {'a': np.float64, 'b': np.int32, 'c': 'Int64'}
my_df = pd.read_csv('myfile.csv', dtype=data_type_mapping)
From the pandas documentation:
Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32, 'c': 'Int64'} Use str or object together with suitable na_values settings to preserve and not interpret dtype. If converters are specified, they will be applied INSTEAD of dtype conversion.
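Since the quoted documentation mentions converters, here is a minimal sketch of how one could be used for a column that holds array-like text. The helper name parse_array and the tiny in-memory CSV are made up for illustration, and the sketch assumes each cell was written either as "[1.0, 1.0]" or as "[1. 1.]"; other layouts would need a different parser.

```python
import io

import numpy as np
import pandas as pd


def parse_array(cell):
    """Turn text such as '[1.0, 1.0]' or '[1. 1.]' into a float array."""
    # Drop the brackets, treat commas as whitespace, then split into numbers.
    return np.array(cell.strip("[]").replace(",", " ").split(), dtype=float)


# Small stand-in for the saved file (values are made up for illustration).
csv_text = 'E,chs,lwave\n27.5,1,[0]\n918.5,2,"[1.0, 1.0]"\n'

df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"E": np.float64, "chs": np.int64},  # plain scalar columns
    converters={"lwave": parse_array},         # per-cell parsing for arrays
)
```

The lwave column still reports dtype object, but each cell is now a NumPy array rather than a string, so indexing into it works again.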
CodePudding user response:
@StonedTensor, I tried this:
data_type_mapping = {
    'E': np.float64,
    'Gg': np.float64,
    'gnx2': np.float64,
    'J': np.float64,
    'chs': np.int64,
    'lwave': np.ndarray,
    'J_ID': np.int64
}
filename = isotope_name + '_Emin_10_Emax_1000_2022.10.30_ladder.csv'
resonance_ladder = pd.read_csv(filename, dtype=data_type_mapping)
But I get this error:
TypeError: dtype '<class 'numpy.ndarray'>' not understood
The data in the dataframe looks like this:
<class 'pandas.core.frame.DataFrame'>
              E         Gg       gnx2    J  chs       lwave  J_ID
0     17.795484  74.473014  16.796825  3.0    1         [0]     1
1     30.278961  81.597985   0.000036  3.0    1         [0]     1
2     38.730859  75.673462   0.732475  3.0    1         [0]     1
3     51.851538   75.30383  13.193803  3.0    1         [0]     1
4     65.971508   81.94329  11.063129  3.0    1         [0]     1
..          ...        ...        ...  ...   ..         ...   ...
165  918.513202  57.765912   0.380853 -4.0    2  [1.0, 1.0]     3
What am I doing wrong?
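A likely reason for the TypeError is that dtype describes the type of the individual values in a column, and np.ndarray is a container class rather than a dtype, so pandas cannot interpret it; a column of arrays is stored as object either way. One option is to load the column as text and convert it afterwards. The sketch below uses a made-up frame in place of the real ladder and assumes the column held Python lists when it was saved, so the CSV text looks like "[1.0, 1.0]". If the cells were NumPy arrays, the saved text has no commas ("[1. 1.]") and ast.literal_eval would raise, so a custom parser would be needed instead.

```python
import ast
import io

import numpy as np
import pandas as pd

# Made-up frame with a list-valued column, standing in for the real ladder.
ladder = pd.DataFrame({"E": [17.8, 918.5], "lwave": [[0], [1.0, 1.0]]})

# Write to an in-memory buffer instead of a file to keep the sketch self-contained.
buffer = io.StringIO()
ladder.to_csv(buffer)
buffer.seek(0)

# index_col=0 reuses the saved index instead of creating an "Unnamed: 0" column.
loaded = pd.read_csv(buffer, index_col=0)

# After loading, lwave holds strings such as "[1.0, 1.0]"; parse each one
# back into a list and wrap it in a NumPy array.
loaded["lwave"] = loaded["lwave"].apply(lambda s: np.array(ast.literal_eval(s)))
```

After this step an expression such as loaded["lwave"].iloc[1][0] returns a number instead of the character "[".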