I am trying to convert dataframe column to a list of floats without success.
My original code is as below and this is a Link to the excel workbook I am working on.
df = pd.read_csv('C:/Users/radoy/Programming/ML Chart Patterns/xx_combined.csv')
df['opens'].astype(float)
df.dtypes
periods int64
opens object
highs object
lows object
closes object
volumes int64
consolidating object
dtype: object
I keep getting an error:
ValueError: could not convert string to float: '[[63.240001
I have gone through the suggestions on another post here but none of them seem to work either.
Converting strings to floats in a DataFrame
What am I doing wrong? Any guidance will be appreciated.
CodePudding user response:
You can try the following code.
Code:
import pandas as pd
df = pd.read_csv('xx_combined.csv')
df.opens.apply(lambda s: [float(x.strip(' []')) for x in s.split(',')])
print(df)
CodePudding user response:
This should select numbers (with dots) from the strings in your data.
import re
pattern = r"[\d.] "
df['opens'] = df['opens'].apply(lambda x: re.search(pattern, x)[0]).astype(float)
CodePudding user response:
Try this:
import numpy as np
df['opens'] = (
df['opens']
.str.strip('[')
.str.strip(']')
.str.strip(' ')
.str.replace(', ', ',')
.str.split(',')
.apply(np.array, dtype=float)
)
The error you're getting (ValueError: could not convert string to float: '[[63.240001
) is due to the fact that the values from the column opens
are being read as strings, instead of lists of values.
For example, the first value of opens
being read is:
"[63.2400016784668, 62.20000076293945, 61.91999816894531, 61.40000152587891, 60.65999984741211, 60.04000091552734, 61.27999877929688, 60.0, 59.11999893188477, 57.88000106811523, 57.7599983215332, 59.04000091552734, 58.18000030517578, 55.29999923706055, 54.13999938964844, 54.52000045776367, 54.13999938964844, 56.72000122070312, 57.0, 58.29999923706055, 58.34000015258789, 58.04000091552734, 58.5, 58.45999908447266, 58.34000015258789, 56.09999847412109, 56.72000122070312, 58.5, 59.13999938964844, 58.41999816894531, 58.65999984741211, 57.90000152587891, 56.43999862670898, 55.7599983215332, 56.27999877929688, 55.22000122070312, 56.5, 56.58000183105469, 56.72000122070312, 56.38000106811523, 55.72000122070312, 55.88000106811523, 56.7400016784668, 58.06000137329102, 58.79999923706055, 59.40000152587891, 59.56000137329102, 58.18000030517578, 58.11999893188477, 57.72000122070312, 57.79999923706055, 56.88000106811523, 57.31999969482422, 56.11999893188477, 56.59999847412109, 56.38000106811523, 57.15999984741211, 56.08000183105469, 56.93999862670898, 57.86000061035156, 57.88000106811523, 58.54000091552734, 58.70000076293945, 57.81999969482422, 58.68000030517578, 58.58000183105469]"
instead of something like:
['63.2400016784668',
'62.20000076293945',
'61.91999816894531',
'61.40000152587891',
'60.65999984741211',
'60.04000091552734',
'61.27999877929688',
'60.0',
'59.11999893188477',
'57.88000106811523',
'57.7599983215332',
'59.04000091552734',
'58.18000030517578',
'55.29999923706055',
'54.13999938964844',
'54.52000045776367',
'54.13999938964844',
'56.72000122070312',
'57.0',
'58.29999923706055',
'58.34000015258789',
'58.04000091552734',
'58.5',
'58.45999908447266',
'58.34000015258789',
'56.09999847412109',
'56.72000122070312',
'58.5',
'59.13999938964844',
'58.41999816894531',
'58.65999984741211',
'57.90000152587891',
'56.43999862670898',
'55.7599983215332',
'56.27999877929688',
'55.22000122070312',
'56.5',
'56.58000183105469',
'56.72000122070312',
'56.38000106811523',
'55.72000122070312',
'55.88000106811523',
'56.7400016784668',
'58.06000137329102',
'58.79999923706055',
'59.40000152587891',
'59.56000137329102',
'58.18000030517578',
'58.11999893188477',
'57.72000122070312',
'57.79999923706055',
'56.88000106811523',
'57.31999969482422',
'56.11999893188477',
'56.59999847412109',
'56.38000106811523',
'57.15999984741211',
'56.08000183105469',
'56.93999862670898',
'57.86000061035156',
'57.88000106811523',
'58.54000091552734',
'58.70000076293945',
'57.81999969482422',
'58.68000030517578',
'58.58000183105469']
CodePudding user response:
there are some some non digit characters like space etc. you can try to replace them all with null prior to converting to float
df['open'].replace(r'(\"|\[|\s)',"", regex=True).astype('float')
0 63.240002
1 12.600000
2 213.149994
3 11.280000
4 76.820000
5 15.600000
6 210.179993
Name: opens, dtype: float64
Test Data from provided Excel
periods opens
65 [63.2400016784668"
29 [12.60000038146973
19 [213.1499938964844
44 [11.27999973297119
78 [76.81999969482422
34 [15.60000038146973
74 [210.1799926757812