Home > Software engineering >  Why can't I convert dataframe column to float
Why can't I convert dataframe column to float

Time:06-29

I am trying to convert dataframe column to a list of floats without success.

My original code is as below and this is a Link to the excel workbook I am working on.

df = pd.read_csv('C:/Users/radoy/Programming/ML Chart Patterns/xx_combined.csv')
df['opens'].astype(float)
df.dtypes
periods           int64
opens            object
highs            object
lows             object
closes           object
volumes           int64
consolidating    object
dtype: object

I keep getting an error:

ValueError: could not convert string to float: '[[63.240001

I have gone through the suggestions on another post here but none of them seem to work either.

Converting strings to floats in a DataFrame

What am I doing wrong? Any guidance will be appreciated.

CodePudding user response:

You can try the following code.

Code:

import pandas as pd

df = pd.read_csv('xx_combined.csv')
df.opens.apply(lambda s: [float(x.strip(' []')) for x in s.split(',')])
print(df)

CodePudding user response:

This should select numbers (with dots) from the strings in your data.

import re

pattern = r"[\d.] "
df['opens'] = df['opens'].apply(lambda x: re.search(pattern, x)[0]).astype(float)

CodePudding user response:

Try this:


import numpy as np


df['opens'] = (
    df['opens']
    .str.strip('[')
    .str.strip(']')
    .str.strip(' ')
    .str.replace(', ', ',')
    .str.split(',')
    .apply(np.array, dtype=float)
)

The error you're getting (ValueError: could not convert string to float: '[[63.240001) is due to the fact that the values from the column opens are being read as strings, instead of lists of values.

For example, the first value of opens being read is:

"[63.2400016784668, 62.20000076293945, 61.91999816894531, 61.40000152587891, 60.65999984741211, 60.04000091552734, 61.27999877929688, 60.0, 59.11999893188477, 57.88000106811523, 57.7599983215332, 59.04000091552734, 58.18000030517578, 55.29999923706055, 54.13999938964844, 54.52000045776367, 54.13999938964844, 56.72000122070312, 57.0, 58.29999923706055, 58.34000015258789, 58.04000091552734, 58.5, 58.45999908447266, 58.34000015258789, 56.09999847412109, 56.72000122070312, 58.5, 59.13999938964844, 58.41999816894531, 58.65999984741211, 57.90000152587891, 56.43999862670898, 55.7599983215332, 56.27999877929688, 55.22000122070312, 56.5, 56.58000183105469, 56.72000122070312, 56.38000106811523, 55.72000122070312, 55.88000106811523, 56.7400016784668, 58.06000137329102, 58.79999923706055, 59.40000152587891, 59.56000137329102, 58.18000030517578, 58.11999893188477, 57.72000122070312, 57.79999923706055, 56.88000106811523, 57.31999969482422, 56.11999893188477, 56.59999847412109, 56.38000106811523, 57.15999984741211, 56.08000183105469, 56.93999862670898, 57.86000061035156, 57.88000106811523, 58.54000091552734, 58.70000076293945, 57.81999969482422, 58.68000030517578, 58.58000183105469]"

instead of something like:

['63.2400016784668',
 '62.20000076293945',
 '61.91999816894531',
 '61.40000152587891',
 '60.65999984741211',
 '60.04000091552734',
 '61.27999877929688',
 '60.0',
 '59.11999893188477',
 '57.88000106811523',
 '57.7599983215332',
 '59.04000091552734',
 '58.18000030517578',
 '55.29999923706055',
 '54.13999938964844',
 '54.52000045776367',
 '54.13999938964844',
 '56.72000122070312',
 '57.0',
 '58.29999923706055',
 '58.34000015258789',
 '58.04000091552734',
 '58.5',
 '58.45999908447266',
 '58.34000015258789',
 '56.09999847412109',
 '56.72000122070312',
 '58.5',
 '59.13999938964844',
 '58.41999816894531',
 '58.65999984741211',
 '57.90000152587891',
 '56.43999862670898',
 '55.7599983215332',
 '56.27999877929688',
 '55.22000122070312',
 '56.5',
 '56.58000183105469',
 '56.72000122070312',
 '56.38000106811523',
 '55.72000122070312',
 '55.88000106811523',
 '56.7400016784668',
 '58.06000137329102',
 '58.79999923706055',
 '59.40000152587891',
 '59.56000137329102',
 '58.18000030517578',
 '58.11999893188477',
 '57.72000122070312',
 '57.79999923706055',
 '56.88000106811523',
 '57.31999969482422',
 '56.11999893188477',
 '56.59999847412109',
 '56.38000106811523',
 '57.15999984741211',
 '56.08000183105469',
 '56.93999862670898',
 '57.86000061035156',
 '57.88000106811523',
 '58.54000091552734',
 '58.70000076293945',
 '57.81999969482422',
 '58.68000030517578',
 '58.58000183105469']

CodePudding user response:

there are some some non digit characters like space etc. you can try to replace them all with null prior to converting to float

df['open'].replace(r'(\"|\[|\s)',"", regex=True).astype('float')
0     63.240002
1     12.600000
2    213.149994
3     11.280000
4     76.820000
5     15.600000
6    210.179993
Name: opens, dtype: float64

Test Data from provided Excel

periods opens
65  [63.2400016784668"
29  [12.60000038146973
19  [213.1499938964844
44  [11.27999973297119
78  [76.81999969482422
34  [15.60000038146973
74  [210.1799926757812 
  • Related