pandas DataFrame: create new column

Time:12-12

I have this snippet of code working with a pandas DataFrame. I am trying to use the apply function to create a new column called STDEV_TV, but I keep running into the error below. All the columns I am working with are of type float.

TypeError: ("'float' object is not iterable", 'occurred at index 0')

Can someone help me understand why I keep getting this error?

def sigma(df):
    val = df.volume2Sum / df.volumeSum - df.vwap * df.vwap
    return math.sqrt(max(val))


df['STDEV_TV'] = df.apply(sigma, axis=1)

CodePudding user response:

Try:

import pandas as pd
import numpy as np
import math

df = pd.DataFrame(np.random.randint(1, 10, (5, 3)),
                  columns=['volume2Sum', 'volumeSum', 'vwap'])

def sigma(df):
    val = df.volume2Sum / df.volumeSum - df.vwap * df.vwap
    return math.sqrt(val) if val >= 0 else val

df['STDEV_TV'] = df.apply(sigma, axis=1)

Output:

>>> df
   volume2Sum  volumeSum  vwap   STDEV_TV
0           4          5     8 -63.200000
1           2          8     4 -15.750000
2           3          3     3  -8.000000
3           8          3     4 -13.333333
4           4          2     3  -7.000000

CodePudding user response:

Your function sigma returns a single number, because it takes the maximum:

max(val)

and that is only one number. You then apply the function to each row, although it is written for a whole data series. Use this as the last line of your code instead:

df['STDEV_TV'] = sigma(df)

This will work, although every row of STDEV_TV will then hold the same value.
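
To make the effect of calling sigma on the whole DataFrame concrete, here is a minimal sketch. The two rows of numbers are made up for illustration and are not from the question; sigma is the asker's original function.

```python
import math
import pandas as pd

# Hypothetical data, chosen so the arithmetic is easy to check by hand.
df = pd.DataFrame({
    "volume2Sum": [50.0, 18.0],
    "volumeSum": [2.0, 2.0],
    "vwap": [3.0, 3.0],
})

def sigma(df):
    # Here df is the whole DataFrame, so val is a Series with one entry per row.
    val = df.volume2Sum / df.volumeSum - df.vwap * df.vwap
    # max() reduces the Series to a single number before the square root.
    return math.sqrt(max(val))

# A scalar assigned to a column is repeated in every row.
df["STDEV_TV"] = sigma(df)
print(df)
```

If a separate value per row is wanted, the max() call has to go, as the other answers suggest.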

CodePudding user response:

Change

return math.sqrt(max(val))

to

return math.sqrt(max(val)) if isinstance(val, pd.Series) else (math.sqrt(val) if val >= 0 else val)

max() iterates over an iterable and finds the maximum value. The problem is that, because sigma is applied to every row, the local variable val is a float, not a list, so what you have is equivalent to max(1.3).
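
A small sketch of that point, assuming a made-up one-row DataFrame: the hypothetical helper probe records what val looks like inside a row-wise apply, and calling max() on it reproduces the "not iterable" error. The exact type name in the message may differ between pandas versions.

```python
import pandas as pd

# Hypothetical one-row frame, only to show what apply(axis=1) hands to the function.
df = pd.DataFrame({"volume2Sum": [50.0], "volumeSum": [2.0], "vwap": [3.0]})

seen = []  # collects the per-row values of val

def probe(row):
    # row is a Series holding one row, so each attribute is a single number.
    val = row.volume2Sum / row.volumeSum - row.vwap * row.vwap
    seen.append(val)
    return val

df.apply(probe, axis=1)

try:
    max(seen[0])  # same situation as max(1.3)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(type(seen[0]), error_message)
```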

CodePudding user response:

You need to apply sigma to each set of values, not the whole DataFrame. I would use a lambda function, e.g.:

def sigma(volume2Sum, volumeSum, vwap):
    val = volume2Sum / volumeSum - vwap * vwap
    return math.sqrt(val)


df['STDEV_TV'] = df.apply(lambda x: sigma(x.volume2Sum, x.volumeSum, x.vwap), axis=1)

That should put the square root of val into the STDEV_TV column, and you can find the max value separately. Take care not to take the square root of a negative number.
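
One possible way to guard against negative values, sketched with made-up numbers: compute the expression on whole columns and clip anything below zero before taking the square root. Clipping to zero is an assumption here, not something the question asks for; depending on the data, returning NaN or raising an error may be more appropriate.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the last row gives a negative value before the square root.
df = pd.DataFrame({
    "volume2Sum": [50.0, 18.0, 4.0],
    "volumeSum": [2.0, 2.0, 4.0],
    "vwap": [3.0, 3.0, 2.0],
})

# Column arithmetic works on all rows at once, so no apply is needed.
variance = df["volume2Sum"] / df["volumeSum"] - df["vwap"] ** 2

# Assumption: negative values are treated as zero rather than as an error.
df["STDEV_TV"] = np.sqrt(variance.clip(lower=0))
print(df)
```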
