Converting data frame column values to decimal values

My dataset has two columns, 'A' and 'B', both of which contain percentage values but have the object dtype. For example:

A%	B%
1.x%	3.x%
2.x%	4.x%

Goal: I want to use these columns for machine learning clustering, so I need to convert them to decimal form. For example, the object value '1.2%' should become the float 0.012.

I tried two methods: the first was successful, but it took a long time.

I stripped the '%' from values such as '34%' using pandas.Series.str.strip, which left the string '34'. I then converted that to a number with pd.to_numeric() (giving 34) and divided it by 100 to get 0.34.
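A minimal sketch of that first method, assuming a small hypothetical DataFrame with columns 'A' and 'B' (the sample values are made up for illustration). `.str.strip`, `pd.to_numeric` and the division each operate on a whole column at once, so there is no Python-level loop over individual values.

```python
import pandas as pd

# Hypothetical sample data: percentage strings stored with object dtype
df = pd.DataFrame({"A": ["34%", "1.2%"], "B": ["3.5%", "40%"]})

for col in ["A", "B"]:
    # Step 1: strip the '%' with the vectorized .str accessor
    stripped = df[col].str.strip("%")
    # Step 2: convert the remaining text to numbers
    numbers = pd.to_numeric(stripped)
    # Step 3: divide by 100 to get the decimal fraction
    df[col] = numbers / 100

print(df)
print(df.dtypes)
```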
For the second method, I wrote this function:

def Tab_to_float(z):
    return float(z.strip('%'))/100

When I pass it the column (which has object dtype) like this:

Tab_to_float(df['A'])

I get this error:

AttributeError: 'Series' object has no attribute 'strip'

I tried passing this function an int, a float, a NumPy array and even a DataFrame, but I got the same kind of error each time: '<type>' object has no attribute 'strip'. I'm not sure where I'm going wrong. Is there a better way to handle this kind of conversion? Any help is much appreciated!
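The error comes from what `z` is inside the function: `.strip` is a method of Python strings, and a pandas Series (like an int, float, NumPy array or DataFrame) does not have it. The sketch below, using a hypothetical two-value Series, checks this and shows that the unchanged function works once it is called on one element at a time.

```python
import pandas as pd

def Tab_to_float(z):
    # Works on ONE string such as '1.2%': str.strip removes the '%'
    return float(z.strip('%')) / 100

s = pd.Series(["1.2%", "3.4%"])  # hypothetical object-dtype column

# A Series is a container, not a string, so it has no .strip method...
print(hasattr(s, "strip"))      # False
# ...but string methods are exposed through its .str accessor
print(hasattr(s.str, "strip"))  # True

# The function itself is fine when it receives a single string
print(Tab_to_float("1.2%"))

# To use it on a column, call it once per element, e.g. with Series.map
print(s.map(Tab_to_float).tolist())
```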

Answer:

To make it a bit more interesting, here is a snippet that converts every column whose name ends in '%' from text percentages to floats:

for col in df.filter(regex='%$'):   # column names ending in '%' ('$' anchors the match at the end)
    df[col] = df[col].str.rstrip('%').astype(float).div(100)  # remove %, convert to float, divide by 100
    df.rename(columns={col: col.rstrip('%')}, inplace=True)   # remove the '%' from the column name

Output:

       A      B
0  0.011  0.033
1  0.022  0.044
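Here is a self-contained run of that loop, using hypothetical input values chosen so that the result should match the output above. It uses the anchored pattern `'%$'`: `DataFrame.filter(regex=...)` keeps labels for which `re.search` finds a match, so an unanchored pattern such as `'.*%'` would also select a column with '%' in the middle of its name.

```python
import pandas as pd

# Hypothetical input: column names end in '%', values are percentage strings
df = pd.DataFrame({"A%": ["1.1%", "2.2%"], "B%": ["3.3%", "4.4%"]})

# df.filter(...) builds the list of matching columns once, before the loop runs
for col in df.filter(regex="%$"):
    # Strip the '%', parse as float, scale to a fraction
    df[col] = df[col].str.rstrip("%").astype(float).div(100)
    # Drop the '%' from the column label as well
    df.rename(columns={col: col.rstrip("%")}, inplace=True)

print(df)
```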

Answer:

df['A'] = df.apply(lambda row: Tab_to_float(row['A']), axis=1)

Do this for each of the two columns.

DataFrame.apply with axis=1 calls the lambda once per row, so Tab_to_float receives a single string such as '34.3%' rather than a whole Series. The Tab_to_float function itself does not need to change.

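import pandas as pd

def Tab_to_float(z):
    return float(z.strip('%')) / 100   # same function as in the question

data = {
    'A': ['34.3%', '24%'],
    'B': ['32%', '33%'],
}
df = pd.DataFrame(data)

# axis=1 passes one row at a time, so row['A'] is a single string
df['A'] = df.apply(lambda row: Tab_to_float(row['A']), axis=1)
df['B'] = df.apply(lambda row: Tab_to_float(row['B']), axis=1)

print(df)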

Output:

       A     B
0  0.343  0.32
1  0.240  0.33
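Since each lambda only reads one column, an alternative (not part of the original answer) is `Series.map`, which hands each value of a single column directly to the function instead of building a row Series first. A minimal sketch reusing the answer's sample data:

```python
import pandas as pd

def Tab_to_float(z):
    # Converts one percentage string such as '34.3%' to a fraction
    return float(z.strip('%')) / 100

df = pd.DataFrame({'A': ['34.3%', '24%'], 'B': ['32%', '33%']})

# Row-wise, as in the answer above: each row arrives as a Series
row_wise = df.apply(lambda row: Tab_to_float(row['A']), axis=1)

# Element-wise on one column: each string goes straight to the function
element_wise = df['A'].map(Tab_to_float)
print(row_wise.tolist() == element_wise.tolist())  # True

# Convert both columns in place
for col in ['A', 'B']:
    df[col] = df[col].map(Tab_to_float)
print(df)
```

Both approaches call the same function on the same strings; `map` simply skips constructing a Series for every row.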

Answer:

You can use a lambda to apply your function to a pandas DataFrame or Series. To convert each element of a column to a floating-point number and divide it by 100, do this:

df['A'] = df['A'].apply(lambda x: float(x.strip('%')) / 100)