Home > Software engineering >  Custom function not working on pandas column
Custom function not working on pandas column

Time:04-02

I have a dataset with a column like this:


0        534K
1      482.2K
2      491.3K
3      247.6K
4      815.3K
        ...  
144    403.4K
145      1.3M
146      3.1M
147    311.5K
148      3.7M

I'm trying to clean the column values by writing a function like this (to remove the 'K's and '.' as a test first):


def fix_column(val):
    i = 0
    if val[i][-1] == 'K' and len(val[i]) == 4:
        val.replace("K","")
        val = val   "000"
    elif val[i][-1] == 'K'and len(val[i]) == 5:
        val.replace("K","").replace(".","")
        val = val   "00"
    i  = 1
    
    return val

But when I try to apply the function on the column like df.col = df.col.apply(fix_column) nothing happens and the column values remain unchanged. If I do something like df.col = df.col.apply(lambda x:x.replace("K","").replace(".","")) I do get close to a desirable output like so:

0       534
1      4822
2      4913
3      2476
4      8153
       ... 
144    4034
145     13M
146     31M
147    3115
148     37M

I may be missing something basic but would appreciate any pointers in the right direction. Thanks.

CodePudding user response:

Your first mistake is to use a loop at all. Your second mistake is to use a loop incorrectly. (i is a local variable, its value is always 0.) Here's a way to go:

Separate the number and the decimal suffix (if any):

number_strings = df.col.str.extract('([\d\.] )([MK]?)')

Convert the number strings into proper numbers:

numbers = number_strings[0].astype(float)

Multiply the numbers, where necessary:

numbers[number_strings[1] == 'K'] *= 1,000
numbers[number_strings[1] == 'M'] *= 1,000,000

Paste the results back into the dataframe:

df.col = numbers
  • Related