I'm getting memory address instead of values using dfply mutate custom function

Time:08-10

I'm trying out dfply as an alternative to Pandas apply and applymap. Given some fake data:

import pandas as pd
from dfply import *
df = pd.DataFrame({'country':['taiwan','ireland','taiwan', 'ireland', 'china'],
                   'num':[10.00, 10.50, 33.99, 10.50, 300],
                   'score':[1, 1, 3, 5, 10]})

   country     num  score
0   taiwan   10.00      1
1  ireland   10.50      1
2   taiwan   33.99      3
3  ireland   10.50      5
4    china  300.00     10

In real work I often need custom mappings. Instead of .map, I tried this:

@pipe
def update_country(country):
    if country == 'taiwan':
        return 'Republic of Taiwan'
    else:
        return country

df >> mutate(new_country=update_country(X.country)) >> select(X.new_country)

But I get this output:

                                      new_country
0  <dfply.base.pipe object at 0x000001CAECD9B4F0>
1  <dfply.base.pipe object at 0x000001CAECD9B4F0>
2  <dfply.base.pipe object at 0x000001CAECD9B4F0>
3  <dfply.base.pipe object at 0x000001CAECD9B4F0>
4  <dfply.base.pipe object at 0x000001CAECD9B4F0>

Am I using the wrong decorator? Or can I do without a custom function?

Answer:

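You can do this without a decorator. X.country is a Series, so call apply on it and pass the plain function.

The @pipe decorator is meant for functions that take the DataFrame as their first argument and are chained with >>. Calling update_country(X.country) therefore does not run the function body; it returns a new pipe object, and mutate assigns that one object to every row. That is why the column shows the object's repr, including its memory address.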
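
To see why the decorated call produced an object rather than values, here is a minimal sketch of how a pipe-style decorator can behave. ToyPipe and add_suffix are hypothetical names made up for this illustration, and the class is a simplified stand-in, not dfply's actual implementation. It only shows the general pattern: calling the decorated function returns a wrapper, and the function runs when data arrives through >>.

```python
class ToyPipe:
    """Simplified, hypothetical stand-in for a pipe-style decorator."""

    def __init__(self, function):
        self.function = function

    def __call__(self, *args, **kwargs):
        # Calling the decorated function does NOT run it. It returns a new
        # wrapper that waits for the left-hand side of >> to arrive.
        return ToyPipe(lambda data: self.function(data, *args, **kwargs))

    def __rrshift__(self, data):
        # data >> wrapper: only now is the wrapped function executed.
        return self.function(data)


@ToyPipe
def add_suffix(data, suffix):
    # The piped data always comes first; other arguments follow.
    return [item + suffix for item in data]


deferred = add_suffix('!')              # a ToyPipe object, not a list
piped = ['a', 'b'] >> add_suffix('!')   # the list is supplied by >>

print(type(deferred).__name__)
print(piped)
```

Here deferred is a wrapper object, much like the value that ended up in the new_country column, while piped holds real results because the data came in through >>. The apply-based code that follows avoids the problem by passing an undecorated function.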

# Imports
import pandas as pd
from dfply import *

# Data
df = pd.DataFrame({'country':['taiwan','ireland','taiwan', 'ireland', 'china'],
                   'num':[10.00, 10.50, 33.99, 10.50, 300],
                   'score':[1, 1, 3, 5, 10]})

# Utility function: plain Python, no decorator, handles one value at a time
def update_country(country):
    if country == 'taiwan':
        return 'Republic of Taiwan'
    else:
        return country

# Piping
# Note: apply is called on the Series X.country, so update_country
# receives one country string per row
result = df >> mutate(new_country=X.country.apply(update_country)) >> select(X.new_country)

print(result)
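
As for doing without a custom function at all: for a simple value-to-value mapping like this one, a dictionary passed to the pandas Series.replace method may be enough. The sketch below uses plain pandas only; the country_names dictionary is a hypothetical lookup table introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({'country': ['taiwan', 'ireland', 'taiwan', 'ireland', 'china'],
                   'num': [10.00, 10.50, 33.99, 10.50, 300],
                   'score': [1, 1, 3, 5, 10]})

# Hypothetical lookup table: list only the names that should change.
country_names = {'taiwan': 'Republic of Taiwan'}

# Series.replace swaps exact matches and leaves every other value untouched.
new_country = df['country'].replace(country_names)

result = pd.DataFrame({'new_country': new_country})
print(result)
```

Since X.country.apply(...) works inside mutate, the same call would presumably also work there as X.country.replace(country_names), though that variant is an untested assumption here.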