I'm trying out dfply as an alternative to Pandas apply and applymap. Given some fake data:
import pandas as pd
from dfply import *
df = pd.DataFrame({'country':['taiwan','ireland','taiwan', 'ireland', 'china'],
                   'num':[10.00, 10.50, 33.99, 10.50, 300],
                   'score':[1, 1, 3, 5, 10]})
   country     num  score
0   taiwan   10.00      1
1  ireland   10.50      1
2   taiwan   33.99      3
3  ireland   10.50      5
4    china  300.00     10
IRL I often need to make custom mappings. Instead of .map I tried this:
@pipe
def update_country(country):
    if country == 'taiwan':
        return 'Republic of Taiwan'
    else:
        return country
df >> mutate(new_country=update_country(X.country)) >> select(X.new_country)
But I get this output:
new_country
0 <dfply.base.pipe object at 0x000001CAECD9B4F0>
1 <dfply.base.pipe object at 0x000001CAECD9B4F0>
2 <dfply.base.pipe object at 0x000001CAECD9B4F0>
3 <dfply.base.pipe object at 0x000001CAECD9B4F0>
4 <dfply.base.pipe object at 0x000001CAECD9B4F0>
Am I using the wrong decorator? Or can I do without a custom function?
CodePudding user response:
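Here you are passing the whole Series (X.country) to a function that expects a single value. On top of that, @pipe is meant for functions that take a DataFrame in a >> chain, so calling the decorated function returns a pipe object instead of a mapped value. That object is what ends up in every row.
Just call the Series' apply method, which runs the function once per element. You can do this without any decorator.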
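To make the pipe-object output less mysterious, here is a rough sketch of how a pipe-style decorator can behave. ToyPipe is a hypothetical, simplified stand-in written for this answer, not dfply's actual code. It only illustrates the general idea that calling the decorated function returns another wrapper instead of running your logic.

```python
# Simplified stand-in for a pipe-style decorator (hypothetical, NOT dfply's real code).
class ToyPipe:
    def __init__(self, function):
        self.function = function

    def __call__(self, *args, **kwargs):
        # Calling the decorated function does not run it. It returns a new
        # ToyPipe that still waits for data on the left-hand side of >>.
        return ToyPipe(lambda data: self.function(data, *args, **kwargs))

    def __rrshift__(self, data):
        # `data >> toy_pipe` is what finally runs the wrapped function.
        return self.function(data)


@ToyPipe
def update_country(country):
    return 'Republic of Taiwan' if country == 'taiwan' else country


result = update_country('taiwan')
print(type(result).__name__)  # ToyPipe
```

So the call hands back a wrapper object rather than a string, and if mutate receives such an object it has nothing better to do than store it in every row.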
# Imports
import pandas as pd
from dfply import *
# Data
df = pd.DataFrame({'country':['taiwan','ireland','taiwan', 'ireland', 'china'],
                   'num':[10.00, 10.50, 33.99, 10.50, 300],
                   'score':[1, 1, 3, 5, 10]})
# Utility function: maps a single country name (one value, not a Series)
def update_country(country):
    if country == 'taiwan':
        return 'Republic of Taiwan'
    else:
        return country
# Piping
# Note that apply is called on the Series X.country,
# so update_country runs once per element
result = df >> mutate(new_country=X.country.apply(update_country)) >> select(X.new_country)
print(result)
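As for doing without a custom function: for a plain value-to-value mapping like this one, a dictionary plus Series.replace may be enough. A minimal sketch in plain pandas, where country_names is a hypothetical lookup table made up for this example:

```python
import pandas as pd

df = pd.DataFrame({'country': ['taiwan', 'ireland', 'taiwan', 'ireland', 'china'],
                   'num': [10.00, 10.50, 33.99, 10.50, 300],
                   'score': [1, 1, 3, 5, 10]})

# Hypothetical lookup table; extend it with any other renames you need.
country_names = {'taiwan': 'Republic of Taiwan'}

# Series.replace swaps values that match a key and leaves the rest untouched.
new_country = df['country'].replace(country_names)
print(new_country.tolist())
# ['Republic of Taiwan', 'ireland', 'Republic of Taiwan', 'ireland', 'china']
```

By analogy with the apply call above, the same expression should also work inside mutate as X.country.replace(country_names), but I have not tested that against dfply, so treat it as an assumption.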