The DataFrame stu_alcol looks like the following:
   school  sex  age  address  famsize  Pstatus  Medu  Fedu     Mjob      Fjob  reason  guardian
0      GP    F   18        U      GT3        A     4     4  at_home   teacher  course    mother
1      GP    F   17        U      GT3        T     1     1  at_home     other  course    father
2      GP    F   15        U      LE3        T     1     1  at_home     other   other    mother
3      GP    F   15        U      GT3        T     4     2   health  services    home    mother
4      GP    F   16        U      GT3        T     3     3    other     other    home    father
The goal is to multiply all integer values by 10 (just playing with the data).
This code, however, throws an 'invalid syntax' error:
stu_alcol.transform(lambda x: x*10 if isinstance(x, int))
Can anyone help? Please understand that I am aware of other possible solutions. I just want to understand what is wrong here.
CodePudding user response:
You can convert the entire df to numeric and let errors='coerce' turn the non-numeric values into NaN. Multiply that by 10 and update the original df; update() ignores NaN, so the non-numeric values are left untouched.
This should allow you to handle mixed-type columns properly as well.
df.update(df.apply(pd.to_numeric, errors='coerce').mul(10))
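To make the steps easier to follow, here is a small sketch of the same idea split into two lines. The three-column frame is made up for illustration (it only borrows a few column names from the question), and the intermediate name numeric is my own.

```python
import pandas as pd

# Hypothetical cut-down version of the question's frame
df = pd.DataFrame({"school": ["GP", "GP"], "age": [18, 17], "Medu": [4, 1]})

# Step 1: convert every column; values that cannot be parsed become NaN
numeric = df.apply(pd.to_numeric, errors="coerce")

# Step 2: multiply and write back; update() only copies non-NaN values,
# so the "school" column keeps its original strings
df.update(numeric.mul(10))

print(df)
```

One thing to keep in mind: pd.to_numeric also parses strings that look like numbers (for example "3"), so those would be multiplied as well.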
CodePudding user response:
You can select the columns by name and multiply them by a value.
stu_alcol[['age', 'Medu', 'Fedu']] *= 10
# stu_alcol[['age', 'Medu', 'Fedu']] = stu_alcol[['age', 'Medu', 'Fedu']]*10
# stu_alcol[['age', 'Medu', 'Fedu']] = stu_alcol[['age', 'Medu', 'Fedu']].multiply(10)
All three lines give the same result, just using different notations.
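If you would rather not type the column names, a possible variation is to pick the columns by dtype instead. This is only a sketch on a made-up four-column frame; note that include="number" also matches float columns, so it fits here only because all numeric columns in the sample are integers.

```python
import pandas as pd

# Hypothetical cut-down version of the question's frame
stu_alcol = pd.DataFrame(
    {"school": ["GP", "GP"], "age": [18, 17], "Medu": [4, 1], "Fedu": [4, 1]}
)

# Collect the names of all numeric columns (assumption: they are all integer here)
num_cols = stu_alcol.select_dtypes(include="number").columns

# Multiply only those columns; the string columns are not touched
stu_alcol[num_cols] = stu_alcol[num_cols] * 10

print(stu_alcol)
```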
Comment
You can also use apply() to run a function over every value, like below:
stu_alcol = stu_alcol.apply(lambda x: [xx*10 if isinstance(xx,int) else xx for xx in x])
But this is not easy to read and can have performance problems on larger frames.
CodePudding user response:
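The reason this isn't working is that a lambda function body must be a single expression, and a conditional expression needs both branches: value_if_true if condition else value_if_false. Your if has no else, so it is not a complete expression, hence the 'invalid syntax' error.
You would have to make the lambda body a valid single expression, for example by adding an else or by moving the logic into its own function, to correct the error. Also note that transform passes each column to the lambda as a Series, so the check has to be made on the individual elements, and the type that you probably want to be checking for there is numpy.int64, not int.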
As an example, the following will work (although the mult_ints_by_10 function is just some example code to make the point, and certainly isn't optimised!):
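import numpy

def mult_ints_by_10(data_series):
    # Work on a copy so the original column is left untouched
    return_series = data_series.copy()
    for loop in range(len(data_series)):
        # .iloc indexes by position, whatever the index labels are
        element = data_series.iloc[loop]
        return_series.iloc[loop] = element * 10 if isinstance(element, numpy.int64) else element
    return return_series

# transform() passes each column to the function as a Series
stu_alcol.transform(lambda x: mult_ints_by_10(x))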
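For comparison, here is a minimal sketch of how the original one-liner could be kept as a lambda. The syntax error goes away once the conditional expression has an else branch. Because transform hands the lambda a whole column (a Series) rather than a single value, the sketch checks the column's dtype with pd.api.types.is_integer_dtype instead of using isinstance(x, int). The small frame and the name result are made up for illustration.

```python
import pandas as pd

# Hypothetical cut-down version of the question's frame
stu_alcol = pd.DataFrame(
    {"school": ["GP", "GP"], "age": [18, 17], "Medu": [4, 1]}
)

# "col * 10 if <condition>" alone is a syntax error;
# "col * 10 if <condition> else col" is a complete expression.
result = stu_alcol.transform(
    lambda col: col * 10 if pd.api.types.is_integer_dtype(col) else col
)

print(result)
```

This returns a new frame and leaves stu_alcol itself unchanged, so assign the result back if you want to keep it.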