I have a function isValidString that validates the format of a provided string, returning true or false based on a set of criteria.
I have an existing dataframe to which I need to add a new column, populated by calling isValidString on the value of another column. I've been trying to accomplish this with the apply method but I can't get it working. I'm currently trying it as shown below (I don't understand lambdas super well), and my isValidString function throws an error complaining that it received a float instead of the expected string. I have no idea what it's receiving that would make that the case.
df_test['is_valid'] = df_test['testresults'].apply(lambda x: isValidString(x))
This seems like it should be a pretty straightforward operation, as it must be common, but I haven't been able to find a solution on SO or elsewhere.
CodePudding user response:
You only need to pass the name of the user-defined function (without parentheses); apply will call it on each value for you:
df_test['is_valid'] = df_test['testresults'].apply(isValidString)
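A minimal runnable sketch of this pattern, using a stand-in validator (your real isValidString will differ):

```python
import pandas as pd

# hypothetical validator, standing in for the poster's isValidString
def isValidString(s):
    return s.isalpha()

df_test = pd.DataFrame({"testresults": ["abc", "a1", "xyz"]})

# pass the function object itself; apply calls it on each value
df_test["is_valid"] = df_test["testresults"].apply(isValidString)
```

The lambda version `apply(lambda x: isValidString(x))` is equivalent; it just adds an extra layer of indirection.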
CodePudding user response:
The issue has to be something with the data in your 'testresults' column. A float showing up where you expect strings usually means the column contains missing values: pandas represents NaN as a float, and apply passes it to your function like any other value. The way you are using apply looks just fine, whether you pass just the function or the function wrapped in a lambda.
import pandas as pd
import numpy as np
# return True if string is an even number after the first character
def isValidString(s):
    return int(s[1:]) % 2 == 0
# example that the function works
assert [ isValidString(s) for s in ["v1", "v64", "v33", "v10"] ]==[False, True, False, True]
# create a test dataframe
d = pd.DataFrame( {'v':['v' + str(n) for n in np.arange(100)]} )
# both the raw function passed to apply() and lambda version work
assert (d['v'].apply(lambda s: isValidString(s))==d['v'].apply(isValidString)).all()
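If the column does contain NaN, a sketch of one way to guard before validating (the validator here is a hypothetical stand-in for your isValidString):

```python
import pandas as pd

# hypothetical validator, standing in for the poster's isValidString
def isValidString(s):
    return s.startswith("v")

# None becomes NaN in the column, and NaN is a float
df = pd.DataFrame({"testresults": ["v1", None, "v3"]})

# treat non-strings (including NaN) as invalid instead of crashing
df["is_valid"] = df["testresults"].apply(
    lambda x: isValidString(x) if isinstance(x, str) else False
)
```

Alternatively, inspect the offending rows first with `df[df['testresults'].isna()]` to decide whether to drop or fill them.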