I have a function isValidString that validates the format of a provided string, returning true or false based on a set of criteria.
I have an existing dataframe to which I need to add a new column, populated by calling isValidString on the value of another column. I've been trying to accomplish this with the apply method but I can't get it working. I'm currently trying it as shown below (I don't understand lambdas super well), and my isValidString function throws an error complaining that it received a float instead of the expected string. I have no idea what it's receiving that would make that the case.
df_test['is_valid'] = df_test['testresults'].apply(lambda x: isValidString(x))
This seems like it should be a pretty straightforward operation, as it must be common, but I haven't been able to find a solution on SO or elsewhere.
CodePudding user response:
You only need to pass the name of the user-defined function (without parentheses); apply will call it on each value for you:
df_test['is_valid'] = df_test['testresults'].apply(isValidString)
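A minimal runnable sketch of this pattern, using a stand-in validator (your real isValidString will differ):

```python
import pandas as pd

# hypothetical validator, standing in for the poster's isValidString
def isValidString(s):
    return s.isalpha()

df_test = pd.DataFrame({"testresults": ["abc", "a1", "xyz"]})

# pass the function object itself; apply calls it on each value
df_test["is_valid"] = df_test["testresults"].apply(isValidString)
```

The lambda version `apply(lambda x: isValidString(x))` is equivalent; it just adds an extra layer of indirection.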
CodePudding user response:
The issue has to be something with the data in your 'testresults' column. A float showing up where you expect strings usually means the column contains missing values: pandas represents NaN as a float, and apply passes it to your function like any other value. The way you are using apply looks just fine, whether you pass just the function or the function wrapped in a lambda.
import pandas as pd
import numpy as np
# return True if string is an even number after the first character
def isValidString(s):
    return int(s[1:]) % 2 == 0
# example that the function works
assert [ isValidString(s) for s in ["v1", "v64", "v33", "v10"] ]==[False, True, False, True]
# create a test dataframe
d = pd.DataFrame( {'v':['v' + str(n) for n in np.arange(100)]} )
# both the raw function passed to apply() and lambda version work
assert (d['v'].apply(lambda s: isValidString(s))==d['v'].apply(isValidString)).all()
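If the column does contain NaN, a sketch of one way to guard before validating (the validator here is a hypothetical stand-in for your isValidString):

```python
import pandas as pd

# hypothetical validator, standing in for the poster's isValidString
def isValidString(s):
    return s.startswith("v")

# None becomes NaN in the column, and NaN is a float
df = pd.DataFrame({"testresults": ["v1", None, "v3"]})

# treat non-strings (including NaN) as invalid instead of crashing
df["is_valid"] = df["testresults"].apply(
    lambda x: isValidString(x) if isinstance(x, str) else False
)
```

Alternatively, inspect the offending rows first with `df[df['testresults'].isna()]` to decide whether to drop or fill them.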