How to apply multiple functions to a dataframe column in parallel in Python

I have a pandas dataframe and want to apply three different functions on one of the columns in parallel. For example:

import pandas as pd

df = pd.DataFrame(
    {'col1': ['LA','Boston','Phoenix','Toronto'], 'col2': [2,3,4,5]},
    columns=['col1', 'col2'])

def function1(x):
    return(x**2)

def function2(x):
    return(x**3)

def function3(x):
    return(x**4)

I want to apply the three functions to column 'col2' in parallel and store the results as new columns in the dataframe.

CodePudding user response：

Ivan, it isn't clear what you mean by "parallel".

SIMPLE ANSWER

The simplest code to do that is the following:

import pandas as pd

def function1(x):
    return x**2

def function2(x):
    return x**3

def function3(x):
    return x**4

df = pd.DataFrame(
    {'col1': ['LA','Boston','Phoenix','Toronto'], 'col2': [2,3,4,5]},
    columns=['col1', 'col2'])

# Series.apply calls the function on each value; it takes no axis argument
df['func1_col2'] = df['col2'].apply(function1)
df['func2_col2'] = df['col2'].apply(function2)
df['func3_col2'] = df['col2'].apply(function3)

As a result, you will get three more columns in your DataFrame. See the pandas documentation for Series.apply for details.

PARALLEL

If you want to run this in parallel, you need a multiprocessing tool. If so, please update your question and you will get an answer on how to do it in parallel with Python (3.10).
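For reference, here is a rough sketch of one possible shape for a multiprocessing version, assuming the standard-library concurrent.futures module is acceptable and that each function can take the whole column at once. The helper name apply_in_parallel and its executor_cls parameter are made up for this illustration and are not part of pandas.

```python
import concurrent.futures as cf

import pandas as pd


def function1(x):
    return x ** 2


def function2(x):
    return x ** 3


def function3(x):
    return x ** 4


def apply_in_parallel(series, funcs, executor_cls=cf.ProcessPoolExecutor):
    """Hypothetical helper: run each function on the whole Series in its own worker.

    Returns a DataFrame with one column per function, named after the function.
    """
    with executor_cls(max_workers=len(funcs)) as executor:
        # One task per function; with processes, each worker gets a pickled copy
        # of the Series, so the functions must be defined at module level.
        futures = {func.__name__: executor.submit(func, series) for func in funcs}
        # result() blocks until the task is done and re-raises any worker error.
        return pd.DataFrame({name: fut.result() for name, fut in futures.items()})


if __name__ == "__main__":
    # The guard matters: worker processes may re-import this module on startup.
    df = pd.DataFrame(
        {'col1': ['LA', 'Boston', 'Phoenix', 'Toronto'], 'col2': [2, 3, 4, 5]},
        columns=['col1', 'col2'])
    results = apply_in_parallel(df['col2'], [function1, function2, function3])
    # Adds function1_col2, function2_col2 and function3_col2 next to the originals.
    df = df.join(results.add_suffix('_col2'))
    print(df)
```

On a four-row frame like this one, starting worker processes will likely cost far more than the arithmetic itself, so this is only worth considering when each function is genuinely expensive.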

CodePudding user response：

I would do something like this to make sure we iterate only once.

import pandas as pd

df = pd.DataFrame(
    {'col1': ['LA', 'Boston', 'Phoenix', 'Toronto'], 'col2': [2, 3, 4, 5]},
    columns=['col1', 'col2'])


def function1(x):
    return (x ** 2)

def function2(x):
    return (x ** 3)


def function3(x):
    return (x ** 4)

def all_fn(row):
    # One value per new column; result_type="expand" spreads them into columns
    return function1(row["col2"]), function2(row["col2"]), function3(row["col2"])

df[["col3", "col4", "col5"]] = df.apply(all_fn, axis=1, result_type="expand")