I have a pandas dataframe and want to apply three different functions on one of the columns in parallel. For example:
import pandas as pd

df = pd.DataFrame(
    {'col1': ['LA','Boston','Phoenix','Toronto'], 'col2': [2,3,4,5]},
    columns=['col1', 'col2'])

def function1(x):
    return x**2

def function2(x):
    return x**3

def function(x):
    return x**4
I want to apply the three functions to column 'col2' in parallel and store the results as new columns in the dataframe.
CodePudding user response:
Ivan, it isn't clear what you mean by "parallel".
SIMPLE ANSWER
The simplest code to do that is the following:
import pandas as pd

def function1(x):
    return x**2

def function2(x):
    return x**3

def function3(x):
    return x**4

df = pd.DataFrame(
    {'col1': ['LA','Boston','Phoenix','Toronto'], 'col2': [2,3,4,5]},
    columns=['col1', 'col2'])

# Series.apply calls the function on each value; axis is a DataFrame.apply argument and is not needed here.
df['func1_col2'] = df['col2'].apply(function1)
df['func2_col2'] = df['col2'].apply(function2)
df['func3_col2'] = df['col2'].apply(function3)
As a result, you will get three more columns in your DataFrame. See the pandas documentation for details on the apply function.
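Since the three functions are plain arithmetic, they can probably also be evaluated on the whole column at once instead of value by value. A minimal sketch, assuming the functions stay element-wise arithmetic (the new column names reuse the ones from this answer):

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": ["LA", "Boston", "Phoenix", "Toronto"], "col2": [2, 3, 4, 5]}
)

# Arithmetic operators work element-wise on a Series, so no apply() call is needed.
df = df.assign(
    func1_col2=df["col2"] ** 2,
    func2_col2=df["col2"] ** 3,
    func3_col2=df["col2"] ** 4,
)
print(df)
```

For functions that cannot operate on a whole Series, the apply version is still the one to use.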
PARALLEL
If you want to run this in parallel, you need a multiprocessing tool. If so, please update your question and you will get an answer on how to do it in parallel with Python (3.10).
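In case it helps, here is one way the parallel variant could look with the standard library's concurrent.futures. This is only a sketch under assumptions: apply_concurrently is a hypothetical helper name, each function is assumed to accept the whole column, and ThreadPoolExecutor is chosen so the example runs anywhere. For CPU-bound pure-Python functions, ProcessPoolExecutor offers the same submit interface, but it needs module-level functions and the main guard shown at the bottom.

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def function1(x):
    return x ** 2


def function2(x):
    return x ** 3


def function3(x):
    return x ** 4


def apply_concurrently(df, column, funcs, executor_cls=ThreadPoolExecutor):
    """Hypothetical helper: return a copy of df with one new column per function.

    funcs maps a new column name to a function that accepts a whole Series.
    """
    out = df.copy()
    with executor_cls(max_workers=len(funcs)) as pool:
        # Submit one task per function; each task receives the entire column.
        futures = {name: pool.submit(fn, df[column]) for name, fn in funcs.items()}
        # Collect the results in the order the functions were given.
        for name, future in futures.items():
            out[name] = future.result()
    return out


if __name__ == "__main__":
    df = pd.DataFrame(
        {"col1": ["LA", "Boston", "Phoenix", "Toronto"], "col2": [2, 3, 4, 5]}
    )
    funcs = {
        "func1_col2": function1,
        "func2_col2": function2,
        "func3_col2": function3,
    }
    print(apply_concurrently(df, "col2", funcs))
```

For a column this small the overhead of starting workers will likely outweigh any gain, so it is worth timing against the simple version before adopting it.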
CodePudding user response:
I would do something like this to make sure we iterate only once.
import pandas

df = pandas.DataFrame(
    {'col1': ['LA', 'Boston', 'Phoenix', 'Toronto'], 'col2': [2, 3, 4, 5]},
    columns=['col1', 'col2'])

def function1(x):
    return x ** 2

def function2(x):
    return x ** 3

def function(x):
    return x ** 4

# Compute all three results for a row in one call, so the rows are visited only once.
def all_fn(row):
    return function1(row["col2"]), function2(row["col2"]), function(row["col2"])

# result_type="expand" turns each returned tuple into separate columns.
df[["col3", "col4", "col5"]] = df.apply(all_fn, axis=1, result_type="expand")
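The same single-pass idea can be generalized to any number of functions. A rough sketch, where apply_all is a hypothetical helper name and funcs is assumed to map each new column name to a function of one value:

```python
import pandas as pd


def apply_all(df, column, funcs):
    """Hypothetical helper: one row-wise pass, one new column per entry in funcs."""

    def per_row(row):
        # Evaluate every function on this row's value and return them as a tuple.
        return tuple(fn(row[column]) for fn in funcs.values())

    out = df.copy()
    # result_type="expand" spreads each tuple over separate columns,
    # which are then assigned to the new column names in order.
    out[list(funcs)] = df.apply(per_row, axis=1, result_type="expand")
    return out


df = pd.DataFrame(
    {"col1": ["LA", "Boston", "Phoenix", "Toronto"], "col2": [2, 3, 4, 5]}
)
result = apply_all(
    df,
    "col2",
    {"col3": lambda x: x ** 2, "col4": lambda x: x ** 3, "col5": lambda x: x ** 4},
)
print(result)
```

This keeps the original DataFrame untouched and returns a copy with the extra columns.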