Pandas: How to apply a complex function to a column of a dataframe, with two other columns as the inp-CodePudding

I have a dataframe that looks like this:

+--------+-------------+----------+---------+
| Worker | Schedule    | Overtime | Product |
+--------+-------------+----------+---------+
| 1      | some string | some int | ABC     |
+--------+-------------+----------+---------+
| 2      | some string | some int | DEF     |
+--------+-------------+----------+---------+
| 3      | some string | some int | GHI     |
+--------+-------------+----------+---------+

I have written a complex function that takes "Schedule", "Overtime", and "Product" as input and returns a newly edited "Schedule".

def edit_schedule(Schedule, Overtime, Product):
    # ... some calculation ...
    return Schedule_edited

I tested this function on a single row of data and it works:

Schedule = some string
Overtime = some int
Product = 'ABC'

print(edit_schedule(Schedule, Overtime, Product))

Now, how do I apply this function to the entire dataframe, so that I get a new column called "Schedule_Edited" holding the edited Schedule that the function returns for each row?

+--------+-------------+----------+---------+-----------------+
| Worker | Schedule    | Overtime | Product | Schedule_Edited |
+--------+-------------+----------+---------+-----------------+
| 1      | some string | some int | ABC     | some string     |
+--------+-------------+----------+---------+-----------------+
| 2      | some string | some int | DEF     | some string     |
+--------+-------------+----------+---------+-----------------+
| 3      | some string | some int | GHI     | some string     |
+--------+-------------+----------+---------+-----------------+

The actual dataframe has millions of rows, so any method that makes the calculation faster would be much appreciated.

Thanks in advance for your help!

CodePudding user response:

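You can try applying the function to each row with DataFrame.apply and axis=1, rewriting it to take the row as its only argument: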

def edit_schedule(row):
    Schedule = row['Schedule']
    Overtime = row['Overtime']
    Product = row['Product']
    # ... some calculation ...
    return Schedule_edited


df['Schedule_Edited'] = df.apply(edit_schedule, axis=1)
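
Since the question mentions millions of rows, here is a minimal sketch of an alternative that keeps the original three-argument signature and avoids building a Series for every row: iterate over the three columns together with zip. This is often noticeably faster than apply with axis=1, although the gain depends on how expensive the function itself is. The body of edit_schedule and the sample data below are hypothetical stand-ins, since the real calculation is not shown in the question.

```python
import pandas as pd

# Hypothetical stand-in for the real edit_schedule; the actual
# calculation is not shown in the question.
def edit_schedule(schedule, overtime, product):
    return f"{schedule}|{product}|+{overtime}h"

# Made-up sample data with the same columns as in the question.
df = pd.DataFrame({
    "Worker": [1, 2, 3],
    "Schedule": ["9-5", "8-4", "10-6"],
    "Overtime": [2, 0, 1],
    "Product": ["ABC", "DEF", "GHI"],
})

# Walk the three columns in lockstep and call the function once per row.
# No per-row Series object is created, unlike df.apply(..., axis=1).
df["Schedule_Edited"] = [
    edit_schedule(s, o, p)
    for s, o, p in zip(df["Schedule"], df["Overtime"], df["Product"])
]

print(df)
```

If the calculation can be expressed with whole-column operations (string methods, arithmetic, boolean masks), that is usually faster still, but whether it is possible depends on what the function actually does.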