Pandas:How to apply a complex function to a column of a dataframe, with two other columns as the inp

Time:05-02

I have a dataframe that looks like this:

 -------- ------------- ---------- --------- 
| Worker | Schedule    | Overtime | Product |
 -------- ------------- ---------- --------- 
| 1      | some string | some int | ABC     |
 -------- ------------- ---------- --------- 
| 2      | some string | some int | DEF     |
 -------- ------------- ---------- --------- 
| 3      | some string | some int | GHI     |
 -------- ------------- ---------- --------- 

I have written a complex function that takes "Schedule", "Overtime", and "Product" as input and returns a newly edited "Schedule".

def edit_schedule(Schedule, Overtime, Product):
    # some calculation ...
    return Schedule_edited

I tested this function with just 1 row of data and it works.

Schedule = some string
Overtime = some int
Product = 'ABC'

print(edit_schedule(Schedule, Overtime, Product)) 

Now, how do I apply this function to the entire dataframe, so that I get a new column called "Schedule_Edited" holding the edited Schedule that results from applying the function to each row?

 -------- ------------- ---------- --------- ----------------- 
| Worker | Schedule    | Overtime | Product | Schedule_Edited |
 -------- ------------- ---------- --------- ----------------- 
| 1      | some string | some int | ABC     | some string     |
 -------- ------------- ---------- --------- ----------------- 
| 2      | some string | some int | DEF     | some string     |
 -------- ------------- ---------- --------- ----------------- 
| 3      | some string | some int | GHI     | some string     |
 -------- ------------- ---------- --------- ----------------- 

The actual dataframe has millions of rows, so any method that makes the calculation faster would be really appreciated.

Much appreciation for your help!

CodePudding user response:

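You can try apply on rows (axis=1), rewriting the function so that it takes a single row and reads the three columns from it: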

def edit_schedule(row):
    # Each row arrives as a Series indexed by column name
    Schedule = row['Schedule']
    Overtime = row['Overtime']
    Product = row['Product']
    # some calculation ...
    return Schedule_edited


df['Schedule_Edited'] = df.apply(edit_schedule, axis=1)
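
If you would rather keep the original three-argument function, a minimal sketch could look like the one below. The body of edit_schedule and the sample values here are made-up stand-ins (the real calculation is not shown in the question), so treat them as assumptions. The second option is only a suggestion for the speed concern: iterating over the three columns with zip avoids building a Series for every row, which is usually noticeably faster than apply with axis=1 on large frames, though the actual gain depends on how expensive the function itself is.

```python
import pandas as pd


def edit_schedule(schedule, overtime, product):
    # Hypothetical placeholder logic; substitute the real calculation here.
    return f"{schedule}+{overtime}h ({product})"


# Made-up sample data shaped like the dataframe in the question.
df = pd.DataFrame({
    "Worker": [1, 2, 3],
    "Schedule": ["9-17", "8-16", "10-18"],
    "Overtime": [1, 0, 2],
    "Product": ["ABC", "DEF", "GHI"],
})

# Option 1: row-wise apply, calling the unchanged three-argument function.
df["Schedule_Edited"] = df.apply(
    lambda row: edit_schedule(row["Schedule"], row["Overtime"], row["Product"]),
    axis=1,
)

# Option 2: walk the three columns in parallel with zip.
# No per-row Series is created, so this tends to scale better.
df["Schedule_Edited_zip"] = [
    edit_schedule(s, o, p)
    for s, o, p in zip(df["Schedule"], df["Overtime"], df["Product"])
]
```

Both options call the same Python function once per row, so with millions of rows the function body itself will likely dominate the runtime. If the calculation can be expressed with vectorized pandas or NumPy operations on whole columns, that would probably be faster still.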