I have a dataframe that looks like this:
-------- ------------- ---------- ---------
| Worker | Schedule | Overtime | Product |
-------- ------------- ---------- ---------
| 1 | some string | some int | ABC |
-------- ------------- ---------- ---------
| 2 | some string | some int | DEF |
-------- ------------- ---------- ---------
| 3 | some string | some int | GHI |
-------- ------------- ---------- ---------
I have written a complex function that takes "Schedule", "Overtime", and "Product" as input and returns a newly edited "Schedule":
def edit_schedule(Schedule, Overtime, Product):
    *some calculation ...*
    return Schedule_edited
I tested this function with just 1 row of data and it works.
Schedule = some string
Overtime = some int
Product = 'ABC'
print(edit_schedule(Schedule, Overtime, Product))
Now, how do I apply this function to the entire dataframe so that I get a new column called "Schedule_Edited" containing the result of applying the function to each row?
-------- ------------- ---------- --------- -----------------
| Worker | Schedule | Overtime | Product | Schedule_Edited |
-------- ------------- ---------- --------- -----------------
| 1 | some string | some int | ABC | some string |
-------- ------------- ---------- --------- -----------------
| 2 | some string | some int | DEF | some string |
-------- ------------- ---------- --------- -----------------
| 3 | some string | some int | GHI | some string |
-------- ------------- ---------- --------- -----------------
The actual dataframe has millions of rows, so any method that makes the calculation faster would be much appreciated.
Much appreciation for your help!
CodePudding user response:
You can try apply on rows (axis=1), so that the function receives one row at a time and reads the columns it needs from it:
def edit_schedule(row):
    Schedule = row['Schedule']
    Overtime = row['Overtime']
    Product = row['Product']
    *some calculation ...*
    return Schedule_edited
df['Schedule_Edited'] = df.apply(edit_schedule, axis=1)
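If you would rather keep the original three-argument signature, here is a minimal sketch of two ways to call it. The body of edit_schedule below is a hypothetical stand-in (the real calculation is not shown in the question), and the sample values are made up for illustration. The second variant iterates over the three columns with zip instead of going through apply; on large frames that is usually faster because no Series is built for each row, though how much it helps depends on how expensive the function itself is.

```python
import pandas as pd


def edit_schedule(schedule, overtime, product):
    # Hypothetical placeholder for the real calculation.
    return f"{schedule}-{product}-{overtime}"


# Made-up sample data with the same columns as in the question.
df = pd.DataFrame({
    "Worker": [1, 2, 3],
    "Schedule": ["day", "night", "swing"],
    "Overtime": [2, 0, 5],
    "Product": ["ABC", "DEF", "GHI"],
})

# Variant 1: row-wise apply, unpacking the columns inside a lambda
# so the function keeps its original signature.
df["Schedule_Edited"] = df.apply(
    lambda row: edit_schedule(row["Schedule"], row["Overtime"], row["Product"]),
    axis=1,
)

# Variant 2: plain Python loop over the three columns with zip.
# This skips the per-row Series that apply(axis=1) constructs.
df["Schedule_Edited_Zip"] = [
    edit_schedule(s, o, p)
    for s, o, p in zip(df["Schedule"], df["Overtime"], df["Product"])
]

print(df)
```

Both columns end up with the same values, so you can time the two variants on a slice of the real data and keep whichever is faster. If the calculation can be expressed with pandas string or numeric column operations, a fully vectorized version would likely beat both, but that depends on what the function does.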