I'm trying to use multiple columns in a lambda function, but I keep running into key errors that I don't think should occur. I want this line to create a new column: if "Decision" appears in the Task, flag the row as a Decision. Otherwise, if "Milestone" appears in "Projects", mark it as a Milestone. Otherwise, keep the current Type.
today['New_Type'] = today[['Task','Projects','Type'].apply(lambda x,y,z: "Decision" if "Decision" in x else "Milestone" if "Milestone" in y else z)
Any ideas how to adjust this?
CodePudding user response:
This is easier to debug if you use a regular, named function. Be sure to specify the axis argument when you call apply. The function you write needs to take a single argument (one row, which unpacks like a tuple of the three column values), so it is best to unpack them immediately for readability:
import pandas as pd

def task_type(row):
    task, project, old_type = row
    if 'decision' in task.lower():
        return 'Decision'
    if 'milestone' in project.lower():
        return 'Milestone'
    return old_type
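today = pd.DataFrame({'Task': ['Make a decision.',
                               'Do something else.',
                               'Write a function.'],
                      'Projects': ['alpha', 'Milestone 7',
                                   'gamma'],
                      'Type': ['old 1', 'old 2', 'old 3']})
# Select exactly the three columns the function unpacks, so any extra
# columns in the frame do not break the unpacking.
today['New_Type'] = today[['Task', 'Projects', 'Type']].apply(task_type, axis=1)
today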
                 Task     Projects   Type   New_Type
0    Make a decision.        alpha  old 1   Decision
1  Do something else.  Milestone 7  old 2  Milestone
2   Write a function.        gamma  old 3      old 3
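If you would rather keep a lambda, as in the question, the same idea still applies. The snippet below is only a minimal sketch, assuming the sample frame from this answer: apply with axis=1 passes each row as one object, so the lambda takes a single row argument and looks the columns up by name rather than taking x, y, z. The lowercase comparison is carried over from task_type above and is not part of the original question.

```python
import pandas as pd

# Sample data mirroring the example above (values are illustrative).
today = pd.DataFrame({
    'Task': ['Make a decision.', 'Do something else.', 'Write a function.'],
    'Projects': ['alpha', 'Milestone 7', 'gamma'],
    'Type': ['old 1', 'old 2', 'old 3'],
})

# With axis=1, each call receives one row (a Series), so the lambda
# takes one argument and indexes it by column name.
today['New_Type'] = today.apply(
    lambda row: 'Decision' if 'decision' in row['Task'].lower()
    else 'Milestone' if 'milestone' in row['Projects'].lower()
    else row['Type'],
    axis=1,
)

print(today)
```

Looking columns up by name also means the frame can contain other columns without affecting the result.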
CodePudding user response:
Avoid apply (a hidden loop) and consider a vectorized, conditional-logic approach using numpy.where or numpy.select:
import numpy as np

# Option 1: nested numpy.where
today['New_Type'] = np.where(
    today['Task'].str.contains('Decision', regex=False),
    'Decision',
    np.where(
        today['Projects'].str.contains('Milestone', regex=False),
        'Milestone',
        today['Type']
    )
)

# Option 2: numpy.select, which checks the conditions in order
today['New_Type'] = np.select(
    condlist=[
        today['Task'].str.contains('Decision', regex=False),
        today['Projects'].str.contains('Milestone', regex=False)
    ],
    choicelist=['Decision', 'Milestone'],
    default=today['Type']
)
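To try the vectorized version end to end, here is a rough sketch that applies numpy.select to the rules as stated in the question: "Decision" is looked up in Task, "Milestone" in Projects, and the existing Type is the fallback. The sample data is borrowed from the first answer. The case=False and na=False arguments are my own additions, assuming you want case-insensitive matching and want missing values treated as non-matches; drop them if that is not the case.

```python
import numpy as np
import pandas as pd

# Sample data borrowed from the first answer (values are illustrative).
today = pd.DataFrame({
    'Task': ['Make a decision.', 'Do something else.', 'Write a function.'],
    'Projects': ['alpha', 'Milestone 7', 'gamma'],
    'Type': ['old 1', 'old 2', 'old 3'],
})

# Boolean masks, listed in priority order: the first True condition wins.
conditions = [
    today['Task'].str.contains('decision', case=False, regex=False, na=False),
    today['Projects'].str.contains('milestone', case=False, regex=False, na=False),
]
choices = ['Decision', 'Milestone']

# Rows matching neither condition keep their existing Type.
today['New_Type'] = np.select(conditions, choices, default=today['Type'])

print(today)
```

Because numpy.select evaluates the conditions in list order, a row that matches both rules is labelled Decision, the same priority as the if/else chain in the question.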