How do I create a new column with values from one column or another based on the value in a third co-CodePudding

So, as an example, say I have the table:

Activity	Start Time	End Time
Red on	2:15	3:00
Red on	2:30	3:15
Red off	1:45	2:30
Red off	2:45	3:30

Based on the activity, I only care about one of two values - for 'Red on' I need to know the start time. For 'Red off' I need to know the end time.

I want to create a fourth column just labeled 'Change Time', and based on whether 'Activity' is 'Red on' or 'Red off' I want to grab either the Start Time column value or the End Time column value for this fourth column. Later, I'm going to be discarding the Start Time and End Time columns and just keeping this newly-merged Change Time column. For now, I'm just trying to work out how to create it.

With this example, the result I want is:

Activity	Start Time	End Time	Change Time
Red on	2:15	3:00	2:15
Red on	2:30	3:15	2:30
Red off	1:45	2:30	2:30
Red off	2:45	3:30	3:30

Let's assume Red on and Red off are the only two possible values for the Activity column. I thought I had an idea of how to do this, but both things I've tried have thrown errors.

First, I tried: df['Change time'] = df['Activity'].apply(lambda x: df['Start Time'] if x == 'Red on' else df['End Time'])

And I got an error that said "ValueError: Wrong number of items passed 35, placement implies 1" - since I have 35 rows, that leads me to believe df['Start Time'] was trying to pass the whole column. So, instead, I tried:

df['Change time'] = df['Activity'].apply(lambda x: df.loc['Start Time'] if x == 'Red on' else df.loc['End Time'])

And this one just gives me KeyError: 'Start Time'.

What am I missing to check the string value of the 'Activity' column and pass the value in the 'Start Time' column if 'Activity' == Red on and 'End time' if else?

CodePudding user response：

Assuming you only have "Red on"/"Red off" in Activity, this is a use case for numpy.where:

df['Change time'] = np.where(df['Activity'].eq('Red on'),
                             df['Start Time'], df['End Time'])

If you potentially have other values, use numpy.select:

df['Change time'] = np.select([df['Activity'].eq('Red on'), 
                               df['Activity'].eq('Red off')],
                              [df['Start Time'], df['End Time']], pd.NA)

or loc:

df.loc[df['Activity'].eq('Red on'), 'Change time'] = df['Start Time']
df.loc[df['Activity'].eq('Red off'), 'Change time'] = df['End Time']

output:

  Activity Start Time End Time Change time
0   Red on       2:15     3:00        2:15
1   Red on       2:30     3:15        2:30
2  Red off       1:45     2:30        2:30
3  Red off       2:45     3:30        3:30

CodePudding user response：

You can do this straightforwardly in pandas:

df['Change Time'] = df['Start Time'] 
df['Change Time'][df['Activity']=='Red off'] = df['End Time']