How to use an index of a dataframe to assign values to a row of a new column?-CodePudding

I have a dataset that consists of ID (participant), run, indexnumber (that is, an index number of a slalom turn) and performance (that could be velocity or time). In addition, I have information for each id and run where in the slalom turn (that is, the index) they actually start to turn.

My goal is to create a new column in the dataframe that contain 0 if the id has not started to turn and 1 if they have started to turn. This column could be called phase.

For example: For ID1 the point where this skier starts to turn i index 4 for the first run and 9 for the second run. Therefore, I want all rows in the new column to contain 0s until index nr 4 and 1s thereafter (for the first run). For the second run I want all rows to contain 0s until index nr 9 and 1 thereafter.

Is there a simple way to do this with pandas or vanilla python?

example = [[1.0, 1.0, 1.0, 0.6912982024915187],
 [1.0, 1.0, 2.0, 0.16453900411106737],
 [1.0, 1.0, 3.0, 0.11362801727310845],
 [1.0, 1.0, 4.0, 0.587778444335624],
 [1.0, 1.0, 5.0, 0.8455388913351765],
 [1.0, 1.0, 6.0, 0.5719366584505648],
 [1.0, 1.0, 7.0, 0.4665520044952449],
 [1.0, 1.0, 8.0, 0.9105152709573275],
 [1.0, 1.0, 9.0, 0.4600099001744885],
 [1.0, 1.0, 10.0, 0.8577060884077763],
 [1.0, 2.0, 1.0, 0.11550722410813963],
 [1.0, 2.0, 2.0, 0.5729090378222077],
 [1.0, 2.0, 3.0, 0.43990164344919824],
 [1.0, 2.0, 4.0, 0.595242293948498],
 [1.0, 2.0, 5.0, 0.443684017624451],
 [1.0, 2.0, 6.0, 0.3608135854303052],
 [1.0, 2.0, 7.0, 0.28525404982906766],
 [1.0, 2.0, 8.0, 0.11561422303194391],
 [1.0, 2.0, 9.0, 0.8579134051748011],
 [1.0, 2.0, 10.0, 0.540598113345226],
 [2.0, 1.0, 1.0, 0.4058570295736075],
 [2.0, 1.0, 2.0, 0.9422426000325298],
 [2.0, 1.0, 3.0, 0.7918655742964762],
 [2.0, 1.0, 4.0, 0.4145753321336241],
 [2.0, 1.0, 5.0, 0.5256388261997529],
 [2.0, 1.0, 6.0, 0.8140335187050629],
 [2.0, 1.0, 7.0, 0.12134416740848841],
 [2.0, 1.0, 8.0, 0.9016748379372173],
 [2.0, 1.0, 9.0, 0.462241316800442],
 [2.0, 1.0, 10.0, 0.7839715857746699],
 [2.0, 2.0, 1.0, 0.5300527244824904],
 [2.0, 2.0, 2.0, 0.8784844676567194],
 [2.0, 2.0, 3.0, 0.14395673182343738],
 [2.0, 2.0, 4.0, 0.7606405990262495],
 [2.0, 2.0, 5.0, 0.5123048342846208],
 [2.0, 2.0, 6.0, 0.25608277502943655],
 [2.0, 2.0, 7.0, 0.4264542956426933],
 [2.0, 2.0, 8.0, 0.9144976708651866],
 [2.0, 2.0, 9.0, 0.875888479621729],
 [2.0, 2.0, 10.0, 0.3428732760552141]]

turnPhaseId1 = [4,9]  #the index number when ID1 starts to turn in run 1 and run 2, respectively
turnPhaseId2 = [2,5] #the index number when ID2 starts to turn in run 1 and run 2, respectively

pd.DataFrame(example, columns=['id', 'run', 'index', 'performance'])

CodePudding user response：

I believe it is a better idea to turnPhase in a dictionary, and then use apply:

turn_dict = {1: [4, 9],
        2: [2, 5]}

We also need to change the column types as we need to reach dictionary keys, and list indexes, which are int:

df['id'] = df['id'].astype(int)
df['index'] = df['index'].astype(int)

Finally, apply:

df['new_column'] = df.apply(lambda x: 0 if x['index'] < turn_dict[x['id']][int(x['run'] -1)]  else 1 , axis=1)