I have a dataset that consists of ID (participant), run, indexnumber (that is, an index number of a slalom turn) and performance (that could be velocity or time). In addition, I have information for each id and run where in the slalom turn (that is, the index) they actually start to turn.
My goal is to create a new column in the dataframe that contain 0 if the id has not started to turn and 1 if they have started to turn. This column could be called phase.
For example: For ID1 the point where this skier starts to turn i index 4 for the first run and 9 for the second run. Therefore, I want all rows in the new column to contain 0s until index nr 4 and 1s thereafter (for the first run). For the second run I want all rows to contain 0s until index nr 9 and 1 thereafter.
Is there a simple way to do this with pandas or vanilla python?
example = [[1.0, 1.0, 1.0, 0.6912982024915187],
[1.0, 1.0, 2.0, 0.16453900411106737],
[1.0, 1.0, 3.0, 0.11362801727310845],
[1.0, 1.0, 4.0, 0.587778444335624],
[1.0, 1.0, 5.0, 0.8455388913351765],
[1.0, 1.0, 6.0, 0.5719366584505648],
[1.0, 1.0, 7.0, 0.4665520044952449],
[1.0, 1.0, 8.0, 0.9105152709573275],
[1.0, 1.0, 9.0, 0.4600099001744885],
[1.0, 1.0, 10.0, 0.8577060884077763],
[1.0, 2.0, 1.0, 0.11550722410813963],
[1.0, 2.0, 2.0, 0.5729090378222077],
[1.0, 2.0, 3.0, 0.43990164344919824],
[1.0, 2.0, 4.0, 0.595242293948498],
[1.0, 2.0, 5.0, 0.443684017624451],
[1.0, 2.0, 6.0, 0.3608135854303052],
[1.0, 2.0, 7.0, 0.28525404982906766],
[1.0, 2.0, 8.0, 0.11561422303194391],
[1.0, 2.0, 9.0, 0.8579134051748011],
[1.0, 2.0, 10.0, 0.540598113345226],
[2.0, 1.0, 1.0, 0.4058570295736075],
[2.0, 1.0, 2.0, 0.9422426000325298],
[2.0, 1.0, 3.0, 0.7918655742964762],
[2.0, 1.0, 4.0, 0.4145753321336241],
[2.0, 1.0, 5.0, 0.5256388261997529],
[2.0, 1.0, 6.0, 0.8140335187050629],
[2.0, 1.0, 7.0, 0.12134416740848841],
[2.0, 1.0, 8.0, 0.9016748379372173],
[2.0, 1.0, 9.0, 0.462241316800442],
[2.0, 1.0, 10.0, 0.7839715857746699],
[2.0, 2.0, 1.0, 0.5300527244824904],
[2.0, 2.0, 2.0, 0.8784844676567194],
[2.0, 2.0, 3.0, 0.14395673182343738],
[2.0, 2.0, 4.0, 0.7606405990262495],
[2.0, 2.0, 5.0, 0.5123048342846208],
[2.0, 2.0, 6.0, 0.25608277502943655],
[2.0, 2.0, 7.0, 0.4264542956426933],
[2.0, 2.0, 8.0, 0.9144976708651866],
[2.0, 2.0, 9.0, 0.875888479621729],
[2.0, 2.0, 10.0, 0.3428732760552141]]
turnPhaseId1 = [4,9] #the index number when ID1 starts to turn in run 1 and run 2, respectively
turnPhaseId2 = [2,5] #the index number when ID2 starts to turn in run 1 and run 2, respectively
pd.DataFrame(example, columns=['id', 'run', 'index', 'performance'])
CodePudding user response:
I believe it is a better idea to turnPhase in a dictionary, and then use apply
:
turn_dict = {1: [4, 9],
2: [2, 5]}
We also need to change the column types as we need to reach dictionary keys, and list indexes, which are int
:
df['id'] = df['id'].astype(int)
df['index'] = df['index'].astype(int)
Finally, apply
:
df['new_column'] = df.apply(lambda x: 0 if x['index'] < turn_dict[x['id']][int(x['run'] -1)] else 1 , axis=1)