Home > Mobile >  Iterate over rows of a dataframe based on index in python
Iterate over rows of a dataframe based on index in python

Time:04-06

I am trying to build a loop that iterate over each rows of several Dataframes in order to create two new columns. The original dataframes contain two columns (time, velocity), which can vary in length and stored in nested dictionaries. Here an exemple of one of them :

 time  velocity
0   0.000000  0.136731
1   0.020373  0.244889
2   0.040598  0.386443
3   0.060668  0.571861
4   0.080850  0.777680
5   0.101137  1.007287
6   0.121206  1.207533
7   0.141284  1.402833
8   0.161388  1.595385
9   0.181562  1.762003
10  0.201640  1.857233
11  0.221788  2.006104
12  0.241866  2.172649

The two new columns should de a normalization of the 'time' and 'velocity' column, respectively. Each rows of the new columns should therefore be equal to the following transformation :

t_norm = (time(n) - time(n-1)) / (time(max) - time(min))

vel_norm = (velocity(n) - velocity(n-1)) / (velocity(max) - velocity(min))

Also, the first value of the two new column should be set to 0.

My problem is that I don't know how to properly indicate to python how to access to n and n-1 values to realize such operations, and I don't know if that could be done using pd.DataFrame.iterrows() or the .iloc function.

I have come with the following piece of code, but it miss the crucial parts :

for nested_dict in dict_all_raw.values():
    for dflist in nested_dict.values():
        dflist['t_norm'] = ? / (dflist['time'].max() - dflist['time'].min())
        dflist['vel_norm'] = ? / (dflist['velocity'].max() - dflist['velocity'].min())
        dflist['acc_norm'] = dflist['vel_norm'] / dflist['t_norm']

Any help is welcome..! :)

CodePudding user response:

You can use shift (see the doc here) to create lagged columns

df['time_n-1']=df['time'].shift(1)

Also, the first value of the two new column should be set to 0.

Use df['column']=df['column'].fillna(0) after your calculations

CodePudding user response:

If you just want to normalise, you can write the expression directly, using Series.min and Series.max:

m = df['time'].min()
df['normtime'] = (df['time'] - m) / (df['time'].max() - m)

However, if you want the difference between successive elements, you can use Series.diff:

df['difftime'] = df['time'].diff() / (df['time'].max() - df['time'].min())

Testing:

df = pd.DataFrame({'time': [0.000000, 0.020373, 0.040598], 'velocity': [0.136731, 0.244889, 0.386443]})
print(df)
#        time  velocity
# 0  0.000000  0.136731
# 1  0.020373  0.244889
# 2  0.040598  0.386443

m = df['time'].min()
df['normtime'] = (df['time'] - m) / (df['time'].max() - m)

df['difftime'] = df['time'].diff() / (df['time'].max() - df['time'].min())

print(df)
#        time  velocity  normtime  difftime
# 0  0.000000  0.136731  0.000000       NaN
# 1  0.020373  0.244889  0.501823  0.501823
# 2  0.040598  0.386443  1.000000  0.498177
  • Related