Missing data in Python nested loop

Time:04-08

I'm working with a multidimensional data array that holds various data points for individuals. I wrote a nested loop to calculate metrics across the entire dataset; however, after rearranging the data I lose data points. Of my initial 253 individuals, I end up with calculated metrics for only 182. The code runs, but I don't know at which point data is being dropped.

data_array -- containing 253 individuals, each with several subcategories 

mos0_ids=[]
mos0_dt = []
mos0_x_dpos = []
mos0_y_dpos = []
mos0_z_dpos = []

for i in range (0,252): 
    mos0=data_array[i]
    mos0_id= mos0[0][0]                                                                             
    mos0_time=mos0[:,1]                                                                                      
    mos0_x_pos=mos0[:,2]
    mos0_y_pos=mos0[:,3]
    mos0_z_pos=mos0[:,4]
    mos0_speed=mos0[:,6]

    for j in range(0,len(mos0_id)):  
        mos0_ids.append(mos0_id)
        
    for k in range(0,len(mos0_time)):
        first_mov_time=mos0_time[k]
        last_mov_time=mos0_time[k-1]
        first_movement = dt.datetime.strptime(first_mov_time, '%Y-%m-%d %H:%M:%S.%f')
        last_movement = dt.datetime.strptime(last_mov_time, '%Y-%m-%d %H:%M:%S.%f')
        x = first_movement - last_movement
        total_seconds = x.total_seconds()  
        mos0_dt.append(total_seconds)
    
    for l in range(0,len(mos0_x_pos)):
        first_mov_pos=mos0_x_pos[l]
        last_mov_pos=mos0_x_pos[l-1]
        x = first_mov_pos - last_mov_pos
        mos0_x_dpos.append(x)
    
    for m in range(0,len(mos0_y_pos)):
        first_mov_pos=mos0_y_pos[m]
        last_mov_pos=mos0_y_pos[m-1]
        x = first_mov_pos - last_mov_pos
        mos0_y_dpos.append(x)
    
    for n in range(0,len(mos0_z_pos)):
        first_mov_pos=mos0_z_pos[n]
        last_mov_pos=mos0_z_pos[n-1]
        x = first_mov_pos - last_mov_pos
        mos0_z_dpos.append(x)
        
mos0_ids
mos0_dt
mos0_x_dpos 
mos0_y_dpos 
mos0_z_dpos       

time_pos=list(zip(mos0_ids, mos0_dt, mos0_x_dpos, mos0_y_dpos, mos0_z_dpos))                                                 
time_pos=pd.DataFrame(time_pos,columns=['mos_id','dtime', 'x_position', 'y_position','z_position'])               #  transform into a dataframe         
time_pos['x_velocity'] = time_pos['x_position']/time_pos['dtime']
time_pos['y_velocity'] = time_pos['y_position']/time_pos['dtime']
time_pos['z_velocity'] = time_pos['z_position']/time_pos['dtime']

time_pos['x_acceleration'] = time_pos['x_velocity']/time_pos['dtime']
time_pos['y_acceleration'] = time_pos['y_velocity']/time_pos['dtime']
time_pos['z_acceleration'] = time_pos['z_velocity']/time_pos['dtime']

time_pos=time_pos.groupby('mos_id')
time_pos = np.array(time_pos, dtype=object)    
time_pos

Answer:

You are probably missing one detail of range(): the stop value is exclusive. Your outer loop, range(0, 252), yields only 252 values (0 through 251) instead of 253, so the last individual is never processed.

Try this out in console:
len(range(0,252)) -> 252

So I presume that, because the array is nested, every calculation that should run on the skipped individual's rows and columns is lost as well. Solution:
for i in range(0, 253) or, more robustly, for i in range(len(data_array))
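
To make the off-by-one visible, here is a minimal sketch. The data_array below is a hypothetical stand-in (253 placeholder strings), not the real dataset; it only shows how many elements each loop style visits.

```python
# Hypothetical stand-in for data_array: 253 placeholder records.
data_array = [f"individual_{n}" for n in range(253)]

# Hard-coded stop value: indices 0..251, so the last record is skipped.
visited_hardcoded = [data_array[i] for i in range(0, 252)]

# Stop value derived from the data: indices 0..252.
visited_by_len = [data_array[i] for i in range(len(data_array))]

# Iterating over the container directly avoids index arithmetic altogether.
visited_direct = [record for record in data_array]

print(len(visited_hardcoded))  # 252
print(len(visited_by_len))     # 253
print(len(visited_direct))     # 253
```

Deriving the bound from len(data_array), or looping over data_array itself, keeps the loop correct if the number of individuals changes.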

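Your inner loops already use range(0, len(...)), so they cover every row and are not affected by this. Skipping one individual also cannot explain a drop from 253 to 182 on its own, so two other things are worth checking. First, the loop over len(mos0_id) appends the ID once per element of the ID itself (once per character, if the ID is a string) rather than once per row. Second, zip() stops at the shortest of its inputs, so lists of different lengths silently drop the extra entries.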
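
The sketch below reproduces that length mismatch on a tiny example and shows one possible way around it. Everything in it is an assumption for illustration: the two-individual data_array, the string IDs, and the two-column layout [id, x_position] are hypothetical, and np.diff is my suggestion rather than something from your code. Note that np.diff returns one fewer value than there are rows, because the first row has no predecessor; your loops instead compare row 0 against the last row, since index -1 wraps around.

```python
import numpy as np

# Hypothetical miniature of data_array: columns are [id, x_position].
data_array = [
    np.array([["mos_0001", 0.0], ["mos_0001", 1.0], ["mos_0001", 3.0]], dtype=object),
    np.array([["mos_0002", 5.0], ["mos_0002", 4.0]], dtype=object),
]

# Pattern from the question: one ID per character of the ID string,
# and a delta for row 0 that wraps around to the last row.
ids_per_char, x_dpos_wrapped = [], []
for mos0 in data_array:
    mos0_id = mos0[0][0]
    for j in range(len(mos0_id)):
        ids_per_char.append(mos0_id)
    x_pos = mos0[:, 1]
    for l in range(len(x_pos)):
        x_dpos_wrapped.append(x_pos[l] - x_pos[l - 1])

print(len(ids_per_char), len(x_dpos_wrapped))  # 16 5

# zip() stops at the shorter list, so IDs and deltas are misaligned
# and the second individual disappears from the result.
rows = list(zip(ids_per_char, x_dpos_wrapped))
print({mos_id for mos_id, _ in rows})  # {'mos_0001'}

# Possible fix: build exactly one ID per delta, per individual.
ids, x_dpos = [], []
for mos0 in data_array:
    deltas = np.diff(mos0[:, 1].astype(float))  # x[k] - x[k-1] for k >= 1
    ids.extend([mos0[0][0]] * len(deltas))
    x_dpos.extend(deltas.tolist())

fixed_rows = list(zip(ids, x_dpos))
print(fixed_rows)  # [('mos_0001', 1.0), ('mos_0001', 2.0), ('mos_0002', -1.0)]
```

A quick check on the real data is to print the length of each list before calling zip(); if they differ, rows are being dropped at that step.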
