Getting the values from dataframe column, where the data are stored in a long nested lists

Time:07-17

I am fairly new to Python. I have loaded a .mat file into a pandas DataFrame, but there is one column, sensor, where the values are stored as a list of lists (a nested list). Each inner list contains one timestamp and the reading at that time. All I need is to get the sensor readings alone in a new list, without the timestamps. The attached photo is a simplified example of what I have.

As shown in the photo, I use a for loop to iterate over each entry in the sensor column and copy the sensor readings into the new column sensor_data_without_timestamp. My data set is really big (630 rows, and for each row the list has approx. 30,000 entries), and this for loop alone takes about 10 minutes to run.

Isn't there a more efficient way to get the sensor data?

Thanks in advance :)

Here is the code too:

import pandas as pd

df = pd.DataFrame({'id':['1A', '2A', '3A'],
                   'sensor': [[[0,10], [2,15], [4,20]], [[0,7], [2,14], [4,18]], [[0,11], [2,16], [4,22]]]})

sensor_data_without_timestamp = []
for i in range(len(df)):
    l = []
    for a in range(len(df['sensor'][i])):
        l.append(df['sensor'][i][a][1])
    sensor_data_without_timestamp.append(l)

df['sensor_data_without_timestamp'] = sensor_data_without_timestamp
df.head()

CodePudding user response:

You can explode the sensor column, extract the second value of each pair, and finally group the values back into one list per row:

df['sensor_data_without_timestamp'] = \
    df['sensor'].explode().str[1].groupby(level=0).agg(list)
print(df)

# Output
   id                       sensor sensor_data_without_timestamp
0  1A  [[0, 10], [2, 15], [4, 20]]                  [10, 15, 20]
1  2A   [[0, 7], [2, 14], [4, 18]]                   [7, 14, 18]
2  3A  [[0, 11], [2, 16], [4, 22]]                  [11, 16, 22]
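
To make the chained one-liner easier to follow, here is a minimal sketch that splits it into its three steps on the same sample data. The intermediate names (pairs, readings, result) are only illustrative and do not appear in the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['1A', '2A', '3A'],
    'sensor': [[[0, 10], [2, 15], [4, 20]],
               [[0, 7], [2, 14], [4, 18]],
               [[0, 11], [2, 16], [4, 22]]],
})

# Step 1: one row per [timestamp, reading] pair; the original row
# index is repeated for every pair that came from that row.
pairs = df['sensor'].explode()

# Step 2: .str[1] indexes into each pair and keeps only the reading.
readings = pairs.str[1]

# Step 3: group by the original row index (level 0) and collect the
# readings of each row back into a list.
result = readings.groupby(level=0).agg(list)

df['sensor_data_without_timestamp'] = result
```

Because the exploded Series keeps the original index, the grouped result aligns with df when it is assigned back as a column.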

Update: if every row always has a list of exactly 3 elements:

import numpy as np

df['sensor_data_without_timestamp'] = \
    np.vstack(df['sensor'])[:, 1].reshape(-1, 3).tolist()
print(df)

# Output
   id                       sensor sensor_data_without_timestamp
0  1A  [[0, 10], [2, 15], [4, 20]]                  [10, 15, 20]
1  2A   [[0, 7], [2, 14], [4, 18]]                   [7, 14, 18]
2  3A  [[0, 11], [2, 16], [4, 22]]                  [11, 16, 22]
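
The reshape above depends on every row having the same number of pairs. If that is not guaranteed, one possible alternative (not part of the original answer) is to convert each row to a NumPy array separately and slice out the reading column. This is a sketch only: the sample data below is hypothetical, with rows of different lengths, and it has not been timed against the 630 x 30,000 data set from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: rows deliberately have different lengths.
df = pd.DataFrame({
    'id': ['1A', '2A', '3A'],
    'sensor': [[[0, 10], [2, 15], [4, 20]],
               [[0, 7], [2, 14]],
               [[0, 11], [2, 16], [4, 22], [6, 25]]],
})

# Each row becomes an (n_pairs, 2) array; column 0 holds the
# timestamps and column 1 holds the readings.
df['sensor_data_without_timestamp'] = [
    np.asarray(row)[:, 1].tolist() for row in df['sensor']
]
print(df['sensor_data_without_timestamp'].tolist())
```

This still loops over rows in Python, but the inner loop over the pairs of each row is replaced by array slicing.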