Home > Back-end >  Adding data points to my dataframe is too slow, is there a faster way with Pandas?
Adding data points to my dataframe is too slow, is there a faster way with Pandas?

Time:07-01

I have the following problem. Let's say that I have a dataframe called file_data, which has 3 columns TS_ns, VALUE_NUMBER and Alias

         VALUE_NUMBER    Alias     TS_ns
0        0.116000        Name_1    3
1        3.448000        Name_2    34
2        6.106000        Name_3    7
3        4.048000        Name_4    54
4        4.358000        Name_5    32

I would like to take its datapoints and add them to a new dataframe, called dataframe_var, which is empty and has only a column called Channel:

         Channel
0        Name_1
1        Name_2
2        Name_3
3        Name_4
4        Name_5

In order to obtain this:

         Channel    3       34       7        54       32
0        Name_1     116000  nan      nan      nan      nan
1        Name_2     nan     3.448000 nan      nan      nan
2        Name_3     nan     nan      6.106000 nan      nan
3        Name_4     nan     nan      nan      4.048000 nan
4        Name_5     nan     nan      nan      nan      4.358000

and possibly reorder the column names by increasing value.

The procedure I use is the following:

import pandas as pd

time_series = pd.Series( file_data.TS_ns )
value_series = pd.Series( file_data.VALUE_NUMBER )
alias_series = pd.Series( file_data.Alias )

for time_point, value_point, alias_point in zip( time_series, value_series, alias_series ):
    dataframe_var.loc[ dataframe_var.loc[ dataframe_var[ "Channel" ] == alias_point ].index[0], time_point ] = value_point

The problem is that this line:

dataframe_var.loc[ dataframe_var.loc[ dataframe_var[ "Channel" ] == alias_point ].index[0], time_point ] = value_point

is really really slow and if I deal with medium (not so big) dataframes I must wait hours until it complete.

Do you know if there is a most efficient and faster way to add data to my dataframe? Thanks.

CodePudding user response:

You can achieve this using .pivot():

file_data.pivot(index='Alias', columns='TS_ns', values='VALUE_NUMBER')

TS_ns      3      7      32     34     54
Alias                                    
Name_1  0.116    NaN    NaN    NaN    NaN
Name_2    NaN    NaN    NaN  3.448    NaN
Name_3    NaN  6.106    NaN    NaN    NaN
Name_4    NaN    NaN    NaN    NaN  4.048
Name_5    NaN    NaN  4.358    NaN    NaN

No need to use a for loop (this is generally very inefficient with pandas dataframes, there is almost always a function which is much faster). If dataframe_var is perhaps used to filter file_data in this process, you can merge the above output onto that to keep only desired Aliases:

 dataframe_var.merge(file_data.pivot(index='Alias', columns='TS_ns', 
 values='VALUE_NUMBER'), on='Alias', how='left')
  • Related