I have the following problem. Let's say that I have a dataframe
called file_data
, which has 3 columns TS_ns
, VALUE_NUMBER
and Alias
VALUE_NUMBER Alias TS_ns
0 0.116000 Name_1 3
1 3.448000 Name_2 34
2 6.106000 Name_3 7
3 4.048000 Name_4 54
4 4.358000 Name_5 32
I would like to take its datapoints and add them to a new dataframe, called dataframe_var
, which is empty and has only a column called Channel
:
Channel
0 Name_1
1 Name_2
2 Name_3
3 Name_4
4 Name_5
In order to obtain this:
Channel 3 34 7 54 32
0 Name_1 116000 nan nan nan nan
1 Name_2 nan 3.448000 nan nan nan
2 Name_3 nan nan 6.106000 nan nan
3 Name_4 nan nan nan 4.048000 nan
4 Name_5 nan nan nan nan 4.358000
and possibly reorder the column names by increasing value.
The procedure I use is the following:
import pandas as pd
time_series = pd.Series( file_data.TS_ns )
value_series = pd.Series( file_data.VALUE_NUMBER )
alias_series = pd.Series( file_data.Alias )
for time_point, value_point, alias_point in zip( time_series, value_series, alias_series ):
dataframe_var.loc[ dataframe_var.loc[ dataframe_var[ "Channel" ] == alias_point ].index[0], time_point ] = value_point
The problem is that this line:
dataframe_var.loc[ dataframe_var.loc[ dataframe_var[ "Channel" ] == alias_point ].index[0], time_point ] = value_point
is really really slow and if I deal with medium (not so big) dataframes I must wait hours until it complete.
Do you know if there is a most efficient and faster way to add data to my dataframe? Thanks.
CodePudding user response:
You can achieve this using .pivot()
:
file_data.pivot(index='Alias', columns='TS_ns', values='VALUE_NUMBER')
TS_ns 3 7 32 34 54
Alias
Name_1 0.116 NaN NaN NaN NaN
Name_2 NaN NaN NaN 3.448 NaN
Name_3 NaN 6.106 NaN NaN NaN
Name_4 NaN NaN NaN NaN 4.048
Name_5 NaN NaN 4.358 NaN NaN
No need to use a for loop (this is generally very inefficient with pandas dataframes, there is almost always a function which is much faster). If dataframe_var is perhaps used to filter file_data in this process, you can merge the above output onto that to keep only desired Aliases:
dataframe_var.merge(file_data.pivot(index='Alias', columns='TS_ns',
values='VALUE_NUMBER'), on='Alias', how='left')