I have an instrument that gives me averaged data per minute in every row with raw time series in 60 columns for every second, let's say 3 seconds (columns) here to make it easier.
Filename Rec_number Average 1s 2s 3s
0 type1 1 3 2 3 4
1 type1 2 2 1 2 3
2 type2 1 1 1 1 1
3 type2 2 5 4 5 6
4 type2 3 4 3 4 5
I would like to merge the time series to have a single timeline per file like this:
Filename 1s 2s 3s 4s 5s 6s 7s 8s 9s
0 type1 2 3 4 1 2 3 Nan Nan Nan
1 type2 1 1 1 4 5 6 3 4 5
What would be a good way to do that? Thank you for your help in advance!
CodePudding user response:
We can set_index
stack
to get to long form, then create enumerated groups relative to the filenames using groupby cumcount
lastly pivot
back to wide format. Some additional cleanup with add_suffix
, reset_index
and rename_axis
can be done to move to exact expected output:
new_df = (
df.set_index('Filename').loc[:, '1s':].stack()
.droplevel(1)
.reset_index(name='values')
.assign(cols=lambda s: s.groupby('Filename').cumcount() 1)
.pivot(index='Filename', columns='cols', values='values')
.add_suffix('s')
.reset_index()
.rename_axis(columns=None)
)
new_df
:
Filename 1s 2s 3s 4s 5s 6s 7s 8s 9s
0 type1 2.0 3.0 4.0 1.0 2.0 3.0 NaN NaN NaN
1 type2 1.0 1.0 1.0 4.0 5.0 6.0 3.0 4.0 5.0
*inline comments to explain steps:
new_df = (
df.set_index('Filename') # Maintain Filename
.loc[:, '1s':] # Slice all columns from `1s` to the end of the frame
.stack() # Go to long format
.droplevel(1) # Remove old column headers
.reset_index(name='values') # Create DataFrame
# Create new Column headers
.assign(cols=lambda s: s.groupby('Filename').cumcount() 1)
# Pivot back to wide format
.pivot(index='Filename', columns='cols', values='values')
# Add s to the end of column headers
.add_suffix('s')
# Restore default Range Index and make FileName a column
.reset_index()
# Remove Axis label from columns (Created by pivoting)
.rename_axis(columns=None)
)
Setup and imports:
import pandas as pd
df = pd.DataFrame({
'Filename': ['type1', 'type1', 'type2', 'type2', 'type2'],
'Rec_number': [1, 2, 1, 2, 3],
'Average': [3, 2, 1, 5, 4],
'1s': [2, 1, 1, 4, 3],
'2s': [3, 2, 1, 5, 4],
'3s': [4, 3, 1, 6, 5]
})
CodePudding user response:
This tries to avoid flipping to long form, and hopefully improve performance, since only one reshape (wide form) is applied :
box = (df.filter(regex= r".*(name|s)$")
.assign(counter = df.groupby('Filename').cumcount())
.pivot('Filename', 'counter')
.sort_index(level = 'counter', axis = 1)
.droplevel(level='counter', axis = 1)
)
box.columns = [f"{num}s" for num in range(1, box.columns.size 1)]
box.reset_index()
Filename 1s 2s 3s 4s 5s 6s 7s 8s 9s
0 type1 2.0 3.0 4.0 1.0 2.0 3.0 NaN NaN NaN
1 type2 1.0 1.0 1.0 4.0 5.0 6.0 3.0 4.0 5.0