Concatenate multiple rows of specific columns into one row pandas-CodePudding

I have an instrument that gives me averaged data per minute in every row with raw time series in 60 columns for every second, let's say 3 seconds (columns) here to make it easier.

  Filename  Rec_number  Average  1s  2s  3s
0    type1           1        3   2   3   4
1    type1           2        2   1   2   3
2    type2           1        1   1   1   1
3    type2           2        5   4   5   6
4    type2           3        4   3   4   5

I would like to merge the time series to have a single timeline per file like this:

  Filename  1s  2s  3s  4s  5s  6s   7s   8s   9s
0    type1   2   3   4   1   2   3  Nan  Nan  Nan
1    type2   1   1   1   4   5   6    3    4    5

What would be a good way to do that? Thank you for your help in advance!

CodePudding user response：

We can set_index stack to get to long form, then create enumerated groups relative to the filenames using groupby cumcount lastly pivot back to wide format. Some additional cleanup with add_suffix, reset_index and rename_axis can be done to move to exact expected output:

new_df = (
    df.set_index('Filename').loc[:, '1s':].stack()
        .droplevel(1)
        .reset_index(name='values')
        .assign(cols=lambda s: s.groupby('Filename').cumcount()   1)
        .pivot(index='Filename', columns='cols', values='values')
        .add_suffix('s')
        .reset_index()
        .rename_axis(columns=None)
)

new_df:

  Filename   1s   2s   3s   4s   5s   6s   7s   8s   9s
0    type1  2.0  3.0  4.0  1.0  2.0  3.0  NaN  NaN  NaN
1    type2  1.0  1.0  1.0  4.0  5.0  6.0  3.0  4.0  5.0

*inline comments to explain steps:

new_df = (
    df.set_index('Filename')  # Maintain Filename
        .loc[:, '1s':]  # Slice all columns from `1s` to the end of the frame
        .stack()  # Go to long format
        .droplevel(1)  # Remove old column headers
        .reset_index(name='values')  # Create DataFrame
        # Create new Column headers
        .assign(cols=lambda s: s.groupby('Filename').cumcount()   1)
        # Pivot back to wide format
        .pivot(index='Filename', columns='cols', values='values')
        # Add s to the end of column headers
        .add_suffix('s')
        # Restore default Range Index and make FileName a column
        .reset_index()
        # Remove Axis label from columns (Created by pivoting)
        .rename_axis(columns=None)
)

Setup and imports:

import pandas as pd

df = pd.DataFrame({
    'Filename': ['type1', 'type1', 'type2', 'type2', 'type2'],
    'Rec_number': [1, 2, 1, 2, 3], 
    'Average': [3, 2, 1, 5, 4],
    '1s': [2, 1, 1, 4, 3], 
    '2s': [3, 2, 1, 5, 4], 
    '3s': [4, 3, 1, 6, 5]
})

CodePudding user response：

This tries to avoid flipping to long form, and hopefully improve performance, since only one reshape (wide form) is applied :

box = (df.filter(regex= r".*(name|s)$")
         .assign(counter = df.groupby('Filename').cumcount())
         .pivot('Filename', 'counter')
         .sort_index(level = 'counter', axis = 1)
         .droplevel(level='counter', axis = 1)
      )

box.columns = [f"{num}s" for num in range(1, box.columns.size   1)]

box.reset_index()

  Filename   1s   2s   3s   4s   5s   6s   7s   8s   9s
0    type1  2.0  3.0  4.0  1.0  2.0  3.0  NaN  NaN  NaN
1    type2  1.0  1.0  1.0  4.0  5.0  6.0  3.0  4.0  5.0