Home > database >  How do I melt a pandas with custom nam
How do I melt a pandas with custom nam

Time:05-13

I have a table like this

device_type version pool    testMean    testP50 testP90 testP99 testStd  WidgetMean WidgetP50   WidgetP90   WidgetP99   WidgetStd
PNB0Q7      8108162 123     124         136     140.8   141.88  21.35    2.2            0           6.4         9.64        3.92

I want it to be transformed like this:

device_type version pool   Name  Mean P50 P90   P99    Std
PNB0Q7      8108162 123    test  123  136 140.8 142.88 21.35
PNB0Q7      8108162 123   Widget 2.2   0  6.4   9.64  3.92

I tried using melt but getting:

df.melt(id_vars=["device_type", "version", "pool"], var_name="Name", value_name="Value")
device_type  version     pool  Name        Value
PNB0Q7       8108162     test  testMean     124.00
PNB0Q7       8108162     test  testP50      136.00
PNB0Q7       8108162     test  testP90      140.80
PNB0Q7       8108162     test  testP99      141.88
PNB0Q7       8108162     test  testStd      21.35

Any idea on how to reach to expected solution

CodePudding user response:

You can do this with pd.wide_to_long and a little column naming cleanup first, then reshape:

df = df.rename(columns={'Std':'testStd',
                        'TestP90':'testP90',
                        'TestP99':'testP99', 
                        'TestP50':'testP50'})
df_out = pd.wide_to_long(df, 
                         ['test','Widget'], 
                         ['device_type', 'version', 'pool'], 
                         'Measure', '', '. ' )
df_out = df_out.unstack(-1).stack(0).reset_index()
df_out

Output:

Measure device_type  version  pool level_3   Mean    P50    P90     P99    Std
0            PNB0Q7  8108162   123  Widget    2.2    0.0    6.4    9.64   3.92
1            PNB0Q7  8108162   123    test  124.0  136.0  140.8  141.88  21.35

CodePudding user response:

df.columns = ['device_type', 'version', 'pool', 'Mean', 'P50', 'P90', 'P99', 'Std']
df['Name'] = 'test'
df = df[['device_type', 'version', 'pool', 'Name', 'Mean', 'P50', 'P90', 'P99', 'Std']]
print(df)

Output:

  device_type  version  pool  Name  Mean  P50    P90     P99    Std
0      PNB0Q7  8108162   123  test   124  136  140.8  141.88  21.35

CodePudding user response:

One option is to transform to long form with pivot_longer from pyjanitor, using the .value placeholder ---> the .value determines which parts of the columns remain as headers. First we need to ensure that Test is lowercase:

# pip install pyjanitor
import pandas as pd
import janitor

df.columns = df.columns.str.replace('Test', 'test')

df

  device_type  version  pool  testMean  testP50  testP90  testP99    Std
0      PNB0Q7  8108162   123       124      136    140.8   141.88  21.35

df.pivot_longer(
     column_names = 'test*', 
     names_to = ('Name', '.value'), 
    names_pattern = r"(test)(. )"
   )

  device_type  version  pool    Std  Name  Mean  P50    P90     P99
0      PNB0Q7  8108162   123  21.35  test   124  136  140.8  141.88

With the updated data, the same concept applies; however, you need to arrange your columns properly - get the Test as lowercase, change Std to testStd:

df.columns = df.columns.str.replace('Test', 'test')
df = df.rename(columns = {'Std': 'testStd'})
df

  device_type  version  pool  testMean  testP50  testP90  testP99  testStd  WidgetMean  WidgetP50  WidgetP90  WidgetP99  WidgetStd
0      PNB0Q7  8108162   123       124      136    140.8   141.88    21.35         2.2          0        6.4       9.64       3.92

df.pivot_longer(
     column_names = ['test*', 'Widget*'], 
     names_to = ('Name', '.value'), 
     names_pattern = r"(test|Widget)(. )"
    )

  device_type  version  pool    Name   Mean  P50    P90     P99    Std
0      PNB0Q7  8108162   123    test  124.0  136  140.8  141.88  21.35
1      PNB0Q7  8108162   123  Widget    2.2    0    6.4    9.64   3.92
  • Related