I have a table like this
device_type version pool testMean testP50 testP90 testP99 testStd WidgetMean WidgetP50 WidgetP90 WidgetP99 WidgetStd
PNB0Q7 8108162 123 124 136 140.8 141.88 21.35 2.2 0 6.4 9.64 3.92
I want it to be transformed like this:
device_type version pool Name Mean P50 P90 P99 Std
PNB0Q7 8108162 123 test 123 136 140.8 142.88 21.35
PNB0Q7 8108162 123 Widget 2.2 0 6.4 9.64 3.92
I tried using melt but getting:
df.melt(id_vars=["device_type", "version", "pool"], var_name="Name", value_name="Value")
device_type version pool Name Value
PNB0Q7 8108162 test testMean 124.00
PNB0Q7 8108162 test testP50 136.00
PNB0Q7 8108162 test testP90 140.80
PNB0Q7 8108162 test testP99 141.88
PNB0Q7 8108162 test testStd 21.35
Any idea on how to reach to expected solution
CodePudding user response:
You can do this with pd.wide_to_long
and a little column naming cleanup first, then reshape:
df = df.rename(columns={'Std':'testStd',
'TestP90':'testP90',
'TestP99':'testP99',
'TestP50':'testP50'})
df_out = pd.wide_to_long(df,
['test','Widget'],
['device_type', 'version', 'pool'],
'Measure', '', '. ' )
df_out = df_out.unstack(-1).stack(0).reset_index()
df_out
Output:
Measure device_type version pool level_3 Mean P50 P90 P99 Std
0 PNB0Q7 8108162 123 Widget 2.2 0.0 6.4 9.64 3.92
1 PNB0Q7 8108162 123 test 124.0 136.0 140.8 141.88 21.35
CodePudding user response:
df.columns = ['device_type', 'version', 'pool', 'Mean', 'P50', 'P90', 'P99', 'Std']
df['Name'] = 'test'
df = df[['device_type', 'version', 'pool', 'Name', 'Mean', 'P50', 'P90', 'P99', 'Std']]
print(df)
Output:
device_type version pool Name Mean P50 P90 P99 Std
0 PNB0Q7 8108162 123 test 124 136 140.8 141.88 21.35
CodePudding user response:
One option is to transform to long form with pivot_longer from pyjanitor, using the .value
placeholder ---> the .value
determines which parts of the columns remain as headers. First we need to ensure that Test
is lowercase:
# pip install pyjanitor
import pandas as pd
import janitor
df.columns = df.columns.str.replace('Test', 'test')
df
device_type version pool testMean testP50 testP90 testP99 Std
0 PNB0Q7 8108162 123 124 136 140.8 141.88 21.35
df.pivot_longer(
column_names = 'test*',
names_to = ('Name', '.value'),
names_pattern = r"(test)(. )"
)
device_type version pool Std Name Mean P50 P90 P99
0 PNB0Q7 8108162 123 21.35 test 124 136 140.8 141.88
With the updated data, the same concept applies; however, you need to arrange your columns properly - get the Test
as lowercase, change Std
to testStd
:
df.columns = df.columns.str.replace('Test', 'test')
df = df.rename(columns = {'Std': 'testStd'})
df
device_type version pool testMean testP50 testP90 testP99 testStd WidgetMean WidgetP50 WidgetP90 WidgetP99 WidgetStd
0 PNB0Q7 8108162 123 124 136 140.8 141.88 21.35 2.2 0 6.4 9.64 3.92
df.pivot_longer(
column_names = ['test*', 'Widget*'],
names_to = ('Name', '.value'),
names_pattern = r"(test|Widget)(. )"
)
device_type version pool Name Mean P50 P90 P99 Std
0 PNB0Q7 8108162 123 test 124.0 136 140.8 141.88 21.35
1 PNB0Q7 8108162 123 Widget 2.2 0 6.4 9.64 3.92