There are a lot of long_to_wide for dataframe but I can't find one for a sereis. This is my data
lst = ['Inference time: 58', 'class #0probability:3.20733', 'class #1probability:-3.73257', 'Inference time: 36', 'class #0probability:2.59028', 'class #1probability:-3.07627', 'Inference time: 34', 'class #0probability:2.19878', 'class #1probability:-2.58641', 'Inference time: 33', 'class #0probability:1.97092', 'class #1probability:-2.30133', 'Inference time: 34', 'class #0probability:2.25356', 'class #1probability:-2.63987', 'Inference time: 34', 'class #0probability:2.15755', 'class #1probability:-2.53912', 'Inference time: 33', 'class #0probability:2.2077', 'class #1probability:-2.56688', 'Inference time: 33', 'class #0probability:2.39328', 'class #1probability:-2.78342', 'Inference time: 34', 'class #0probability:2.64073', 'class #1probability:-3.09197', 'Inference time: 34', 'class #0probability:2.16264', 'class #1probability:-2.53414', 'Inference time: 34', 'class #0probability:2.06665', 'class #1probability:-2.42764', 'Inference time: 34', 'class #0probability:2.13249', 'class #1probability:-2.51025', 'Inference time: 33', 'class #0probability:2.33839', 'class #1probability:-2.72886', 'Inference time: 34', 'class #0probability:2.29761', 'class #1probability:-2.69393', 'Inference time: 34', 'class #0probability:2.32972', 'class #1probability:-2.74107', 'Inference time: 33', 'class #0probability:2.05837', 'class #1probability:-2.41768', 'Inference time: 33', 'class #0probability:1.77189', 'class #1probability:-2.05932', 'Inference time: 34', 'class #0probability:2.01318', 'class #1probability:-2.34344', 'Inference time: 34', 'class #0probability:2.29746', 'class #1probability:-2.70664', 'Inference time: 33', 'class #0probability:1.62522', 'class #1probability:-1.90069', 'Inference time: 34', 'class #0probability:1.76951', 'class #1probability:-2.0675', 'Inference time: 33', 'class #0probability:2.15999', 'class #1probability:-2.51604', 'Inference time: 34', 'class #0probability:1.85253', 'class #1probability:-2.16812', 'Inference time: 34', 'class #0probability:2.11363', 'class #1probability:-2.47437', 'Inference time: 33', 'class #0probability:2.16627', 'class #1probability:-2.53209', 'Inference time: 33', 'class #0probability:2.13652', 'class #1probability:-2.471', 'Inference time: 33', 'class #0probability:2.13441', 'class #1probability:-2.48205', 'Inference time: 34', 'class #0probability:1.88019', 'class #1probability:-2.18954', 'Inference time: 34', 'class #0probability:2.07553', 'class #1probability:-2.42014', 'Inference time: 33', 'class #0probability:2.53671', 'class #1probability:-2.97393', 'Inference time: 33', 'class #0probability:2.11061', 'class #1probability:-2.49532', 'Inference time: 34', 'class #0probability:2.13014', 'class #1probability:-2.49412', 'Inference time: 34', 'class #0probability:2.29957', 'class #1probability:-2.69449', 'Inference time: 33', 'class #0probability:2.26911', 'class #1probability:-2.63594', 'Inference time: 34', 'class #0probability:2.24174', 'class #1probability:-2.62847', 'Inference time: 33', 'class #0probability:2.21627', 'class #1probability:-2.5818', 'Inference time: 34', 'class #0probability:2.36658', 'class #1probability:-2.78529', 'Inference time: 34', 'class #0probability:2.30959', 'class #1probability:-2.68827', 'Inference time: 34', 'class #0probability:2.30189', 'class #1probability:-2.68324', 'Inference time: 34', 'class #0probability:2.61634', 'class #1probability:-3.09168', 'Inference time: 34', 'class #0probability:2.68012', 'class #1probability:-3.13164', 'Inference time: 34', 'class #0probability:2.66356', 'class #1probability:-3.12755', 'Inference time: 33', 'class #0probability:2.49246', 'class #1probability:-2.92267', 'Inference time: 33', 'class #0probability:2.44008', 'class #1probability:-2.85232', 'Inference time: 33', 'class #0probability:2.59968', 'class #1probability:-3.06841', 'Inference time: 33', 'class #0probability:2.58481', 'class #1probability:-3.01866', 'Inference time: 33', 'class #0probability:2.54606', 'class #1probability:-2.99243', 'Inference time: 33', 'class #0probability:2.42552', 'class #1probability:-2.85726', 'Inference time: 33', 'class #0probability:2.8681', 'class #1probability:-3.37495', 'Inference time: 34', 'class #0probability:2.49663', 'class #1probability:-2.91658', 'Inference time: 33', 'class #0probability:2.53856', 'class #1probability:-2.99136', 'Inference time: 34', 'class #0probability:2.63437', 'class #1probability:-3.10397', 'Inference time: 34', 'class #0probability:2.4746', 'class #1probability:-2.90436', 'Inference time: 34', 'class #0probability:2.36865', 'class #1probability:-2.76265', 'Inference time: 34', 'class #0probability:2.42615', 'class #1probability:-2.84323', 'Inference time: 33', 'class #0probability:2.462', 'class #1probability:-2.8738', 'Inference time: 33', 'class #0probability:2.41164', 'class #1probability:-2.7841', 'Inference time: 34', 'class #0probability:2.68366', 'class #1probability:-3.13177', 'Inference time: 34', 'class #0probability:2.29031', 'class #1probability:-2.6803', 'Inference time: 34', 'class #0probability:2.37769', 'class #1probability:-2.76857']
df = pd.DataFrame(lst, columns=['data'])
data
0 Inference time: 58
1 class #0probability:3.20733
2 class #1probability:-3.73257
3 Inference time: 36
4 class #0probability:2.59028
.. ...
175 class #0probability:2.29031
176 class #1probability:-2.6803
177 Inference time: 34
178 class #0probability:2.37769
179 class #1probability:-2.76857
I want to parse it from long to wide so the expected output is:
class0 class1 Inference_time
0 3.20733 -3.73257 58
1 3.20733 -3.73257 36
2 2.59028 -3.07627 34
3 2.19878 -2.58641 33
I know how to do this with a for loop since the batch is always 3; however I am looking for the pandas way.
CodePudding user response:
df = pd.DataFrame(lst, columns=['data'])
df[['key', 'value']] = pd.DataFrame(df['data'].str.split(':').tolist())
df['index'] = df.index// 3
output = df[['key', 'value', 'index']].pivot(columns='key', values='value', index='index')
CodePudding user response:
Try this:
df = pd.DataFrame({'inference':[s.split(':')[1].strip() for s in lst if s.__contains__('Inference')],
'class0':[s.split(':')[1] for s in lst if s.__contains__('#0')],
'class1':[s.split(':')[1] for s in lst if s.__contains__('#1')]})
CodePudding user response:
you can do it like this:
First split your Series into two columns and change the index a bit.
df[['data', 'vals']] = df['data'].str.split(':', expand=True)
df.index = df.index//3
# index will look like: # 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, ....)
df.pivot
:
out = (df
.pivot(columns='data', values='vals')
.rename(columns={'class #0probability' : 'class 0',
'class #1probability' : 'class 1',
'Inference time' : 'Inference_time'})
.rename_axis(columns=None))
print(out)
or 2) df.set_index()
and df.unstack()
:
out = (df
.set_index('data',append=True)
.unstack()
.droplevel(level=0, axis=1)
.rename(columns={'class #0probability' : 'class 0',
'class #1probability' : 'class 1',
'Inference time' : 'Inference_time'})
.rename_axis(columns=None))
print(out)
Output out
:
Inference_time class 0 class 1
0 58 3.20733 -3.73257
1 36 2.59028 -3.07627
2 34 2.19878 -2.58641
3 33 1.97092 -2.30133
4 34 2.25356 -2.63987
CodePudding user response:
# split the cells into text and values
df2[['col','value']] =df['data'].str.split(':', expand=True)
# remove probablity from the text columns
df2['col']=df2['col'].str.strip().str.replace(r'probability','')
# pivot the resulting dataframe to create the table
df3=df2.pivot( columns='col' )
#name the columns based on teh row values under text columns
df3.columns = pd.Categorical(df2['col']).categories
#values are staggered with values only in one column. This is to assign the inference time to the rows below
df3['Inference time'].ffill(inplace=True)
#Finally, get rid of the null values
df3.groupby('Inference time').sum().reset_index()
Inference time class #0 class #1
0 34 2.37769 -2.76857
1 36 2.29031 -2.6803
2 58 3.20733 -3.73257