Home > Net >  Pandas series long to wide
Pandas series long to wide

Time:08-04

There are a lot of long_to_wide for dataframe but I can't find one for a sereis. This is my data

lst = ['Inference time: 58', 'class #0probability:3.20733', 'class #1probability:-3.73257', 'Inference time: 36', 'class #0probability:2.59028', 'class #1probability:-3.07627', 'Inference time: 34', 'class #0probability:2.19878', 'class #1probability:-2.58641', 'Inference time: 33', 'class #0probability:1.97092', 'class #1probability:-2.30133', 'Inference time: 34', 'class #0probability:2.25356', 'class #1probability:-2.63987', 'Inference time: 34', 'class #0probability:2.15755', 'class #1probability:-2.53912', 'Inference time: 33', 'class #0probability:2.2077', 'class #1probability:-2.56688', 'Inference time: 33', 'class #0probability:2.39328', 'class #1probability:-2.78342', 'Inference time: 34', 'class #0probability:2.64073', 'class #1probability:-3.09197', 'Inference time: 34', 'class #0probability:2.16264', 'class #1probability:-2.53414', 'Inference time: 34', 'class #0probability:2.06665', 'class #1probability:-2.42764', 'Inference time: 34', 'class #0probability:2.13249', 'class #1probability:-2.51025', 'Inference time: 33', 'class #0probability:2.33839', 'class #1probability:-2.72886', 'Inference time: 34', 'class #0probability:2.29761', 'class #1probability:-2.69393', 'Inference time: 34', 'class #0probability:2.32972', 'class #1probability:-2.74107', 'Inference time: 33', 'class #0probability:2.05837', 'class #1probability:-2.41768', 'Inference time: 33', 'class #0probability:1.77189', 'class #1probability:-2.05932', 'Inference time: 34', 'class #0probability:2.01318', 'class #1probability:-2.34344', 'Inference time: 34', 'class #0probability:2.29746', 'class #1probability:-2.70664', 'Inference time: 33', 'class #0probability:1.62522', 'class #1probability:-1.90069', 'Inference time: 34', 'class #0probability:1.76951', 'class #1probability:-2.0675', 'Inference time: 33', 'class #0probability:2.15999', 'class #1probability:-2.51604', 'Inference time: 34', 'class #0probability:1.85253', 'class #1probability:-2.16812', 'Inference time: 34', 'class #0probability:2.11363', 'class #1probability:-2.47437', 'Inference time: 33', 'class #0probability:2.16627', 'class #1probability:-2.53209', 'Inference time: 33', 'class #0probability:2.13652', 'class #1probability:-2.471', 'Inference time: 33', 'class #0probability:2.13441', 'class #1probability:-2.48205', 'Inference time: 34', 'class #0probability:1.88019', 'class #1probability:-2.18954', 'Inference time: 34', 'class #0probability:2.07553', 'class #1probability:-2.42014', 'Inference time: 33', 'class #0probability:2.53671', 'class #1probability:-2.97393', 'Inference time: 33', 'class #0probability:2.11061', 'class #1probability:-2.49532', 'Inference time: 34', 'class #0probability:2.13014', 'class #1probability:-2.49412', 'Inference time: 34', 'class #0probability:2.29957', 'class #1probability:-2.69449', 'Inference time: 33', 'class #0probability:2.26911', 'class #1probability:-2.63594', 'Inference time: 34', 'class #0probability:2.24174', 'class #1probability:-2.62847', 'Inference time: 33', 'class #0probability:2.21627', 'class #1probability:-2.5818', 'Inference time: 34', 'class #0probability:2.36658', 'class #1probability:-2.78529', 'Inference time: 34', 'class #0probability:2.30959', 'class #1probability:-2.68827', 'Inference time: 34', 'class #0probability:2.30189', 'class #1probability:-2.68324', 'Inference time: 34', 'class #0probability:2.61634', 'class #1probability:-3.09168', 'Inference time: 34', 'class #0probability:2.68012', 'class #1probability:-3.13164', 'Inference time: 34', 'class #0probability:2.66356', 'class #1probability:-3.12755', 'Inference time: 33', 'class #0probability:2.49246', 'class #1probability:-2.92267', 'Inference time: 33', 'class #0probability:2.44008', 'class #1probability:-2.85232', 'Inference time: 33', 'class #0probability:2.59968', 'class #1probability:-3.06841', 'Inference time: 33', 'class #0probability:2.58481', 'class #1probability:-3.01866', 'Inference time: 33', 'class #0probability:2.54606', 'class #1probability:-2.99243', 'Inference time: 33', 'class #0probability:2.42552', 'class #1probability:-2.85726', 'Inference time: 33', 'class #0probability:2.8681', 'class #1probability:-3.37495', 'Inference time: 34', 'class #0probability:2.49663', 'class #1probability:-2.91658', 'Inference time: 33', 'class #0probability:2.53856', 'class #1probability:-2.99136', 'Inference time: 34', 'class #0probability:2.63437', 'class #1probability:-3.10397', 'Inference time: 34', 'class #0probability:2.4746', 'class #1probability:-2.90436', 'Inference time: 34', 'class #0probability:2.36865', 'class #1probability:-2.76265', 'Inference time: 34', 'class #0probability:2.42615', 'class #1probability:-2.84323', 'Inference time: 33', 'class #0probability:2.462', 'class #1probability:-2.8738', 'Inference time: 33', 'class #0probability:2.41164', 'class #1probability:-2.7841', 'Inference time: 34', 'class #0probability:2.68366', 'class #1probability:-3.13177', 'Inference time: 34', 'class #0probability:2.29031', 'class #1probability:-2.6803', 'Inference time: 34', 'class #0probability:2.37769', 'class #1probability:-2.76857']

df = pd.DataFrame(lst, columns=['data'])

                             data
0              Inference time: 58
1     class #0probability:3.20733
2    class #1probability:-3.73257
3              Inference time: 36
4     class #0probability:2.59028
..                            ...
175   class #0probability:2.29031
176   class #1probability:-2.6803
177            Inference time: 34
178   class #0probability:2.37769
179  class #1probability:-2.76857

I want to parse it from long to wide so the expected output is:

    class0    class1 Inference_time
0  3.20733  -3.73257       58
1  3.20733  -3.73257       36
2  2.59028  -3.07627       34
3  2.19878  -2.58641       33

I know how to do this with a for loop since the batch is always 3; however I am looking for the pandas way.

CodePudding user response:

df = pd.DataFrame(lst, columns=['data'])
df[['key', 'value']] = pd.DataFrame(df['data'].str.split(':').tolist())
df['index'] = df.index// 3
output = df[['key', 'value', 'index']].pivot(columns='key', values='value', index='index') 

CodePudding user response:

Try this:

df = pd.DataFrame({'inference':[s.split(':')[1].strip() for s in lst if s.__contains__('Inference')],
                  'class0':[s.split(':')[1] for s in lst if s.__contains__('#0')],
                  'class1':[s.split(':')[1] for s in lst if s.__contains__('#1')]})

CodePudding user response:

you can do it like this:

First split your Series into two columns and change the index a bit.

df[['data', 'vals']] = df['data'].str.split(':', expand=True)
df.index = df.index//3
# index will look like: # 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, ....)
  1. df.pivot :
out = (df
       .pivot(columns='data', values='vals')
       .rename(columns={'class #0probability' : 'class 0', 
                        'class #1probability' : 'class 1', 
                        'Inference time' : 'Inference_time'})
       .rename_axis(columns=None))
print(out)

or 2) df.set_index() and df.unstack() :

out = (df
       .set_index('data',append=True)
       .unstack()
       .droplevel(level=0, axis=1)
       .rename(columns={'class #0probability' : 'class 0', 
                        'class #1probability' : 'class 1', 
                        'Inference time' : 'Inference_time'})
       .rename_axis(columns=None))

print(out)

Output out:

  Inference_time  class 0   class 1
0             58  3.20733  -3.73257
1             36  2.59028  -3.07627
2             34  2.19878  -2.58641
3             33  1.97092  -2.30133
4             34  2.25356  -2.63987

CodePudding user response:

# split the cells into text and values
df2[['col','value']] =df['data'].str.split(':', expand=True)

# remove probablity from the text columns
df2['col']=df2['col'].str.strip().str.replace(r'probability','')

# pivot the resulting dataframe to create the table
df3=df2.pivot(  columns='col' )

#name the columns based on teh row values under text columns
df3.columns = pd.Categorical(df2['col']).categories

#values are staggered with values only in one column. This is to assign the inference time to the rows below
df3['Inference time'].ffill(inplace=True)

#Finally, get rid of the null values
df3.groupby('Inference time').sum().reset_index()

  Inference time    class #0    class #1
0             34    2.37769     -2.76857
1             36    2.29031     -2.6803
2             58    3.20733     -3.73257
  • Related