Home > database >  Transposing python/pyspark dataframe
Transposing python/pyspark dataframe

Time:12-15

I have data in excel file in the following format

enter image description here

I want it to be transposed into the following format

enter image description here

I have tried transposing the data using the following

pdf = df.toPandas()
df_transposed = pdf.T

But that didn't work and i get incorrect results... Any help please.. thanks

CodePudding user response:

For pandas , pd.melt

import pandas as pd
df = pd.DataFrame({
    'category': [1,2,3,4,5],
    'id1': [5,4,3,3,4],
    'id2': [3,2,4,3,10],
    'id3': [3, 2, 1, 1, 1]
})



   category  id1  id2  id3
0         1    5    3    3
1         2    4    2    2
2         3    3    4    1
3         4    3    3    1
4         5    4   10    1

    
pd.melt(df,id_vars=['category'], value_vars=['id1','id2','id3'],var_name='sequence', value_name='value')

For pyspark, construct a struct column of column names and their values and explode

df=spark.createDataFrame(df)
df.withColumn('tab', F.array(*[F.struct(lit(x).alias('sequence'), col(x).alias('value')).alias(x) for x in df.columns if x!='category'])).selectExpr('category','inline(tab)').show()


     category sequence  value
0          1      id1      5
1          2      id1      4
2          3      id1      3
3          4      id1      3
4          5      id1      4
5          1      id2      3
6          2      id2      2
7          3      id2      4
8          4      id2      3
9          5      id2     10
10         1      id3      3
11         2      id3      2
12         3      id3      1
13         4      id3      1
14         5      id3      1
  • Related