How do I transpose a dataframe with only one row and multiple columns in PySpark?

I have dataframes with one row:

A B C D E
4 1 7 2 3

I would like to convert this to a dataframe with the following format:

Letter Number
A      4
B      1
C      7
D      2
E      3

CodePudding user response：

I did not find a built-in PySpark function for this in the docs, so I wrote a simple function that does the job. Since your dataframe df has only one row, you can use the following solution.

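def my_transpose(df):
  # column names become the values of the 'Letter' column
  letter = df.columns
  # the single row's values become the values of the 'Number' column
  number = list(df.take(1)[0].asDict().values())

  # pair each column name with its value for a new Spark dataframe
  data = [[a, b] for a, b in zip(letter, number)]

  # `spark` is assumed to be the active SparkSession
  res = spark.createDataFrame(data, ['Letter', 'Number'])
  return res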



my_transpose(df).show()

+------+------+
|Letter|Number|
+------+------+
|     A|     4|
|     B|     1|
|     C|     7|
|     D|     2|
|     E|     3|
+------+------+

CodePudding user response：

All you need is the stack function to unpivot the dataframe:

# create data frame
df = spark.createDataFrame([(4,1,7,2,3)],("A", "B", "C", "D", "E"))

# apply stack function
df1 = df.selectExpr("stack(5, 'A', A, 'B', B, 'C', C, 'D', D, 'E', E) as (Letter, Number)")
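
If the column list is long or not known in advance, the stack expression can probably be generated from df.columns instead of typed by hand. Below is a minimal sketch of that idea. The helper name build_stack_expr and its parameters are made up for illustration and are not part of PySpark. It assumes the column names contain no single quotes or backticks, and that all columns share one data type, since stack generally expects matching types in each output column.

```python
def build_stack_expr(columns, name_col="Letter", value_col="Number"):
    """Build a Spark SQL stack() expression that unpivots the given columns.

    Hypothetical helper for illustration; not a PySpark API.
    """
    if not columns:
        raise ValueError("at least one column is required")
    # Each column contributes a pair: its name as a string literal,
    # followed by a backtick-quoted reference to the column itself.
    pairs = ", ".join(f"'{c}', `{c}`" for c in columns)
    return f"stack({len(columns)}, {pairs}) as ({name_col}, {value_col})"


expr = build_stack_expr(["A", "B", "C", "D", "E"])
print(expr)

# With a live SparkSession and the df from above, usage would look like:
# df1 = df.selectExpr(build_stack_expr(df.columns))
```

The function only builds a string, so it can be checked without a Spark cluster before passing the result to selectExpr.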