I have PySpark DataFrames with a single row, for example:
A B C D E
4 1 7 2 3
I would like to convert this to a dataframe with the following format:
Letter Number
A 4
B 1
C 7
D 2
E 3
Answer:
I could not find a built-in PySpark function for this in the docs, so I wrote a simple helper that does the job. Since your DataFrame df has only one row, you can use the following:
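from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def my_transpose(df):
    # column names become the "Letter" values
    letter = df.columns
    # fields of the single row (a Row is a tuple, in column order) become the "Number" values
    number = list(df.first())
    # pair each column name with its value and build a new Spark DataFrame
    data = [(a, b) for a, b in zip(letter, number)]
    return spark.createDataFrame(data, ['Letter', 'Number'])

my_transpose(df).show()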
+------+------+
|Letter|Number|
+------+------+
|     A|     4|
|     B|     1|
|     C|     7|
|     D|     2|
|     E|     3|
+------+------+
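The core of my_transpose is pairing each column name with the corresponding value of the single row. Below is a minimal pure-Python sketch of just that step, with plain lists standing in for df.columns and the Row values; the helper name pair_columns_with_values is hypothetical. It also fails loudly on a length mismatch, which zip() would otherwise hide by silently truncating:

```python
from typing import Any, List, Sequence, Tuple


def pair_columns_with_values(
    columns: Sequence[str], values: Sequence[Any]
) -> List[Tuple[str, Any]]:
    """Pair each column name with the matching value from a single row.

    Hypothetical helper mirroring the zip step inside my_transpose;
    `values` stands in for the fields of the one Row in the DataFrame.
    """
    if len(columns) != len(values):
        # zip() would silently drop the extra items, so raise instead
        raise ValueError(f"expected {len(columns)} values, got {len(values)}")
    return list(zip(columns, values))


rows = pair_columns_with_values(["A", "B", "C", "D", "E"], [4, 1, 7, 2, 3])
print(rows)  # [('A', 4), ('B', 1), ('C', 7), ('D', 2), ('E', 3)]
```

A list of tuples like rows is what the helper passes to spark.createDataFrame(data, ['Letter', 'Number']). Because the row is pulled to the driver first, this approach is only sensible for small inputs such as a single row.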
Answer:
All you need is the stack function, which unpivots the DataFrame without collecting anything to the driver:
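from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
# create data frame
df = spark.createDataFrame([(4, 1, 7, 2, 3)], ["A", "B", "C", "D", "E"])
# stack(n, label1, value1, ...) emits n rows of (label, value) pairs
df1 = df.selectExpr("stack(5, 'A', A, 'B', B, 'C', C, 'D', D, 'E', E) as (Letter, Number)")
df1.show()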
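Hardcoding the stack arguments gets tedious when there are many columns. Here is a sketch that builds the same expression from a list of column names instead. The function name build_stack_expr is hypothetical (not part of PySpark), and the output aliases are assumed to be plain identifiers:

```python
from typing import Sequence


def build_stack_expr(
    columns: Sequence[str], label_col: str = "Letter", value_col: str = "Number"
) -> str:
    """Build a Spark SQL stack(...) expression that unpivots `columns`.

    Each column contributes a string literal (its name, used as the label)
    and a backtick-quoted column reference (its value). label_col and
    value_col are assumed to be simple identifiers and are not quoted.
    """
    parts = []
    for name in columns:
        # escape backslashes, then single quotes, inside the label literal
        literal = name.replace("\\", "\\\\").replace("'", "\\'")
        # escape backticks inside the identifier by doubling them
        ref = name.replace("`", "``")
        parts.append(f"'{literal}', `{ref}`")
    return f"stack({len(columns)}, {', '.join(parts)}) as ({label_col}, {value_col})"


expr = build_stack_expr(["A", "B", "C", "D", "E"])
print(expr)
```

With a live session you would pass the result straight to selectExpr, e.g. df.selectExpr(build_stack_expr(df.columns)). Since stack puts every value into a single output column, the value columns are assumed to share a type; with mixed types you may need to cast them to a common type first.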