Pivoting a single-row dataframe where groupBy cannot be applied


I have a dataframe like this:

+-------------------+----------------+-------------------+
|inputRecordSetCount|inputRecordCount|suspenseRecordCount|
+-------------------+----------------+-------------------+
|                166|            1216|                 10|
+-------------------+----------------+-------------------+

I am trying to make it look like this:

+-------------------+-----+
|          operation|value|
+-------------------+-----+
|inputRecordSetCount|  166|
|   inputRecordCount| 1216|
|suspenseRecordCount|   10|
+-------------------+-----+

I tried pivot, but it needs a groupBy field, and I don't have one. I found some references to stack in Scala, but I'm not sure how to use it in PySpark. Any help would be appreciated. Thank you.

CodePudding user response:

You can use the stack() operation as mentioned in this tutorial.

stack() takes the number of rows to generate as its first argument; since there are 3 columns to unpivot here, pass 3, followed by pairs of label and column name:

stack(3, "inputRecordSetCount", inputRecordSetCount, "inputRecordCount", inputRecordCount, "suspenseRecordCount", suspenseRecordCount) as (operation, value)

Full example:

df = spark.createDataFrame(data=[[166, 1216, 10]],
                           schema=['inputRecordSetCount', 'inputRecordCount', 'suspenseRecordCount'])

# Build one "label", column pair per column, then assemble the full stack()
# expression shown above.
pairs = [f'"{c}", {c}' for c in df.columns]
expr = f"stack({len(pairs)}, {', '.join(pairs)}) as (operation, value)"

df = df.selectExpr(expr)
df.show()

+-------------------+-----+
|          operation|value|
+-------------------+-----+
|inputRecordSetCount|  166|
|   inputRecordCount| 1216|
|suspenseRecordCount|   10|
+-------------------+-----+
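
If your environment runs PySpark 3.4 or newer, DataFrame.melt (an alias of DataFrame.unpivot) produces the same result without hand-building the SQL string. A minimal sketch, assuming the same single-row dataframe as above:

df = spark.createDataFrame(data=[[166, 1216, 10]],
                           schema=['inputRecordSetCount', 'inputRecordCount', 'suspenseRecordCount'])

# An empty ids list unpivots every column, so no groupBy field is needed here either.
df.melt(ids=[], values=df.columns,
        variableColumnName='operation', valueColumnName='value').show()

Either way, stack() and melt() evaluate row by row as generator-style expressions, so unlike pivot() they need no grouping or shuffle.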