Pivoting a single-row dataframe where groupBy cannot be applied


I have a dataframe like this:

+-------------------+----------------+-------------------+
|inputRecordSetCount|inputRecordCount|suspenseRecordCount|
+-------------------+----------------+-------------------+
|                166|            1216|                 10|
+-------------------+----------------+-------------------+

I am trying to make it look like this:

+-------------------+-----+
|          operation|value|
+-------------------+-----+
|inputRecordSetCount|  166|
|   inputRecordCount| 1216|
|suspenseRecordCount|   10|
+-------------------+-----+

I tried pivot, but it needs a groupBy field, and I don't have one. I found some references to stack in Scala, but I'm not sure how to use it in PySpark. Any help would be appreciated. Thank you.

CodePudding user response:

You can use the stack() operation as mentioned in this tutorial.

stack() takes the number of rows to generate as its first argument; since there are 3 columns to unpivot here, pass 3, followed by pairs of label and column name:

stack(3, "inputRecordSetCount", inputRecordSetCount, "inputRecordCount", inputRecordCount, "suspenseRecordCount", suspenseRecordCount) as (operation, value)

Full example:

df = spark.createDataFrame(data=[[166, 1216, 10]],
                           schema=['inputRecordSetCount', 'inputRecordCount', 'suspenseRecordCount'])

# Build one "label", column pair per column, then assemble the full stack()
# expression shown above.
pairs = [f'"{c}", {c}' for c in df.columns]
expr = f"stack({len(pairs)}, {', '.join(pairs)}) as (operation, value)"

df = df.selectExpr(expr)
df.show()

+-------------------+-----+
|          operation|value|
+-------------------+-----+
|inputRecordSetCount|  166|
|   inputRecordCount| 1216|
|suspenseRecordCount|   10|
+-------------------+-----+
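
If your environment runs PySpark 3.4 or newer, DataFrame.melt (an alias of DataFrame.unpivot) produces the same result without hand-building the SQL string. A minimal sketch, assuming the same single-row dataframe as above:

df = spark.createDataFrame(data=[[166, 1216, 10]],
                           schema=['inputRecordSetCount', 'inputRecordCount', 'suspenseRecordCount'])

# An empty ids list unpivots every column, so no groupBy field is needed here either.
df.melt(ids=[], values=df.columns,
        variableColumnName='operation', valueColumnName='value').show()

Either way, stack() and melt() evaluate row by row as generator-style expressions, so unlike pivot() they need no grouping or shuffle.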