I have a dataframe like this:
| inputRecordSetCount | inputRecordCount | suspenseRecordCount |
|---|---|---|
| 166 | 1216 | 10 |

I am trying to make it look like this:

| operation | value |
|---|---|
| inputRecordSetCount | 166 |
| inputRecordCount | 1216 |
| suspenseRecordCount | 10 |
I tried pivot, but it needs a groupBy field, and I don't have one. I found some references to stack in Scala, but I'm not sure how to use it in PySpark. Any help would be appreciated. Thank you.
CodePudding user response:
You can use the stack() function, as mentioned in this tutorial. Since there are 3 columns, pass the number of pairs first, then a label and column name for each:
```sql
stack(3, "inputRecordSetCount", inputRecordSetCount, "inputRecordCount", inputRecordCount, "suspenseRecordCount", suspenseRecordCount) as (operation, value)
```
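A hardcoded sketch, assuming `df` holds the single-row DataFrame from the question:

```python
# Pass the stack() expression to selectExpr(); it expands the one row into three.
df.selectExpr(
    'stack(3, "inputRecordSetCount", inputRecordSetCount, '
    '"inputRecordCount", inputRecordCount, '
    '"suspenseRecordCount", suspenseRecordCount) as (operation, value)'
).show()
```

Building the expression dynamically from `df.columns` avoids typing every pair by hand: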
Full example:
```python
df = spark.createDataFrame(
    data=[[166, 1216, 10]],
    schema=['inputRecordSetCount', 'inputRecordCount', 'suspenseRecordCount'])

# Build one '"label", column' pair per column, then splice the pairs
# into stack(n, label1, col1, ..., labelN, colN).
cols = [f'"{c}", {c}' for c in df.columns]
expr = f"stack({len(cols)}, {', '.join(cols)}) as (operation, value)"

result = df.selectExpr(expr)
result.show()
```
```
+-------------------+-----+
|          operation|value|
+-------------------+-----+
|inputRecordSetCount|  166|
|   inputRecordCount| 1216|
|suspenseRecordCount|   10|
+-------------------+-----+
```
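As an aside: if you are on Spark 3.4 or later (an assumption about your environment), `DataFrame.unpivot` gives the same result without building a SQL string. A minimal sketch, again starting from the original wide `df`:

```python
# unpivot with no id columns turns every column into an (operation, value) row.
# Requires Spark 3.4+, where DataFrame.unpivot was introduced.
df.unpivot(ids=[],
           values=df.columns,
           variableColumnName='operation',
           valueColumnName='value').show()
```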