I have a list of fields and am trying to create an unpivot expression with stack() in PySpark.
stack() requires the params: a number, then repeating pairs of field name and field value:
stack(30, 'field1', field1...)
I have a list of lists like
[['field1','field1'],['field2','field2']...]
I can then flatten that into a single list
['field1','field1','field2','field2']
But I need to remove the single quotes from the second occurrence in each pair, so it works as the "field value":
unpivot_Expr = "stack(30, 'field1',field1,'field2',field2...)"
So far I'm getting stack(30, 'field1','field1','field2','field2'...)
I'm not sure how, or where, it's easiest to remove the single quotes. Any help is much appreciated.
Edit: Sorry, I should've given context. I need to insert this string into a PySpark select expression:
unpivot_df = df.select("hashKey", expr(unpivot_Expr))
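(For reference, expr here is pyspark.sql.functions.expr, so this assumes an import like:

from pyspark.sql.functions import expr
)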
Currently I drop the list into the string and strip the [] like this:
unpivot_Expr = "stack({0}, {1})".format(str(len(fieldList)), str(fieldList).replace("[","").replace("]",""))
CodePudding user response:
How about building up the string unpivot_Expr piece by piece:
all_fields = [
    ['field1', 'field1'],
    ['field2', 'field2']
]

unpivot_Expr = "stack(30"
for pair in all_fields:
    # quote the first element (the label); leave the second as a bare column reference
    unpivot_Expr += f", '{pair[0]}', {pair[1]}"
unpivot_Expr += ")"
print(unpivot_Expr)
I think that will give you the string you seek:
stack(30, 'field1', field1, 'field2', field2)
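The same string can also be built in one expression with a join and dropped straight into the select. A minimal sketch, assuming the all_fields list above and a DataFrame df with a hashKey column, as in the question:

from pyspark.sql.functions import expr

# quote only the first element of each pair (the label);
# the second stays a bare column reference
pairs = ", ".join(f"'{name}', {value}" for name, value in all_fields)
unpivot_Expr = f"stack({len(all_fields)}, {pairs})"

unpivot_df = df.select("hashKey", expr(unpivot_Expr))

Using len(all_fields) for the first argument also keeps the row count in sync with the list, so you don't have to hard-code the 30.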