I would like to convert multiple array-type columns in a DataFrame to strings. Can someone please help?
I have a DataFrame with different types of elements: some columns hold numbers and some hold arrays. I want to convert only the array columns to strings and leave the rest as they are.
Expected Output:
CodePudding user response:
You can use the array_join transformation in PySpark: https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.functions.array_join.html#pyspark.sql.functions.array_join
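To apply this to only the array columns, something along these lines might work. It is a minimal sketch, not tested against your data: the helper names array_column_names and arrays_to_strings and the "," separator are my own choices, and it assumes the array elements can be cast to strings.

```python
def array_column_names(dtypes):
    """Return the names of columns whose Spark type string starts with 'array'."""
    return [name for name, dtype in dtypes if dtype.startswith("array")]


def arrays_to_strings(df, sep=","):
    """Replace every array column of a PySpark DataFrame with a joined string."""
    # Imported here so array_column_names() can be used without Spark installed.
    from pyspark.sql import functions as F

    # df.dtypes is a list of (column name, type string) pairs, e.g. ("tags", "array<string>").
    for name in array_column_names(df.dtypes):
        # Cast the elements to string first so arrays of numbers are handled too.
        as_strings = F.col(name).cast("array<string>")
        df = df.withColumn(name, F.array_join(as_strings, sep))
    return df


# The column selection itself is plain Python and can be checked without Spark:
example_dtypes = [("id", "bigint"), ("letters", "array<string>"), ("nums", "array<int>")]
print(array_column_names(example_dtypes))  # ['letters', 'nums']
```

You would then call arrays_to_strings(df) on your own DataFrame; non-array columns are left untouched because they never enter the loop.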
CodePudding user response:
You should use the .apply() method on your DataFrame, passing it a function.
For example, if you want to combine the letters in the arrays in your column1:
df["result"] = df["column1"].apply(lambda x: "".join(x))
You can do whatever you want in that function, as long as it takes a cell as input and returns your desired result.
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html
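If you want to do this for every array column at once and leave the others alone, here is a rough sketch. It assumes a pandas DataFrame whose array columns hold Python lists or tuples in every cell; the names is_array_like and lists_to_strings and the "," separator are made up for the example.

```python
import pandas as pd


def is_array_like(value):
    """True when a cell holds a Python list or tuple."""
    return isinstance(value, (list, tuple))


def lists_to_strings(df, sep=","):
    """Return a copy of df where columns made only of lists/tuples become strings."""
    out = df.copy()
    for name in out.columns:
        # A column counts as an array column only if every cell is a list or tuple.
        if out[name].map(is_array_like).all():
            # str() each element so lists of numbers can be joined as well.
            out[name] = out[name].map(lambda cell: sep.join(map(str, cell)))
    return out


df = pd.DataFrame({"id": [1, 2], "column1": [["a", "b"], ["c"]]})
result = lists_to_strings(df)
print(result["column1"].tolist())  # ['a,b', 'c']
print(result["id"].tolist())       # [1, 2]
```

Columns with missing values or mixed cell types are skipped by this check, so you may need to adapt is_array_like to your data.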
CodePudding user response:
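I used something like this, and it gave me the desired result:
from pyspark.sql import functions as F

# Replace null arrays with empty arrays; leave the other columns untouched.
selectionColumns = [F.coalesce(i[0], F.array()).alias(i[0]) if 'array' in i[1] else i[0] for i in df_grouped.dtypes]
dfForExplode = df_grouped.select(*selectionColumns)
# Names of the array-typed columns.
arrayColumns = [i[0] for i in dfForExplode.dtypes if 'array' in i[1]]
# Join each array column into a single ' || '-separated string.
for col in arrayColumns:
    df_grouped = df_grouped.withColumn(col, F.concat_ws(' || ', df_grouped[col]))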