How to convert multiple array type columns in pyspark dataframe to string?

Time:06-23

I would like to convert multiple array type columns in a dataframe to string. Can someone please help?

The dataframe is like below (screenshot not included):

I have a dataframe with columns of different types, some numeric and some array. I want to convert only the array columns to string; the rest should stay as they are.

Expected output: (screenshot not included)

CodePudding user response:

You can use array_join transformation in pyspark. https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.functions.array_join.html#pyspark.sql.functions.array_join

CodePudding user response:

You can use the .apply() method of your DataFrame.

For example, if you want to combine the letters in the arrays in your column1:

df["result"] = df["column1"].apply(lambda x: "".join(x))

You can do whatever you want inside .apply(), as long as you write a function that takes a cell as input and returns your desired result.

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html
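Note that this answer uses the pandas API rather than PySpark. A minimal pandas sketch of the same idea, with invented column names, that joins only the cells that are lists and leaves everything else alone:

```python
import pandas as pd

# Hypothetical frame: one numeric column and two list-valued columns.
df = pd.DataFrame({
    "id": [1, 2],
    "column1": [["a", "b"], ["c", "d"]],
    "column2": [["x"], ["y", "z"]],
})

# Apply a join only where the cell is a list, so non-array columns stay intact.
for col in ["column1", "column2"]:
    df[col] = df[col].apply(lambda x: " || ".join(x) if isinstance(x, list) else x)
```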

CodePudding user response:

I used something like this and it gave me the results:

selectionColumns = [
    F.coalesce(i[0], F.array()).alias(i[0]) if 'array' in i[1] else i[0]
    for i in df_grouped.dtypes
]
dfForExplode = df_grouped.select(*selectionColumns)

arrayColumns = [i[0] for i in dfForExplode.dtypes if 'array' in i[1]]

for col in arrayColumns:
    df_grouped = df_grouped.withColumn(col, F.concat_ws(' || ', df_grouped[col]))
