Pandas apply function alternatives for pyspark dataframe (want to convert integer data type column t

Time:12-02

I want to convert an integer-typed column to an array (list) column.

Given this DataFrame:

   a  b
0  9  2
1  9  3

I want to convert it to:

   a    b
0  9  [2]
1  9  [3]

Pandas solution (using the same data as the DataFrame shown above):

import pandas as pd

df = pd.DataFrame({"a": [9, 9], "b": [2, 3]})
df["b"] = df["b"].apply(lambda row: [row])
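For reference, the same wrapping can be done without .apply, for example with a list comprehension (a minimal sketch; the data mirrors the DataFrame shown at the top of the question):

```python
import pandas as pd

# Data matching the DataFrame shown at the top of the question
df = pd.DataFrame({"a": [9, 9], "b": [2, 3]})

# Wrap each scalar in a one-element list without using .apply
df["b"] = [[v] for v in df["b"]]
print(df)
```

This avoids the per-row Python function call of .apply, though for two rows the difference is negligible.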

How can I achieve the same in PySpark?

I tried a naive approach:

from pyspark.sql.types import IntegerType, ArrayType
from pyspark.sql.functions import col
df_sp = spark.createDataFrame(df)
# EDIT according to 过过招's answer
df_sp = df_sp.withColumn("b",col("b").cast(ArrayType(IntegerType())))
display(df_sp)

which gives an error: AnalysisException: cannot resolve 'b' due to data type mismatch: cannot cast bigint to array<int>;

CodePudding user response:

You need to specify the element data type of the array. Note, however, that Spark cannot cast a scalar column directly to an array type at all; that is exactly the AnalysisException you are seeing. Instead, wrap the column in array(), casting the element first if you want int rather than bigint:

from pyspark.sql.functions import array, col
from pyspark.sql.types import IntegerType

df_sp = df_sp.withColumn("b", array(col("b").cast(IntegerType())))
df_sp.show()