PySpark: How to transform a dataframe to remove the bracket characters

Time:02-28

I have this dataframe:

Matricule (type array)
[TKI1]
[TKI4]

I want to obtain this dataframe:

Matricule (type string)
TKI1
TKI4

CodePudding user response:

Since your Matricule column is an ArrayType from the start, you can use getItem directly, as below:

Data Preparation

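import pandas as pd
from pyspark.sql import SparkSession, functions as F

# `sql` was undefined in the original snippet; a SparkSession is assumed here
sql = SparkSession.builder.getOrCreate()

df = pd.DataFrame({'Matricule': [['TKI1'], ['TKI4']]})

sparkDF = sql.createDataFrame(df)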

sparkDF.show()

+---------+
|Matricule|
+---------+
|   [TKI1]|
|   [TKI4]|
+---------+

sparkDF.printSchema()

root
 |-- Matricule: array (nullable = true)
 |    |-- element: string (containsNull = true)

Get Item

# getItem(0) extracts the first element of each array as a string
sparkDF = sparkDF.withColumn('Matricule_string', F.col('Matricule').getItem(0))

sparkDF.show()

+---------+----------------+
|Matricule|Matricule_string|
+---------+----------------+
|   [TKI1]|            TKI1|
|   [TKI4]|            TKI4|
+---------+----------------+