Extract specific values from many arrays and put them into an array in Spark SQL


I want to extract a certain value from each of many arrays and then put those values together into another array.


CodePudding user response:

If it's a homework task, good luck explaining this to your professor ;)

from pyspark.sql import functions as F
df = spark.createDataFrame(
    [("a,b,c,d;e,f,g,h;,,,,",),
     ("a,b,c,d;e,f,g,h;k,l,m,n;,,,,",)],
    ["col_1"])

# regexp_extract_all captures the character immediately before each ';' (regex group 1),
# then aggregate folds that array into a single string, appending ',' after every value
df = df.withColumn("col_2", F.expr(
    """
    aggregate(
        regexp_extract_all(col_1, '(\\\\w);', 1),
        '',
        (acc, x) -> concat(acc, rpad(x, 2, ','))
    )
    """
))
df.show(truncate=0)
# +----------------------------+------+
# |col_1                       |col_2 |
# +----------------------------+------+
# |a,b,c,d;e,f,g,h;,,,,        |d,h,  |
# |a,b,c,d;e,f,g,h;k,l,m,n;,,,,|d,h,n,|
# +----------------------------+------+
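
To see what the inner regexp_extract_all call produces before the aggregate fold, you can materialize it in its own column. A minimal sketch, assuming Spark 3.1+ (where regexp_extract_all is available in Spark SQL) plus the df and the F import from the snippet above; the matches column name is only illustrative:

# capture group 1 of '(\w);' = the character right before each ';'
df.withColumn(
    "matches",
    F.expr("regexp_extract_all(col_1, '(\\\\w);', 1)")
).show(truncate=0)
# +----------------------------+---------+
# |col_1                       |matches  |
# +----------------------------+---------+
# |a,b,c,d;e,f,g,h;,,,,        |[d, h]   |
# |a,b,c,d;e,f,g,h;k,l,m,n;,,,,|[d, h, n]|
# +----------------------------+---------+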

CodePudding user response:

A way that is probably easier to explain to your professor:

Input dataset:

+-----------------------+
|col_1                  |
+-----------------------+
|a,b,c,d;e,f,g,h;,,,    |
|a,b,c,d;e,f,g,h;k,l,m,n|
+-----------------------+
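
For completeness, a minimal sketch of how such an input could be built in PySpark, mirroring the first answer (the variable name df1 is taken from the transformations below):

df1 = spark.createDataFrame(
    [("a,b,c,d;e,f,g,h;,,,",),
     ("a,b,c,d;e,f,g,h;k,l,m,n",)],
    ["col_1"])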

Transformations:

df1 = df1
  // split on ";" first, giving an array of comma-separated groups
  .withColumn("col_2", split(col("col_1"), ";"))
  // map each group to its 4th field (index 3)
  .withColumn("col_2", expr("transform(col_2, x -> split(x, ',')[3])"))
  // join the extracted values back together with commas
  .withColumn("col_2", array_join(col("col_2"), ","))

Final output:

+-----------------------+-----+
|col_1                  |col_2|
+-----------------------+-----+
|a,b,c,d;e,f,g,h;,,,    |d,h, |
|a,b,c,d;e,f,g,h;k,l,m,n|d,h,n|
+-----------------------+-----+
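
For reference, a sketch of the same three transformations in PySpark, assuming the df1 and column names shown above (transform is called through expr, which works on Spark 2.4+):

from pyspark.sql import functions as F

df1 = (
    df1
    # split on ";" to get an array of comma-separated groups
    .withColumn("col_2", F.split(F.col("col_1"), ";"))
    # take the 4th field (index 3) of each group
    .withColumn("col_2", F.expr("transform(col_2, x -> split(x, ',')[3])"))
    # join the extracted values back into a comma-separated string
    .withColumn("col_2", F.array_join(F.col("col_2"), ","))
)
df1.show(truncate=0)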