How to create a label column based on index number (odd/even) in PySpark

Time:12-12

Here's my input:

   index   date_id   year  month  day  hour  minute
0  156454  20200801  2021     12   31    12      38
1  156454  20200801  2021     12   31    12      39

I want to add a label column: 'poi1' for odd rows and 'poi2' for even rows, counting rows from 1.

Here's my expected output:

   index   date_id   year  month  day  hour  minute  label
0  156454  20200801  2021     12   31    12      38   poi1
1  156454  20200801  2021     12   31    12      39   poi2

The pandas code is like this

df_movmnt_2["label"] = np.where((df_movmnt_2.index + 1) % 2 != 0, "poi1", "poi2")

CodePudding user response:

Use when().otherwise()

   from pyspark.sql.functions import col, when
   df.withColumn('label', when((col('index') + 1) % 2 == 0, 'poi1').otherwise('poi2')).show()

+-----+-------+--------+-----+---+----+------+---+-----+
|index|date_id|    year|month|day|hour|minute| _8|label|
+-----+-------+--------+-----+---+----+------+---+-----+
|    0| 156454|20200801| 2021| 12|  31|    12| 38| poi2|
|    1| 156454|20200801| 2021| 12|  31|    12| 39| poi1|
+-----+-------+--------+-----+---+----+------+---+-----+