How to assign an ordinal number in a data frame column in pyspark by indexes


I have a dataframe:

df = spark.createDataFrame([
    ('red apple', 'ripe banana', 0.5),
    ('late autumn', 'heavy rain', 0.1),
    ('speak loudly', 'quiet place', 0.9),
    ('extremely dangerous', 'fast running', 0.89)
], ["phrase1", "phrase2", 'common_persent'])
df.show()

Out:

+-------------------+------------+--------------+
|            phrase1|     phrase2|common_persent|
+-------------------+------------+--------------+
|          red apple| ripe banana|           0.5|
|        late autumn|  heavy rain|           0.1|
|       speak loudly| quiet place|           0.9|
|extremely dangerous|fast running|          0.89|
+-------------------+------------+--------------+

And I want to number each phrase so that the digit before the point is the row index and the digit after it is the column index. For example, red apple becomes 1.1 (first row, first column) and ripe banana becomes 1.2 (first row, second column); in the next row, late autumn becomes 2.1 and heavy rain becomes 2.2, and so on.

Ideally, it will turn out something like this

+-------+-------+--------------+
|phrase1|phrase2|common_persent|
+-------+-------+--------------+
|    1.1|    1.2|           0.5|
|    2.1|    2.2|           0.1|
|    3.1|    3.2|           0.9|
|    4.1|    4.2|          0.89|
+-------+-------+--------------+

CodePudding user response:

Try the following.

from pyspark.sql import functions as F

df = df.withColumn('rn', F.expr('row_number() over (order by null)')) \
    .select(F.expr('rn + 0.1').alias('phrase1'),
            F.expr('rn + 0.2').alias('phrase2'),
            'common_persent')
df.show()