I have a dataframe:
df = spark.createDataFrame([
('red apple', 'ripe banana', 0.5),
('late autumn', 'heavy rain', 0.1),
('speak loudly','quiet place', 0.9),
('extremely dangerous','fast running', 0.89)
], ["phrase1", "phrase2", 'common_persent'])
df.show()
Out:
+-------------------+------------+--------------+
|            phrase1|     phrase2|common_persent|
+-------------------+------------+--------------+
|          red apple| ripe banana|           0.5|
|        late autumn|  heavy rain|           0.1|
|       speak loudly| quiet place|           0.9|
|extremely dangerous|fast running|          0.89|
+-------------------+------------+--------------+
I want to replace each phrase with a number of the form row.column. For example, red apple becomes 1.1 (first row, first column) and ripe banana becomes 1.2 (first row, second column); then late autumn becomes 2.1, heavy rain becomes 2.2, and so on.
Ideally, the result would look something like this:
+-------+-------+--------------+
|phrase1|phrase2|common_persent|
+-------+-------+--------------+
|    1.1|    1.2|           0.5|
|    2.1|    2.2|           0.1|
|    3.1|    3.2|           0.9|
|    4.1|    4.2|          0.89|
+-------+-------+--------------+
CodePudding user response:
Try the following.
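from pyspark.sql import functions as F

# Number the rows, then add 0.1 / 0.2 to mark the first / second column.
df = df.withColumn('rn', F.expr('row_number() over (order by null)'))\
    .select(F.expr('rn + 0.1').alias('phrase1'), F.expr('rn + 0.2').alias('phrase2'), 'common_persent')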
df.show()
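The numbering scheme itself can be checked without a Spark session. Below is a minimal plain-Python sketch, not the Spark solution: `label_rows` is a hypothetical helper made up for illustration, and it assumes the labels are kept as strings of the form "row.column" rather than as numbers.

```python
# Plain-Python sketch of the numbering scheme (no Spark needed).
# `label_rows` is a hypothetical helper, not part of PySpark.
def label_rows(rows, n_label_cols=2):
    """Replace the first n_label_cols values of each row with 'row.col' labels."""
    labeled = []
    for row_no, row in enumerate(rows, start=1):
        # Column positions are 1-based, matching the desired output.
        labels = [f"{row_no}.{col_no}" for col_no in range(1, n_label_cols + 1)]
        # Keep the remaining values (here: common_persent) unchanged.
        labeled.append((*labels, *row[n_label_cols:]))
    return labeled


rows = [
    ('red apple', 'ripe banana', 0.5),
    ('late autumn', 'heavy rain', 0.1),
    ('speak loudly', 'quiet place', 0.9),
    ('extremely dangerous', 'fast running', 0.89),
]

result = label_rows(rows)
for r in result:
    print(r)
```

Keeping the labels as strings is a design choice for this sketch only: it keeps 1.10 distinct from 1.1 if there were ever ten or more phrase columns, which a numeric label would not.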