I have a dataframe in Spark that looks like this (but with more rows), where each city column holds the number of visitors to my website from that city.
| date | New York | Los Angeles | Tokyo | London | Berlin | Paris |
|:----------- |:--------:| -----------:|------:|-------:|-------:|------:|
| 2022-01-01 | 150000 | 1589200 | 500120| 120330 |95058331|980000 |
I want to order the columns based on this list of cities (they are ordered according to their importance to me):
order = ["Paris", "Berlin", "London", "New York", "Los Angeles", "Tokyo"]
In the end, I need a dataframe like this. Is there any way to create a function that performs this ordering every time I need it? Expected result below:
| date | Paris | Berlin | London | New York | Los Angeles | Tokyo |
|:----------- |:--------:| -------:|-------:|---------:|------------:|------:|
| 2022-01-01 | 980000 | 95058331| 120330 | 150000 | 1589200 | 500120|
Thank you!
CodePudding user response:
Your example:
df_exemple = spark.createDataFrame(
    [
        ('2022-01-01', '150000', '1589200', '500120', '120330', '95058331', '980000')
    ],
    ['date', 'New York', 'Los Angeles', 'Tokyo', 'London', 'Berlin', 'Paris']
)
order = ['Paris', 'Berlin', 'London', 'New York', 'Los Angeles', 'Tokyo']
Now, a simple function to reorder:
def order_func(df, order_list):
    return df.select('date', *order_list)
result_df = order_func(df_exemple, order)
result_df.show()
+----------+------+--------+------+--------+-----------+------+
|      date| Paris|  Berlin|London|New York|Los Angeles| Tokyo|
+----------+------+--------+------+--------+-----------+------+
|2022-01-01|980000|95058331|120330|  150000|    1589200|500120|
+----------+------+--------+------+--------+-----------+------+
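Note that `df.select('date', *order_list)` raises an `AnalysisException` if any city in the list is missing from the dataframe, and it silently drops columns that aren't in the list. If you want something more defensive, you can build the column list in plain Python before passing it to `select`. The helper below is a sketch (the name `ordered_columns` and the `leading` parameter are my own, not from the original answer): it keeps leading columns such as `date` first, then the cities from your priority list that actually exist, then any leftover columns in their original order.

```python
def ordered_columns(existing, order, leading=("date",)):
    """Return a reordered column list for df.select(*result).

    existing -- the dataframe's current columns (e.g. df.columns)
    order    -- columns in the desired priority order
    leading  -- columns pinned to the front (default: the date column)
    """
    # Pinned columns first, but only those actually present.
    result = [c for c in leading if c in existing]
    # Then the priority columns that exist, skipping missing ones.
    result += [c for c in order if c in existing]
    # Finally, anything not yet placed, in its original order.
    result += [c for c in existing if c not in result]
    return result
```

With your example you would then call `df_exemple.select(*ordered_columns(df_exemple.columns, order))`, and a city that is absent from the dataframe is simply skipped instead of raising an error.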