I have a table named tableTest
like this:
startDate | endDate |
---|---|
2022-12-15 | 2022-12-18 |
2022-12-19 | 2022-12-21 |
2022-12-22 | 2022-12-24 |
2022-12-26 | 2022-12-27 |
2022-12-29 | 2022-12-30 |
2022-12-02 | 2022-12-04 |
2022-12-06 | 2022-12-07 |
2022-12-07 | 2022-12-08 |
2022-12-09 | 2022-12-09 |
2022-12-13 | 2022-12-14 |
I need to loop over the key-value pairs of startDate and endDate in their original order.
What I did:
import pandas as pd

data = [
    ("2022-12-15", "2022-12-18"),
    ("2022-12-19", "2022-12-21"),
    ("2022-12-22", "2022-12-24"),
    ("2022-12-26", "2022-12-27"),
    ("2022-12-29", "2022-12-30"),
    ("2022-12-02", "2022-12-04"),
    ("2022-12-06", "2022-12-07"),
    ("2022-12-07", "2022-12-08"),
    ("2022-12-13", "2022-12-14"),
    ("2023-01-01", "2023-01-03"),
]
df = spark.createDataFrame(data).toDF(*('startDate', 'endDate')).toPandas()
dictTest = df.set_index('startDate')['endDate'].to_dict()
print(dictTest)
for k, v in dictTest.items():
    print(f'startDate is {k} and corresponding endDate is {v}.')
The above code does convert the two columns to a dict, but a dict is unordered, so I lost the original order of these two columns.
Thank you in advance.
CodePudding user response:
You can use the into parameter of .to_dict to pass in an OrderedDict:
from collections import OrderedDict
dictTest = df.set_index('startDate')['endDate'].to_dict(into=OrderedDict)
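Note that on Python 3.7+ the built-in dict also preserves insertion order, so either way the mapping keeps the DataFrame's row order. A minimal self-contained sketch using plain pandas (with a small subset of the question's sample rows) to show the order surviving the round trip:

```python
from collections import OrderedDict

import pandas as pd

# A subset of the question's rows, kept in their original order.
df = pd.DataFrame(
    [("2022-12-15", "2022-12-18"), ("2022-12-02", "2022-12-04")],
    columns=["startDate", "endDate"],
)

# `into=OrderedDict` makes the order guarantee explicit; on Python 3.7+
# a plain .to_dict() would preserve the row order as well.
dictTest = df.set_index("startDate")["endDate"].to_dict(into=OrderedDict)

for k, v in dictTest.items():
    print(f"startDate is {k} and corresponding endDate is {v}.")
```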
CodePudding user response:
You can just use iterrows to iterate over the rows in their original order, as long as tableTest is a pandas DataFrame.
for index, row in tableTest.iterrows():
    startDate = row['startDate']
    endDate = row['endDate']
    print(f'startDate is {startDate} and corresponding endDate is {endDate}.')
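If the table is large, itertuples is usually a faster alternative to iterrows, since it avoids building a Series for every row. A small sketch with sample data standing in for tableTest:

```python
import pandas as pd

# Sample data standing in for tableTest.
tableTest = pd.DataFrame(
    [("2022-12-15", "2022-12-18"), ("2022-12-19", "2022-12-21")],
    columns=["startDate", "endDate"],
)

# itertuples yields namedtuples in row order; index=False drops the index field.
rows = list(tableTest.itertuples(index=False))
for row in rows:
    print(f"startDate is {row.startDate} and corresponding endDate is {row.endDate}.")
```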