How to convert two columns of dataframe into an orderedDict in Python?

Time:01-17

I have a table named tableTest like this:

startDate endDate
2022-12-15 2022-12-18
2022-12-19 2022-12-21
2022-12-22 2022-12-24
2022-12-26 2022-12-27
2022-12-29 2022-12-30
2022-12-02 2022-12-04
2022-12-06 2022-12-07
2022-12-07 2022-12-08
2022-12-09 2022-12-09
2022-12-13 2022-12-14

I need to loop over the startDate/endDate key-value pairs in their original order.

What I did:

import pandas as pd

data = [
    ("2022-12-15", "2022-12-18"),
    ("2022-12-19", "2022-12-21"),
    ("2022-12-22", "2022-12-24"),
    ("2022-12-26", "2022-12-27"),
    ("2022-12-29", "2022-12-30"),
    ("2022-12-02", "2022-12-04"),
    ("2022-12-06", "2022-12-07"),
    ("2022-12-07", "2022-12-08"),
    ("2022-12-13", "2022-12-14"),
    ("2023-01-01", "2023-01-03"),
]

df = spark.createDataFrame(data).toDF(*('startDate', 'endDate')).toPandas()
dictTest = df.set_index('startDate')['endDate'].to_dict()

print(dictTest)

for k, v in dictTest.items():
    print(f'startDate is {k} and corresponding endDate is {v}.')

The code above does convert the two columns to a dict, but as far as I know a dict is unordered, so I lose the original order of the rows.

Thank you in advance.

CodePudding user response:

You can use the into parameter of .to_dict to pass in an OrderedDict:

from collections import OrderedDict 
dictTest = df.set_index('startDate')['endDate'].to_dict(into=OrderedDict)

See the pandas documentation for Series.to_dict.
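
For reference, here is a minimal, self-contained sketch of this approach. It assumes the DataFrame can be built directly with pandas (the question goes through Spark first), uses only three of the question's rows, and the variable name ordered is purely illustrative:

```python
from collections import OrderedDict

import pandas as pd

# A few rows in the same shape as the question's data (deliberately not sorted).
data = [
    ("2022-12-15", "2022-12-18"),
    ("2022-12-19", "2022-12-21"),
    ("2022-12-02", "2022-12-04"),
]

# Assumption: build the DataFrame with pandas directly instead of via Spark.
df = pd.DataFrame(data, columns=["startDate", "endDate"])

# `into` selects the mapping class that Series.to_dict returns.
ordered = df.set_index("startDate")["endDate"].to_dict(into=OrderedDict)

# Iteration follows the row order of the DataFrame.
for start, end in ordered.items():
    print(f"startDate is {start} and corresponding endDate is {end}.")
```

Note that since Python 3.7 a plain dict also preserves insertion order, so OrderedDict mainly makes the intent explicit. Either way, any mapping keeps only one value per key, so rows that share a startDate would collapse into a single entry.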

CodePudding user response:

As long as tableTest is a pandas DataFrame, you can simply use iterrows to iterate over the rows in their original order.

for index, row in tableTest.iterrows():
    startDate = row['startDate']
    endDate = row['endDate']
    print(f'startDate is {startDate} and corresponding endDate is {endDate}.')
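
If only the two columns are needed, a possible variation (not the iterrows approach above, but a swapped-in alternative) is to zip the columns together. The sketch below assumes tableTest is a pandas DataFrame with string columns; the sample rows and the name pairs are illustrative only:

```python
import pandas as pd

# Assumption: tableTest is a pandas DataFrame shaped like the question's table.
tableTest = pd.DataFrame(
    {
        "startDate": ["2022-12-15", "2022-12-19", "2022-12-02"],
        "endDate": ["2022-12-18", "2022-12-21", "2022-12-04"],
    }
)

# zip walks both columns in row order without building a Series for each row.
pairs = list(zip(tableTest["startDate"], tableTest["endDate"]))

for startDate, endDate in pairs:
    print(f"startDate is {startDate} and corresponding endDate is {endDate}.")
```

Because pairs is a list of tuples rather than a mapping, repeated startDate values are kept as separate entries.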