I have a dataframe like this:
| ID    | A  | B                 |
|-------|----|-------------------|
| case1 | %  | case description1 |
| case2 | ab | case description2 |
| case3 | gh | case description3 |
| case4 | sg | case description4 |
I want to convert this df to a dictionary that looks like this:
{'case1': ['%', 'case description1'], 'case2': ['ab', 'case description2'], 'case3': ['gh', 'case description3'], 'case4': ['sg', 'case description4']}
I later want to use this dict to update values in a table in our SQL database with this function:
def update_units(source_dictionary, id_list):
    for ID in id_list:
        # Look up the [A, B] pair for this ID, or fall back to None
        value1 = source_dictionary[ID][0] if ID in source_dictionary else None
        value2 = source_dictionary[ID][1] if ID in source_dictionary else None
        session.query(table1).filter(table1.Id == ID).update(
            {
                "A": value1,
                "B": value2
            }
        )
    session.commit()
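For context, a minimal sketch of how this function could be wired up and called, assuming a SQLAlchemy session and a mapped table1; the connection URL and the sample arguments here are hypothetical stand-ins:

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# Hypothetical connection URL; replace with the real database URL
engine = create_engine('sqlite:///example.db')
session = sessionmaker(bind=engine)()

source_dictionary = {'case1': ['%', 'case description1'],
                     'case2': ['ab', 'case description2']}

# Update rows case1 and case2 with the values from the dictionary
update_units(source_dictionary, ['case1', 'case2'])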
Answer:
You could first combine columns A and B into an array, then collect the rows and build the dictionary with a comprehension.
from pyspark.sql import functions as F

df = spark.createDataFrame(
    [('case1', '%', 'case description1'),
     ('case2', 'ab', 'case description2'),
     ('case3', 'gh', 'case description3'),
     ('case4', 'sg', 'case description4')],
    ['ID', 'A', 'B'])

# Combine columns A and B into a single array column
df = df.select('ID', F.array('A', 'B'))

# Collect the rows to the driver and build the dictionary
dic = {k: v for k, v in df.collect()}
print(dic)
# {'case1': ['%', 'case description1'], 'case2': ['ab', 'case description2'], 'case3': ['gh', 'case description3'], 'case4': ['sg', 'case description4']}
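If the dataframe is actually pandas rather than Spark, the same dictionary can be built without Spark at all; a short sketch, assuming the same ID, A, and B columns:

import pandas as pd

df = pd.DataFrame(
    [('case1', '%', 'case description1'),
     ('case2', 'ab', 'case description2'),
     ('case3', 'gh', 'case description3'),
     ('case4', 'sg', 'case description4')],
    columns=['ID', 'A', 'B'])

# Index by ID, transpose, then emit each resulting column as a list
dic = df.set_index('ID').T.to_dict('list')
# {'case1': ['%', 'case description1'], ...}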