I have a dataframe like this:
| ID    | A  | B                 |
|-------|----|-------------------|
| case1 | %  | case description1 |
| case2 | ab | case description2 |
| case3 | gh | case description3 |
| case4 | sg | case description4 |
I want to convert this df to a dictionary that looks like this:
{'case1': ['%', 'case description1'], 'case2': ['ab', 'case description2'], 'case3': ['gh', 'case description3'], 'case4': ['sg', 'case description4']}
I later want to use this dict to update values in a table in our SQL database with this function:
def update_units(source_dictionary, id_list):
    for ID in id_list:
        # Look up the [A, B] pair for this ID, or fall back to None
        value1 = source_dictionary[ID][0] if ID in source_dictionary else None
        value2 = source_dictionary[ID][1] if ID in source_dictionary else None
        session.query(table1).filter(table1.Id == ID).update(
            {
                "A": value1,
                "B": value2
            }
        )
    session.commit()
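For context, a minimal sketch of how this function could be wired up and called, assuming a SQLAlchemy session and a mapped table1; the connection URL and the sample arguments here are hypothetical stand-ins:

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# Hypothetical connection URL; replace with the real database URL
engine = create_engine('sqlite:///example.db')
session = sessionmaker(bind=engine)()

source_dictionary = {'case1': ['%', 'case description1'],
                     'case2': ['ab', 'case description2']}

# Update rows case1 and case2 with the values from the dictionary
update_units(source_dictionary, ['case1', 'case2'])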
Answer:
You could first combine columns A and B into an array, then collect the rows and build the dictionary with a comprehension.
from pyspark.sql import functions as F

df = spark.createDataFrame(
    [('case1', '%', 'case description1'),
     ('case2', 'ab', 'case description2'),
     ('case3', 'gh', 'case description3'),
     ('case4', 'sg', 'case description4')],
    ['ID', 'A', 'B'])

# Combine columns A and B into a single array column
df = df.select('ID', F.array('A', 'B'))

# Collect the rows to the driver and build the dictionary
dic = {k: v for k, v in df.collect()}
print(dic)
# {'case1': ['%', 'case description1'], 'case2': ['ab', 'case description2'], 'case3': ['gh', 'case description3'], 'case4': ['sg', 'case description4']}
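If the dataframe is actually pandas rather than Spark, the same dictionary can be built without Spark at all; a short sketch, assuming the same ID, A, and B columns:

import pandas as pd

df = pd.DataFrame(
    [('case1', '%', 'case description1'),
     ('case2', 'ab', 'case description2'),
     ('case3', 'gh', 'case description3'),
     ('case4', 'sg', 'case description4')],
    columns=['ID', 'A', 'B'])

# Index by ID, transpose, then emit each resulting column as a list
dic = df.set_index('ID').T.to_dict('list')
# {'case1': ['%', 'case description1'], ...}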