Insert data to different columns based on the first value from a text file-CodePudding

This is my text file:

1|16255|usa|Test||TEST806282|||||
2|16255|806282|company_member|False|true
3|16255|my_data
1|18299|usa|Test||TEST260092|||||
2|18299|260092|company_member|False|False

I want to insert this data into different columns in a single table based on the key (1,2,3,1,2) using Spark SQL.

Desired output:

CodePudding user response：

Thank you for the clarifications. Something along the following lines should do the trick. Text files are best read using rdds.

rdd = spark.sparkContext.textFile('my_path/my_file.txt')
rdd = rdd.map(lambda k: k.split('|'))

def relocate(e):
    if e[0] == '1':
        return [e[0], e[1], e[2], e[5], '', '', '', '', '', '']
    elif e[0] == '2':
        return [e[0], e[1], '', '', '', e[2], e[3], e[4], e[5], '']
    elif e[0] == '3':
        return [e[0], e[1], '', '', '', '', '', '', '', e[2]]

rdd = rdd.map(relocate)
df = rdd.toDF(['recordtype', 'idnumber', 'country', 'designation', 'dept', 'empnumber', 'membertype', 'billable', 'active', 'Emplinfo'])

df.show()
#  ---------- -------- ------- ----------- ---- --------- -------------- -------- ------ -------- 
# |recordtype|idnumber|country|designation|dept|empnumber|    membertype|billable|active|Emplinfo|
#  ---------- -------- ------- ----------- ---- --------- -------------- -------- ------ -------- 
# |         1|   16255|    usa| TEST806282|    |         |              |        |      |        |
# |         2|   16255|       |           |    |   806282|company_member|   False|  true|        |
# |         3|   16255|       |           |    |         |              |        |      | my_data|
# |         1|   18299|    usa| TEST260092|    |         |              |        |      |        |
# |         2|   18299|       |           |    |   260092|company_member|   False| False|        |
#  ---------- -------- ------- ----------- ---- --------- -------------- -------- ------ --------