I am new to Databricks and Python. I just want to know the best way to change column names in Databricks. For example, if the column name is 'ID' I want to change it to 'Patient_ID', and 'Name' to 'Patient_Name'. I thought I would use a dictionary, but I don't know how to apply it to the column names. Please help, thanks in advance.
Note: the position of the columns can change, so I thought of using a dictionary:
Dictionary = {"ID": "Patient_ID", "Name": "Patient_Name", "Age": "Patient_age"}
Example of what I am trying to achieve (picture attached).
I tried using a JSON file to do this, but I got nowhere.
Answer:
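Given the following dataset:
from pyspark.sql import SparkSession
# In a Databricks notebook `spark` already exists; getOrCreate() simply returns that session.
spark = SparkSession.builder.getOrCreate()
columns = ["ID", "Name", "Age", "Country"]
data = [(1, "John", "42", "Spain"), (2, "Jane", "24", "Norway"), (3, "Nohj", "38", "Iceland"), (4, "Fabrice", "65", "France")]
df = spark.createDataFrame(data, columns)
df.show()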
+---+-------+---+-------+
| ID|   Name|Age|Country|
+---+-------+---+-------+
|  1|   John| 42|  Spain|
|  2|   Jane| 24| Norway|
|  3|   Nohj| 38|Iceland|
|  4|Fabrice| 65| France|
+---+-------+---+-------+
You could loop over your dictionary as follows:
dictionary = {"ID": "Patient_ID", "Name": "Patient_Name", "Age": "Patient_Age"}
# withColumnRenamed returns a new DataFrame; it is a no-op if the old name is not in the schema
for old_name, new_name in dictionary.items():
    df = df.withColumnRenamed(old_name, new_name)
df.show()
+----------+------------+-----------+-------+
|Patient_ID|Patient_Name|Patient_Age|Country|
+----------+------------+-----------+-------+
|         1|        John|         42|  Spain|
|         2|        Jane|         24| Norway|
|         3|        Nohj|         38|Iceland|
|         4|     Fabrice|         65| France|
+----------+------------+-----------+-------+
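Since the question mentions a JSON file and columns whose position can change, here is a minimal sketch (not the only way) that parses the mapping from a JSON object and computes every new name from `df.columns`, then applies them in one `toDF(*names)` call instead of a loop. The helper names `load_column_mapping`, `renamed_columns` and `rename_with_mapping` are hypothetical, made up for this example; `rename_with_mapping` assumes `df` is a PySpark DataFrame.

```python
import json
from typing import Dict, List


def load_column_mapping(text: str) -> Dict[str, str]:
    """Parse a JSON object of {"old_name": "new_name"} pairs into a dict."""
    mapping = json.loads(text)
    if not isinstance(mapping, dict) or not all(
        isinstance(k, str) and isinstance(v, str) for k, v in mapping.items()
    ):
        raise ValueError("expected a JSON object mapping old column names to new ones")
    return mapping


def renamed_columns(columns: List[str], mapping: Dict[str, str]) -> List[str]:
    """Return the new column names; columns missing from `mapping` keep their name."""
    return [mapping.get(name, name) for name in columns]


def rename_with_mapping(df, mapping: Dict[str, str]):
    """Rename all columns of a PySpark DataFrame with a single toDF() call.

    New names are looked up from df.columns, so the order in which the
    columns arrive does not matter.
    """
    return df.toDF(*renamed_columns(df.columns, mapping))


if __name__ == "__main__":
    json_text = '{"ID": "Patient_ID", "Name": "Patient_Name", "Age": "Patient_Age"}'
    dictionary = load_column_mapping(json_text)
    # Column order deliberately differs from the example dataset.
    print(renamed_columns(["Country", "Age", "ID", "Name"], dictionary))
    # ['Country', 'Patient_Age', 'Patient_ID', 'Patient_Name']
```

In the notebook, `df = rename_with_mapping(df, dictionary)` should give the same columns as the loop above. To read the mapping from a file, something like `with open(path) as f: dictionary = load_column_mapping(f.read())` would work, where `path` is whatever location your JSON file has (hypothetical here). If your runtime ships Spark 3.4 or later, `DataFrame.withColumnsRenamed` also accepts a dictionary directly; check the PySpark documentation for your version.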