Code:
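from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
# Sample rows; note that the name 'a' appears twice in the schema
pdf=[(1,'a',4,'a',4.1,'d'),(2,'b',3,'b',3.2,'c'),(3,'c',2,'c',2.3,'b'),(1,'d',1,'d',1.4,'a')]
df15 = spark.createDataFrame(pdf, ('x','y','z','a','b','a'))
df15.show(2)
try: df15.select(df15.a).show(2)                # fails: reference 'a' is ambiguous
except Exception: print("failed")
df15.columns                                    # ['x', 'y', 'z', 'a', 'b', 'a']
try: df15.select(df15.columns[3]).show(2)       # columns[3] is just the string 'a'
except Exception: print("failed")
df15.withColumnRenamed('a', 'b_id').show(2)     # renames both 'a' columns
df15.drop('a').show(2)                          # drops both 'a' columns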
Output:
+---+---+---+---+---+---+
|  x|  y|  z|  a|  b|  a|
+---+---+---+---+---+---+
|  1|  a|  4|  a|4.1|  d|
|  2|  b|  3|  b|3.2|  c|
+---+---+---+---+---+---+
only showing top 2 rows
failed
failed
+---+---+---+----+---+----+
|  x|  y|  z|b_id|  b|b_id|
+---+---+---+----+---+----+
|  1|  a|  4|   a|4.1|   d|
|  2|  b|  3|   b|3.2|   c|
+---+---+---+----+---+----+
only showing top 2 rows
+---+---+---+---+
|  x|  y|  z|  b|
+---+---+---+---+
|  1|  a|  4|4.1|
|  2|  b|  3|3.2|
+---+---+---+---+
only showing top 2 rows
How can I rename one of the duplicate columns, or select it?
- `select` fails when the column name is duplicated
- `withColumnRenamed` and `drop` apply to both columns that share the name
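Because `select`, `withColumnRenamed`, and `drop` all resolve columns by name, every name-based call hits both copies of `a`; only their position tells them apart. The sketch below is a hypothetical helper (`duplicate_positions` is not part of PySpark) that works on a plain list of names, such as the value of `df15.columns`, and reports where each repeated name sits.

```python
from collections import defaultdict


def duplicate_positions(columns):
    """Map each name that occurs more than once to the list of its indices."""
    positions = defaultdict(list)
    for i, name in enumerate(columns):
        positions[name].append(i)
    # Keep only names that appear at more than one position
    return {name: idx for name, idx in positions.items() if len(idx) > 1}


cols = ['x', 'y', 'z', 'a', 'b', 'a']  # same names as df15.columns above
print(duplicate_positions(cols))  # {'a': [3, 5]}
```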
Answer:
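You could define a list of new column names and rename all of the DataFrame's columns at once with `toDF`, then drop whichever column you don't need. Because `toDF` assigns names by position, the two `a` columns can be given different names.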
# One new name per existing column, in the same order as df15.columns
new_cols = ['x','y','z','b_id','b','b_id_to_drop']
df = df15.toDF(*new_cols)
df = df.drop('b_id_to_drop')
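Writing the new name list by hand works for a single DataFrame. For wider schemas, one option is to generate unique names automatically. The sketch below is a hypothetical helper (`make_unique` and its `_1`, `_2` suffix scheme are assumptions, not a PySpark API) that leaves names occurring once untouched and numbers the repeated ones.

```python
from collections import Counter


def make_unique(columns):
    """Return a copy of `columns` where each repeated name gets an
    occurrence suffix, e.g. ['a', 'a'] -> ['a_1', 'a_2'].
    Names that occur only once are kept as-is."""
    totals = Counter(columns)
    seen = Counter()
    result = []
    for name in columns:
        if totals[name] > 1:
            seen[name] += 1
            result.append(f"{name}_{seen[name]}")
        else:
            result.append(name)
    return result


print(make_unique(['x', 'y', 'z', 'a', 'b', 'a']))
# ['x', 'y', 'z', 'a_1', 'b', 'a_2']
```

With the question's DataFrame this could be applied as `df15.toDF(*make_unique(df15.columns))`, after which `a_1` and `a_2` can be selected, renamed, or dropped independently. The sketch does not check whether a generated name such as `a_1` already exists in the schema; add that check if your column names might collide.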