I am trying to use unionByName to combine two DataFrames, but when I run my script, the log shows "'DataFrame' object has no attribute 'unionByName'".
df_new = old.unionByName(old2, allowMissingColumns=True)
I suspect it has to do with my Spark or Python version, since union works perfectly fine. The Spark version is 2.2.0.cloudera1. How can I use a newer version of Spark, or use unionByName with my existing version?
I also see this in my log:
File "/opt/cloudera/parcels/Anaconda-4.0.0/lib/python2.7/importlib/__init__.py", line 37, in import_module
So does that mean I am using Python 2.7?
Thanks!
Answer:
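unionByName is new in version 2.3, so it does not exist in Spark 2.2. Note also that the allowMissingColumns parameter was only added in Spark 3.1, so on 2.3 through 3.0 you can only call old.unionByName(old2) without it.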
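One way to check from inside the job whether the Spark you are actually running supports these is to compare its version string against those two releases. This is a minimal sketch: union_by_name_support is a hypothetical helper, not part of PySpark, and it assumes the version string starts with major.minor, as 2.2.0.cloudera1 does.

```python
import re


def parse_spark_version(version):
    """Return (major, minor) from strings like '2.2.0.cloudera1' or '3.1.2'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised Spark version: %r" % version)
    return int(match.group(1)), int(match.group(2))


def union_by_name_support(version):
    """Hypothetical helper: which unionByName features this Spark version has."""
    major_minor = parse_spark_version(version)
    return {
        "unionByName": major_minor >= (2, 3),          # method added in 2.3
        "allowMissingColumns": major_minor >= (3, 1),  # parameter added in 3.1
    }
```

With a SparkSession named spark, union_by_name_support(spark.version) reports both features as unavailable on 2.2.0.cloudera1.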
Answer:
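The difference between this function and union() is that it resolves columns by name rather than by position. So if you want to do this without changing your Spark version, create a new DataFrame by reordering the columns of old2 to match old, then use union(). This works as long as both DataFrames have the same set of columns.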
For example, if your DataFrames have these columns:
old ("col1", "col2", "col3")
old2 ("col3", "col1", "col2")
use something like this:
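from pyspark.sql.functions import col
old3 = old2.select(col("col1"), col("col2"), col("col3"))  # same column order as old
new_old = old.union(old3)  # union matches columns by position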
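If the two DataFrames do not have exactly the same columns (the case allowMissingColumns=True handles on 3.1+), the same idea can be extended: add each missing column as a typed null, select both DataFrames in one shared column order, then union. This is a sketch for Spark 2.x under stated assumptions: union_by_name_compat and combined_columns are hypothetical helper names, column names are compared case-sensitively, and a column present in both DataFrames is assumed to have compatible types.

```python
def combined_columns(left_cols, right_cols):
    """Return left's columns in order, followed by columns only in right."""
    seen = set(left_cols)
    return list(left_cols) + [c for c in right_cols if c not in seen]


def union_by_name_compat(left, right, allow_missing_columns=False):
    """Hypothetical stand-in for DataFrame.unionByName on Spark < 2.3."""
    # Imported lazily so combined_columns can be used without Spark installed.
    from pyspark.sql import functions as F

    if not allow_missing_columns and set(left.columns) != set(right.columns):
        raise ValueError("column names differ: %s vs %s"
                         % (sorted(left.columns), sorted(right.columns)))

    all_cols = combined_columns(left.columns, right.columns)
    # Data type of every column, taken from whichever side defines it.
    types = dict((f.name, f.dataType) for f in right.schema.fields)
    types.update((f.name, f.dataType) for f in left.schema.fields)

    def aligned(df):
        present = set(df.columns)
        # Existing columns are selected by name; missing ones become typed nulls.
        return df.select([F.col(c) if c in present
                          else F.lit(None).cast(types[c]).alias(c)
                          for c in all_cols])

    # Both sides now share the same column order, so a positional union is safe.
    return aligned(left).union(aligned(right))
```

With this, df_new = union_by_name_compat(old, old2, allow_missing_columns=True) behaves roughly like the original unionByName call. Casting the nulls to the other side's type avoids relying on Spark to widen a NullType column during the union.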