Running unionByName on Spark 2.2.0

Time:11-05

I am trying to run a unionByName command to combine two dataframes, but when I run my script, the log shows me that "DataFrame object has no attribute 'unionByName'".

df_new = old.unionByName(old2, allowMissingColumns=True)

I suspect this is related to my Spark or Python version, since union works perfectly fine. The Spark version is 2.2.0.cloudera1. How can I use a newer version of Spark, or otherwise get the behavior of unionByName with my existing version?

I also see this in my log

File "/opt/cloudera/parcels/Anaconda-4.0.0/lib/python2.7/importlib/__init__.py", line 37, in import_module

So I assume I am using Python 2.7?

Thanks!

CodePudding user response:

According to the PySpark documentation, unionByName is new in version 2.3, so it is not available on Spark 2.2.0.
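To make the version requirement concrete, here is a minimal sketch that checks a Spark version string before calling unionByName. The helper names parse_spark_version and supports_union_by_name are hypothetical, not part of PySpark. It relies on two facts from the PySpark documentation: unionByName was added in 2.3, and its allowMissingColumns argument was added later, in 3.1.

```python
import re


def parse_spark_version(version):
    """Return (major, minor) from a string such as '2.2.0.cloudera1'.

    Hypothetical helper, not a PySpark function.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("Unrecognized Spark version: %r" % version)
    return int(match.group(1)), int(match.group(2))


def supports_union_by_name(version, allow_missing_columns=False):
    """Tell whether DataFrame.unionByName can be called on this Spark version."""
    # unionByName needs Spark 2.3+; allowMissingColumns needs Spark 3.1+.
    required = (3, 1) if allow_missing_columns else (2, 3)
    return parse_spark_version(version) >= required
```

In a real session you would pass spark.version (or sc.version) as the argument. For the version in the question, "2.2.0.cloudera1", the check returns False, which matches the AttributeError in the log.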

CodePudding user response:

The difference between this function and union() is that this function resolves columns by name (not by position). So if you want to do it without changing the Spark version, create a new DataFrame by reordering the columns of old2 to match old. For example, if your DataFrames are as below:

old ("col1","col2","col3")

old2("col3","col1","col2")

use something like below:

from pyspark.sql.functions import col

old3 = old2.select(col("col1"), col("col2"), col("col3"))

new_old = old.union(old3)
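The reordering above can be generalized so the column names do not have to be typed by hand, and so that columns missing on one side are filled with nulls, roughly what allowMissingColumns=True does in newer Spark versions. This is only a sketch: union_by_name is a hypothetical helper, not a PySpark API. It assumes unique column names without backticks, and type strings from DataFrame.dtypes that are valid in a SQL CAST. It uses only DataFrame.columns, DataFrame.dtypes, DataFrame.selectExpr and DataFrame.union, and the syntax is compatible with Python 2.7.

```python
def union_by_name(left, right, allow_missing_columns=False):
    """Union two DataFrames by column name without DataFrame.unionByName.

    Hypothetical helper for Spark versions before 2.3.
    """
    left_types = dict(left.dtypes)    # column name -> type string, e.g. "int"
    right_types = dict(right.dtypes)

    # Result column order: left's columns, then columns only right has.
    all_cols = list(left.columns) + [
        c for c in right.columns if c not in left_types
    ]

    missing = [
        c for c in all_cols if c not in left_types or c not in right_types
    ]
    if missing and not allow_missing_columns:
        raise ValueError(
            "Columns not present in both DataFrames: %s" % missing
        )

    def aligned(df, own_types, other_types):
        # Existing columns are selected by name; absent ones become typed NULLs.
        exprs = [
            "`%s`" % c if c in own_types
            else "CAST(NULL AS %s) AS `%s`" % (other_types[c], c)
            for c in all_cols
        ]
        return df.selectExpr(*exprs)

    return aligned(left, left_types, right_types).union(
        aligned(right, right_types, left_types)
    )
```

With the DataFrames from the example, new_old = union_by_name(old, old2) gives the same result as the manual select followed by union. Passing allow_missing_columns=True would be the closest stand-in for the call in the question.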