I get the error below when I try to concatenate two pandas dataframes:
TypeError: cannot concatenate object of type 'list; only ps.Series and ps.DataFrame are valid
At first I thought the error occurred because one of the dataframes has a column that contains lists. So I tried concatenating two dataframes that have no list columns, but I got the same error. I printed the types of the dataframes to be sure; both are pandas.core.frame.DataFrame. Why do I get this error even though they are not lists?
import pyspark.pandas as ps
split_col = split_col.toPandas()
split_col2 = split_col2.toPandas()
dfNew = ps.concat([split_col,split_col2],axis=1,ignore_index=True)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/tmp/ipykernel_1455538/463168233.py in <module>
2 split_col = split_col.toPandas()
3 split_col2 = split_col2.toPandas()
----> 4 dfNew = ps.concat([split_col,split_col2],axis=1,ignore_index=True)
/home/anaconda3/envs/virtenv/lib/python3.10/site-packages/pyspark/pandas/namespace.py in concat(objs, axis, join, ignore_index, sort)
2464 for obj in objs:
2465 if not isinstance(obj, (Series, DataFrame)):
-> 2466 raise TypeError(
2467 "cannot concatenate object of type "
2468 "'{name}"
TypeError: cannot concatenate object of type 'list; only ps.Series and ps.DataFrame are valid
type(split_col)
pandas.core.frame.DataFrame
type(split_col2)
pandas.core.frame.DataFrame
I want to concatenate the two dataframes but I am stuck. Do you have any suggestions?
CodePudding user response:
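You're getting this error because you're passing plain pandas DataFrames to ps.concat, which belongs to the pandas API on Spark. As the isinstance check in your traceback shows, ps.concat only accepts pandas-on-Spark Series and DataFrame objects, and toPandas() returns a regular pandas.core.frame.DataFrame.
Instead of converting your pyspark dataframes to pandas dataframes with the toPandas() method, try the following: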
split_col = split_col.to_pandas_on_spark()
split_col2 = split_col2.to_pandas_on_spark()
See the PySpark documentation for more on this method.
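If the data is small enough to live in driver memory, another option is to keep the toPandas() calls and concatenate with plain pandas instead of pyspark.pandas. The sketch below is only an illustration: the two small frames and their column names "a" and "b" are made-up stand-ins for the results of toPandas(), not the data from the question.

```python
import pandas as pd

# Hypothetical stand-ins for the frames returned by toPandas().
split_col = pd.DataFrame({"a": [1, 2, 3]})
split_col2 = pd.DataFrame({"b": ["x", "y", "z"]})

# pd.concat accepts plain pandas objects; axis=1 places the frames side by side.
# With axis=1, ignore_index=True relabels the columns as 0, 1, ...
dfNew = pd.concat([split_col, split_col2], axis=1, ignore_index=True)

print(type(dfNew))
print(dfNew.shape)
```

Keep in mind that toPandas() collects all rows to the driver, so this route only suits data that fits in memory; for larger data, staying in pandas-on-Spark as suggested above is the safer choice.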