First, thank you for your time.
I need to cross (merge) two Pandas DataFrames using the values I have in a field. The values come in the following form within the field: [A,B,C,N]
I am trying to apply the split() function to the DataFrame field as follows:
df_test = df_temp["NAME"].str.split(expand=True)
The "Name" field is of type object.
My problem is that for some reason my split() splits the values of my NAME field with Null (NaN) values. I don't understand what I'm doing wrong.
From already thank you very much.
CodePudding user response:
You should provide a separator value to split the column on. By default, str.split() splits on whitespace, so your commas are never used as a delimiter.
df['Name'].str.split(',', expand=True)
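As a minimal sketch, assuming the values are stored as strings like "[A,B,C,N]" (the sample data here is illustrative, not from the question), you may also need to strip the surrounding brackets before splitting:

import pandas as pd

# Hypothetical sample data mirroring the question's "[A,B,C,N]" format
df_temp = pd.DataFrame({"NAME": ["[A,B,C,N]", "[D,E,F,G]"]})

# Strip the surrounding brackets, then split on the comma separator
df_test = df_temp["NAME"].str.strip("[]").str.split(",", expand=True)
print(df_test)
#    0  1  2  3
# 0  A  B  C  N
# 1  D  E  F  G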
CodePudding user response:
Based upon the input data -
from pyspark.sql.types import *
from pyspark.sql.functions import *
df = spark.createDataFrame([(['A', 'B', 'C', 'D'],), ], schema = ['Name'])
df.show()
+------------+
|        Name|
+------------+
|[A, B, C, D]|
+------------+
Required Output -
df.select(explode(col("Name")).alias("exploded_Name")).show()
+-------------+
|exploded_Name|
+-------------+
|            A|
|            B|
|            C|
|            D|
+-------------+
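Since the original question is about Pandas, the equivalent there is DataFrame.explode. A sketch assuming the column already holds Python lists (as in the Spark example above; the names are illustrative):

import pandas as pd

# Hypothetical DataFrame whose Name column holds lists, like the Spark example
df = pd.DataFrame({"Name": [["A", "B", "C", "D"]]})

# explode turns each list element into its own row
exploded = df.explode("Name").rename(columns={"Name": "exploded_Name"})
print(exploded)
#   exploded_Name
# 0             A
# 0             B
# 0             C
# 0             D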