How to create a PySpark dataframe from the output of a dataframe filter?

Time:03-24

I need to create two dataframes from a single dataframe, based on a filter condition.

#df is an existing dataframe

Condition for the first dataframe

df.filter(df['Date'] == max_date).display()

Condition for the second dataframe

df.filter(df['Date'] != max_date).display()

FYI, type of dataframe 'df' is:

# <class 'pyspark.sql.dataframe.DataFrame'>

CodePudding user response:

filter() returns a new DataFrame and leaves df unchanged, so you can just assign each result to its own variable.

new_df = df.filter(df['Date'] != max_date)
new_df2 = df.filter(df['Date'] == max_date)
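
If max_date is not already computed, the whole split could be wrapped in a small helper. This is only a sketch: the function name split_on_max_date and the date_col parameter are made up for illustration, and it assumes the maximum should be taken over the 'Date' column of df itself.

```python
def split_on_max_date(df, date_col="Date"):
    """Return (latest_df, older_df), split on the maximum value of date_col.

    Hypothetical helper: df is assumed to be a pyspark.sql.DataFrame.
    """
    # DataFrame.agg accepts a {column: aggregate} dict, so no extra imports
    # are needed; collect() brings the single result row back to the driver.
    max_date = df.agg({date_col: "max"}).collect()[0][0]

    # Each filter() call returns a new DataFrame; df itself is not modified.
    latest_df = df.filter(df[date_col] == max_date)
    older_df = df.filter(df[date_col] != max_date)
    return latest_df, older_df
```

It would be called as latest_df, older_df = split_on_max_date(df). Note that rows where the date is null match neither condition, so they end up in neither dataframe.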