I need to create two dataframes from a single dataframe, split by a filter condition.
#df is an existing dataframe
Condition for the first dataframe:
df.filter(df['Date'] == max_date).display()
Condition for the second dataframe:
df.filter(df['Date'] != max_date).display()
FYI, type of dataframe 'df' is:
# <class 'pyspark.sql.dataframe.DataFrame'>
CodePudding user response:
You can simply assign the output of each filter to a new dataframe. Note that .display() only renders the result; it does not return a DataFrame, so drop it when assigning:
new_df = df.filter(df['Date'] != max_date)
new_df2 = df.filter(df['Date'] == max_date)