How to update an RDD in Spark with filter

Time:07-06

I have an RDD with two columns, O and D. Each row represents an edge between the two values. For example,

O D
a b
b g
c t
g a

That means a is related to b, and so on. I need the same data, but filtered so that every value in column D also appears in column O. In the example above, the row c -- t would be dropped, because t does not appear in column O. I tried something that seems to work: I build a list of all the values in column O and filter out every row whose D value does not appear in that list.

list_O = df.select('O').rdd.flatMap(lambda x: x).collect()
df1 = df.filter(df.D.isin(list_O)).show()

But when I try to look at the head of this new DataFrame, I get an error:

df1.head(5)

I don't understand why this error occurs.

Any Ideas?

CodePudding user response:

Yes, I have an idea. The function .show() returns None; it is only meant to print the DataFrame. So in your code, df1 is set to None. Remove the .show():

list_O = df.select('O').rdd.flatMap(lambda x: x).collect()
df1 = df.filter(df.D.isin(list_O))