I am currently trying to filter my dataframe inside an if statement
and get the value of a field into a variable.
Here is my code:
if df_table.filter(col(field).contains("val")):
    id_2 = df_table.select(another_field)
    print(id_2)
    # Recursive call with new variable
The problem: the if filtering seems to work, but id_2 gives me the column name and type, whereas I want the value of that field itself. The output of this code is:
DataFrame[ID_1: bigint]
DataFrame[ID_2: bigint]
...
If I try collect, like this: id_2 = df_table.select(another_field).collect()
I get this: [Row(ID_1=3013848), Row(ID_1=319481), Row(ID_1=391948)...]
which just lists every ID in the dataframe.
I thought of doing: id_2 = df_table.select(another_field).filter(col(field).contains("val"))
but I still get the same result as in my first attempt.
I would like id_2, on each iteration of my loop, to take the value of the field I am filtering on, like:
3013848
319481
...
and not a list of every matching value from my dataframe.
Any idea how I could get that into my variable?
Thank you for helping.
CodePudding user response:
In fact, dataFrame.select(colName) is supposed to return a column (a dataframe with only one column), not the value of that column on a given line. I see in your comment that you want to do a recursive lookup in a Spark dataframe. The thing is: first, as far as I know, Spark doesn't support recursive operations. If you have a deeply recursive operation to do, you'd better collect() the dataframe and do the work on the driver, without Spark. You can then use whatever library you want, but you lose the advantage of processing the data in a distributed way. Second, Spark isn't designed for operations that iterate over each record. You can try to achieve it with joins of dataframes, but that comes back to my first point: if each join depends on the result of the previous one, recursively, just forget Spark.
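A minimal sketch of that advice, assuming hypothetical column names ID_1 and ID_2 forming a parent-child chain: collect() the dataframe once, then recurse on the driver in plain Python. (To extract a single scalar in the first place, df.filter(...).select(...).first() returns one Row or None, and row[0] or row["ID_1"] gives the value.)

```python
# With a real Spark dataframe you would start from something like:
#   rows = df_table.select("ID_1", "ID_2").collect()
# and read each value as row["ID_1"] (or row[0]).
# Here the collect() output is simulated with plain dicts.
rows = [
    {"ID_1": 3013848, "ID_2": 319481},
    {"ID_1": 319481,  "ID_2": 391948},
    {"ID_1": 391948,  "ID_2": None},   # end of the chain
]

# Build an in-memory lookup once, then recurse without Spark.
parent = {r["ID_1"]: r["ID_2"] for r in rows}

def chain(start):
    """Return the full lookup chain starting from `start`."""
    nxt = parent.get(start)
    if nxt is None:
        return [start]
    return [start] + chain(nxt)

print(chain(3013848))  # [3013848, 319481, 391948]
```

This keeps the recursion entirely on the driver, which is fine as long as the collected lookup table fits in memory there.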