I have a dataframe with several columns. What I need to do is match data in a list in column Name
, obtain the index, and use that index to search a list in column Date
and grab the corresponding date to filter on.
Name | Date |
---|---|
[n1, n2, n3] | [date1, date2, date3] |
e.g. if Name === n2
, return index
then: Date[index]
return date
then: filter
on date
The table has thousands of rows with other columns (containing unique IDs and other data) so I'm looking for a solution to work on that scale.
This needs to be in Spark Scala, which I am very new in learning and I'm struggling to even print a value from the initial if statement. Any pointers or potential things to try would be massively appreciated.
I can access the list within the column using
df.select($"name")
and access a specific piece of data at an index using
df.select($"name"(0))
However, if I try:
for (x <- df.select($"name"){
println("print anything")
}
the code looks like it executes (there are no errors) but I get nothing printed at all, not even just a string like I have above.
I've been trying to get even this simple(!) first bit to work for a good few days and have tried a number of solutions found here, but most solutions seem to throw a type error (e.g., I have a row and not a list). I don't understand this as I can pull data by specifying an index, so why is it unable to read the list and thinks it is a row?
CodePudding user response:
I am not sure what you meant by then: filter on date
, but this might help you achieve what you want to do.
If df
was this dataset:
------------ ---------------------
|Name |Date |
------------ ---------------------
|[n1, n2, n3]|[date1, date2, date3]|
------------ ---------------------
Then this:
df.withColumn("newDate", expr("Date[find_in_set('n2', array_join(Name, ',')) - 1]"))
Gives this:
------------ --------------------- -------
|Name |Date |newDate|
------------ --------------------- -------
|[n1, n2, n3]|[date1, date2, date3]|date2 |
------------ --------------------- -------
Which you can then filter
or do what ever you like. Hope it helps, good luck!