Filter key-value rdd when the value is a list

Time:12-02

I have tuples like this:

('id1', ['date', 'type', 'value', '2017-11-11 08:32:46.934', 'no_error', '54.64325', '2017-11-11 08:32:47.356', 'no_error', '76.34553'])

I want to keep only the elements of the list that are floats. The only solutions I have found work when the value is a single element rather than a list, using something along the lines of this:

filter(lambda t: is_float(t[1]) == True)

where is_float is a function I wrote that, as the name says, returns True if the value is a float. How can I do the same when the value is a list?

CodePudding user response:

That's what isinstance() is for. It will return True if the first parameter is an instance of the second one.

>>> isinstance(1, float)
False
>>> isinstance("1.0", float)
False
>>> isinstance(1.0, float)
True
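
As a minimal sketch of how this check could filter a list: the values below are made up for illustration, and the approach assumes the list already holds real float objects. In the question's sample the numbers are strings, so isinstance() would return False for every element there.

```python
# Hypothetical mixed list: two real floats, one numeric string, one int
values = ['date', 54.64325, '76.34553', 3, 76.34553]

# Keep only the elements whose type is float
floats_only = [v for v in values if isinstance(v, float)]
print(floats_only)  # [54.64325, 76.34553]
```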

CodePudding user response:

You could do it with a list comprehension that has an if clause:

def is_float(s):
    try:
        float(s)
        return True
    except ValueError:
        return False

# map() passes each (key, list) pair as a single tuple, so index into it
rdd.map(lambda kv: (kv[0], [element for element in kv[1] if is_float(element)]))

This will not be very performant on large lists, though, because is_float runs as a Python call for every element.

Update: I changed the code to incorporate the OP's remark, that the list elements are strings.
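
To check the filtering logic without a Spark cluster, here is a plain-Python sketch run against the sample tuple from the question. The helper name keep_floats is hypothetical, and catching TypeError in addition to ValueError is an added assumption so that non-string values such as None are rejected instead of raising.

```python
def is_float(s):
    """Return True if s can be parsed as a float."""
    try:
        float(s)
        return True
    except (TypeError, ValueError):
        return False


def keep_floats(values):
    """Hypothetical helper: keep only the elements that parse as floats."""
    return [v for v in values if is_float(v)]


# Sample record from the question
record = ('id1', ['date', 'type', 'value',
                  '2017-11-11 08:32:46.934', 'no_error', '54.64325',
                  '2017-11-11 08:32:47.356', 'no_error', '76.34553'])

key, values = record
filtered = (key, keep_floats(values))
print(filtered)  # ('id1', ['54.64325', '76.34553'])

# On an RDD of such pairs, the same helper could be applied to the values only:
# rdd.mapValues(keep_floats)
```

The kept elements are still strings. If numeric values are needed, the comprehension could return float(v) instead of v.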
