Is it possible to split a value by 2 delimiters in an RDD using PySpark?


I have tuples like this:

('id1', 'date;type;value\n2017-11-11 08:32:46.934;no_error;54.64325\n2017-11-11 08:32:47.356;no:error;35.46643\n')

I want to split the value by both ';' and '\n', but I can't pass two delimiters to a single split call and I'm not sure what to do instead. So far, I've got this:

rdd.mapValues(lambda t: t.split(';'))

Is there any way I can split it by 2 delimiters?

CodePudding user response:

You can use re.split with the regex ;|\n, which matches either delimiter:

import re
rdd.mapValues(lambda t: re.split(';|\n', t))
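
For reference, a minimal runnable sketch of that approach, assuming a local SparkContext and a sample tuple modelled on (and shortened from) the one in the question:

import re
from pyspark import SparkContext

sc = SparkContext.getOrCreate()

# Sample tuple modelled on the question's data (value truncated for brevity).
rdd = sc.parallelize([
    ('id1', 'date;type;value\n2017-11-11 08:32:46.934;no_error;54.64325')
])

# re.split accepts a regex, so ';|\n' splits on either delimiter in one pass.
result = rdd.mapValues(lambda t: re.split(';|\n', t)).collect()
print(result)
# [('id1', ['date', 'type', 'value', '2017-11-11 08:32:46.934', 'no_error', '54.64325'])]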