How to split string in numpy.ndarray?

Time:01-08

I have a lot of text in a numpy.ndarray that looks like this:

['This is example sentence 1.|This is example sentence 2.'
 'This is example sentence 3.'
 'This is example sentence 4.'
 'This is example sentence 5.'
 'This is example sentence 6.|This is example sentence 7.|This is example sentence 8.|This is example sentence 9.|This is example sentence 10.']

The array can have a large and varying number of elements, and individual elements can contain many sentences separated by "|".

How do I convert the example above into this:

['This is example sentence 1.'
 'This is example sentence 2.'
 'This is example sentence 3.'
 'This is example sentence 4.'
 'This is example sentence 5.'
 'This is example sentence 6.'
 'This is example sentence 7.'
 'This is example sentence 8.'
 'This is example sentence 9.'
 'This is example sentence 10.']

Basically, I'm trying to create a 1-dimensional array in which every element containing "|" is split into its own separate elements. I've tried many versions of split and can't get any of them to work, for one reason or another.

Thanks!

CodePudding user response:

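You can use np.char.split, which splits each element and returns an object array of lists, and then flatten the result:

# np.hstack can be used in place of np.concatenate
>>> import numpy as np
>>> np.concatenate(np.char.split(arr.astype(str), sep='|'))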

array(['This is example sentence 1.', 'This is example sentence 2.',
       'This is example sentence 3.', 'This is example sentence 4.',
       'This is example sentence 5.', 'This is example sentence 6.',
       'This is example sentence 7.', 'This is example sentence 8.',
       'This is example sentence 9.', 'This is example sentence 10.'],
      dtype='<U28')
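
For a version you can run end to end, here is a minimal sketch of the same approach. The helper name split_elements and the shortened sample array are illustrative assumptions, not part of the original post. The intermediate step is kept separate to show that np.char.split gives one Python list per element, which np.concatenate then joins into a flat 1-D array.

```python
import numpy as np


def split_elements(arr, sep="|"):
    """Split every string in arr on sep and return one flat 1-D array."""
    # np.char.split returns an object array holding one Python list per element.
    pieces = np.char.split(np.asarray(arr).astype(str), sep=sep)
    # np.concatenate joins those lists end to end into a single string array.
    return np.concatenate(pieces)


# Shortened sample in the same shape as the array from the question.
arr = np.array([
    "This is example sentence 1.|This is example sentence 2.",
    "This is example sentence 3.",
    "This is example sentence 4.|This is example sentence 5.",
])

result = split_elements(arr)
print(result)
```

Elements without a separator come back as single-item lists, so they pass through unchanged and the original order is preserved.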