Home > Software engineering >  How to split data into 2 rows based on a character?
How to split data into 2 rows based on a character?

Time:12-14

I have a data frame:

      col1  col2   col3  col4
row1  'a'    'b'    'c'   'd\ne'
row2 'f\ng' 'h\ni' 'j\nk' 'l\nm'
row3 'n'    'o'    'p'     'q'
row4 'r'    's'    't'     'u'

I want to split the data of the row into 2 separate rows based on a character/when character count of the character is more than 4 in a given row.

In this case split data based on : '\n', and when the count of this character row-wise > 3.

Output :

          col1  col2   col3  col4
    row1  'a'     'b'    'c'   'd\ne'
    row2  'f'     'h'    'j'    'l'
    row3  'g'     'i'    'k'    'm'
    row4  'n'     'o'    'p'    'q'
    row5  'r'     's'    't'    'u'

How can i do this ?

Code to identify row with '\n' characters.

for row in range(df.shape[0]):
    if '\n' in df.iloc[row,0]:
        if (''.join(df.iloc[row]).count('\n'))>3:
            print(row)

This can use a little refinement maybe doing all this in a single line ?

Can i somehow use vstack with splitting data frame based on '\n' ?

CodePudding user response:

Assuming you want to split the row when all values contain a \n:

# identify \n
m = df.apply(lambda s: s.str.contains('\n'))

# split rows having \n in all columns
df.update(df[m.all(axis=1)].apply(lambda x: x.str.split('\n')))

# create new row(s)
df = df.explode(list(df), ignore_index=True)

# update the index
df.index = 'row'   (df.index 1).astype(str)

print(df)

Output:

     col1 col2 col3  col4
row1    a    b    c  d\ne
row2    f    h    j     l
row3    g    i    k     m
row4    n    o    p     q
row5    r    s    t     u
  • Related