I have a data frame:
col1 col2 col3 col4
row1 'a' 'b' 'c' 'd\ne'
row2 'f\ng' 'h\ni' 'j\nk' 'l\nm'
row3 'n' 'o' 'p' 'q'
row4 'r' 's' 't' 'u'
I want to split the data of the row into 2 separate rows based on a character/when character count of the character is more than 4 in a given row.
In this case split data based on : '\n', and when the count of this character row-wise > 3.
Output :
col1 col2 col3 col4
row1 'a' 'b' 'c' 'd\ne'
row2 'f' 'h' 'j' 'l'
row3 'g' 'i' 'k' 'm'
row4 'n' 'o' 'p' 'q'
row5 'r' 's' 't' 'u'
How can i do this ?
Code to identify row with '\n' characters.
for row in range(df.shape[0]):
if '\n' in df.iloc[row,0]:
if (''.join(df.iloc[row]).count('\n'))>3:
print(row)
This can use a little refinement maybe doing all this in a single line ?
Can i somehow use vstack with splitting data frame based on '\n' ?
CodePudding user response:
Assuming you want to split the row when all values contain a \n
:
# identify \n
m = df.apply(lambda s: s.str.contains('\n'))
# split rows having \n in all columns
df.update(df[m.all(axis=1)].apply(lambda x: x.str.split('\n')))
# create new row(s)
df = df.explode(list(df), ignore_index=True)
# update the index
df.index = 'row' (df.index 1).astype(str)
print(df)
Output:
col1 col2 col3 col4
row1 a b c d\ne
row2 f h j l
row3 g i k m
row4 n o p q
row5 r s t u