Splitting a dataframe into multiple dataframes based on entries in a column

Time:06-24

I have a dataframe like this:

time | text
01.01.2000 | None
None | abc
None | cde
None | def
01.02.2000 | None
None | abb
None | bbc
None | dde
01.03.2000 | None
None | 123
None | 278
None | 782
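To reproduce this, the sample table can be built in pandas roughly as below. This is a sketch that assumes the empty cells are real Python None values (not the string "None") and that every text value is a string.

```python
import pandas as pd

# Sample data from the question. Empty cells are assumed to be real
# Python None values, which pandas treats as missing (NA).
df = pd.DataFrame({
    "time": ["01.01.2000", None, None, None,
             "01.02.2000", None, None, None,
             "01.03.2000", None, None, None],
    "text": [None, "abc", "cde", "def",
             None, "abb", "bbc", "dde",
             None, "123", "278", "782"],
})
print(df)
```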

I now want to split this dataframe into multiple dataframes, each one starting at a row where time is not None. The text values of the rows that follow should be joined into a single cell, separated by newlines. The result should look like this:

df1
time | text
01.01.2000 | abc \n cde \n def

And the second dataframe like this:

df2
time | text
01.02.2000 | abb \n bbc \n dde

How can I do this? I would like to use a for loop.

Answer:

You can forward-fill the time column, then group by it and join the non-null text values of each group with newlines:

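import pandas as pd
# Fill each None in 'time' with the last date above it
df['time'] = df['time'].ffill()
# Group by date (sort=False keeps the original row order) and
# join each group's non-null text values with newlines
out = (df.groupby('time', as_index=False, sort=False)
       ['text'].agg(lambda x: '\n'.join(x.dropna())))
print(out)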

         time           text
0  01.01.2000  abc\ncde\ndef
1  01.02.2000  abb\nbbc\ndde
2  01.03.2000  123\n278\n782
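To get a separate dataframe for each date, group the result once more:

# One single-row DataFrame per date, in the original order
groups = [g for _, g in out.groupby('time', sort=False)]
print(groups)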

[         time           text
0  01.01.2000  abc\ncde\ndef,          time           text
1  01.02.2000  abb\nbbc\ndde,          time           text
2  01.03.2000  123\n278\n782]
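Since the question asks for a for loop, here is a loop-based sketch of the same split. It assumes the column names time and text, that the first row holds a date, and that text values are strings; the names blocks and frames are just illustrative. Unlike the groupby approach, it keeps two blocks separate even if they happen to share the same date.

```python
import pandas as pd

df = pd.DataFrame({
    "time": ["01.01.2000", None, None, None,
             "01.02.2000", None, None, None,
             "01.03.2000", None, None, None],
    "text": [None, "abc", "cde", "def",
             None, "abb", "bbc", "dde",
             None, "123", "278", "782"],
})

blocks = []  # list of [time, [text, ...]] pairs, one per block
for time, text in zip(df["time"], df["text"]):
    if pd.notna(time):
        # A non-null time opens a new block
        blocks.append([time, []])
    if pd.notna(text) and blocks:
        # Attach the text to the most recently opened block
        blocks[-1][1].append(text)

# Turn each block into a single-row DataFrame with newline-joined text
frames = [pd.DataFrame({"time": [t], "text": ["\n".join(lines)]})
          for t, lines in blocks]

df1, df2, df3 = frames
print(df1)
```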