Split a pandas dataframe into intervals based on row values

I have a small problem that I can't find a solution to. I have this dataset as an example, with columns=[A,B,C]:

A,B,C
F,Relax,begin
F,,
F,,
H,,
H,,
H,,
G,,
H,,
I,,
G,,
H,Relax,end
O,Cook,begin
Q,,
P,,
I,,
O,,
R,,
P,,
O,Cook,end
H,Relax,begin
F,,
G,,
I,,
I,,
I,,
I,,
I,,
I,,
I,Relax,end

I want to split this dataframe into several dataframes, one per begin/end interval. For example, the expected final dataframes are:

dataframe 1
A,B,C
F,Relax,begin
F,,
F,,
H,,
H,,
H,,
G,,
H,,
I,,
G,,
H,Relax,end

dataframe 2
A,B,C
O,Cook,begin
Q,,
P,,
I,,
O,,
R,,
P,,
O,Cook,end

dataframe 3
A,B,C
H,Relax,begin
F,,
G,,
I,,
I,,
I,,
I,,
I,,
I,,
I,Relax,end

Does anyone know how to solve this problem? Regards.

CodePudding user response：

You can group by the cumsum of the positions where "begin" is found and store your groups in a dictionary:

dfs = {'dataframe_%s' % g: d for g,d in df.groupby(df['C'].eq('begin').cumsum())}

Then access your sub-dataframes this way:

dfs['dataframe_1']

output:

    A      B      C
0   F  Relax  begin
1   F    NaN    NaN
2   F    NaN    NaN
3   H    NaN    NaN
4   H    NaN    NaN
5   H    NaN    NaN
6   G    NaN    NaN
7   H    NaN    NaN
8   I    NaN    NaN
9   G    NaN    NaN
10  H  Relax    end

NB. You can craft whatever identifier you want as the key in the dictionary. Personally, I would probably use the plain integer (1/2/3).
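To see why the cumsum works as a group id, here is a minimal self-contained sketch using plain integer keys. It assumes a shortened copy of the question's data (the `csv_text` string is made up for illustration) and that every interval starts with a 'begin' row:

```python
import io
import pandas as pd

# Shortened stand-in for the question's data (assumption: same column layout).
csv_text = """A,B,C
F,Relax,begin
F,,
H,Relax,end
O,Cook,begin
Q,,
O,Cook,end
H,Relax,begin
I,,
I,Relax,end
"""
df = pd.read_csv(io.StringIO(csv_text))

# Each 'begin' row bumps the counter by one, so all rows of an
# interval share the same id (NaN compares as False and adds nothing).
group_id = df['C'].eq('begin').cumsum()

# Use the plain integer id as the dictionary key.
dfs = {int(g): d for g, d in df.groupby(group_id)}

print(dfs[2])
```

Note that any rows appearing before the first 'begin' would end up in a group with id 0.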

CodePudding user response：

This will give you the index values where each of your desired groups begins and ends:

import numpy as np

# the values in column C are lowercase ('begin'/'end')
slices = np.array(list(df.index))[df.C.isin(['begin', 'end'])]

slices:

array([ 0, 10, 11, 18, 19, 28])

This allows you to slice your df with df.loc[]:

# Assign
df1 = df.loc[range(slices[0], slices[1] + 1)]

# or display with

for x in range(0, len(slices))[::2]:
    print(df.loc[range(slices[x], slices[x + 1] + 1)])
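A variant of the same slicing idea, sketched here under a few assumptions: the dataframe has the default integer index, every 'begin' has a matching 'end', and intervals are not nested. The `csv_text` sample is a shortened, made-up copy of the question's data. Because label slices with `.loc` include both endpoints, no `+ 1` is needed:

```python
import io
import pandas as pd

# Shortened stand-in for the question's data (assumption: same column layout).
csv_text = """A,B,C
F,Relax,begin
F,,
H,Relax,end
O,Cook,begin
Q,,
O,Cook,end
H,Relax,begin
I,,
I,Relax,end
"""
df = pd.read_csv(io.StringIO(csv_text))

# Index labels of the marker rows.
starts = df.index[df['C'].eq('begin')]
ends = df.index[df['C'].eq('end')]

# Pair each 'begin' with the 'end' at the same position;
# .loc[s:e] slices by label and keeps both endpoints.
parts = [df.loc[s:e] for s, e in zip(starts, ends)]

for part in parts:
    print(part, end="\n\n")
```

Collecting the pieces in a list keeps them addressable by position (`parts[0]`, `parts[1]`, ...), which avoids creating one variable per dataframe.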