I have a small problem that I can't find a solution for. Here is an example dataset with columns [A, B, C]:
A,B,C
F,Relax,begin
F,,
F,,
H,,
H,,
H,,
G,,
H,,
I,,
G,,
H,Relax,end
O,Cook,begin
Q,,
P,,
I,,
O,,
R,,
P,,
O,Cook,end
H,Relax,begin
F,,
G,,
I,,
I,,
I,,
I,,
I,,
I,,
I,Relax,end
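To reproduce this, here is a minimal sketch that loads the sample above into a pandas DataFrame. It assumes the data is available as CSV text, and `CSV_TEXT` is only an illustrative name. By default `read_csv` turns the empty cells into NaN.

```python
import io

import pandas as pd

# Illustrative copy of the sample data shown above
CSV_TEXT = """A,B,C
F,Relax,begin
F,,
F,,
H,,
H,,
H,,
G,,
H,,
I,,
G,,
H,Relax,end
O,Cook,begin
Q,,
P,,
I,,
O,,
R,,
P,,
O,Cook,end
H,Relax,begin
F,,
G,,
I,,
I,,
I,,
I,,
I,,
I,,
I,Relax,end
"""

# Empty fields are parsed as NaN by default
df = pd.read_csv(io.StringIO(CSV_TEXT))
print(df.shape)  # (29, 3)
```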
I want to split this dataframe into several dataframes, one for each interval that starts with a "begin" row and ends with the matching "end" row in column C. The expected final dataframes are:
dataframe 1
A,B,C
F,Relax,begin
F,,
F,,
H,,
H,,
H,,
G,,
H,,
I,,
G,,
H,Relax,end
dataframe 2
A,B,C
O,Cook,begin
Q,,
P,,
I,,
O,,
R,,
P,,
O,Cook,end
dataframe 3
A,B,C
H,Relax,begin
F,,
G,,
I,,
I,,
I,,
I,,
I,,
I,,
I,Relax,end
Does anyone know how to solve this problem? Regards.
Answer:
You can group by the cumulative sum (cumsum) of a boolean mask that is True on the rows where column C is "begin", and store your groups in a dictionary:
dfs = {'dataframe_%s' % g: d for g,d in df.groupby(df['C'].eq('begin').cumsum())}
Then access your sub-dataframes this way:
dfs['dataframe_1']
output:
A B C
0 F Relax begin
1 F NaN NaN
2 F NaN NaN
3 H NaN NaN
4 H NaN NaN
5 H NaN NaN
6 G NaN NaN
7 H NaN NaN
8 I NaN NaN
9 G NaN NaN
10 H Relax end
NB. You can use any identifier you like as the dictionary key. Personally, I would probably use the plain integer (1/2/3).
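A sketch of the integer-keyed variant, which also exposes the intermediate grouping key: `eq('begin')` is True on each start row, and `cumsum()` turns that into a running counter, so every row receives the number of the interval it belongs to. The sample is rebuilt by `make_sample`, a hypothetical helper written only for this example. Note that with this key, rows before the first "begin" would get key 0, and rows between an "end" and the next "begin" would stay in the preceding group; the sample has no such rows.

```python
import numpy as np
import pandas as pd


def make_sample():
    """Hypothetical helper that rebuilds the question's sample data."""
    a = list("FFFHHHGHIGH" "OQPIORPO" "HFGIIIIIII")
    b = [np.nan] * len(a)
    c = [np.nan] * len(a)
    # (begin row, end row, label) for each interval in the sample
    for start, end, label in [(0, 10, "Relax"), (11, 18, "Cook"), (19, 28, "Relax")]:
        b[start] = b[end] = label
        c[start], c[end] = "begin", "end"
    return pd.DataFrame({"A": a, "B": b, "C": c})


df = make_sample()

# True on every "begin" row; the running sum numbers the intervals 1, 2, 3
key = df["C"].eq("begin").cumsum()

# Same grouping as above, but keyed by a plain Python int
dfs = {int(g): d for g, d in df.groupby(key)}

print(key.tolist())
print(dfs[2])
```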
Answer:
This gives you the index labels of the start and end rows of your desired groups:
import numpy as np
slices = np.array(list(df.index))[df.C.isin(['begin', 'end'])]
slices:
array([ 0, 10, 11, 18, 19, 28])
This allows you to slice your df with df.loc[]:
# Assign the first group
df1 = df.loc[range(slices[0], slices[1] + 1)]
# or display every group (slices alternates begin/end, hence the step of 2)
for x in range(0, len(slices), 2):
    print(df.loc[range(slices[x], slices[x + 1] + 1)])
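Since `slices` alternates begin and end labels, `slices[::2]` holds the starts and `slices[1::2]` the ends. And because `df.loc` label slicing includes both endpoints, a plain slice can replace the `range(...)` construction. This is a sketch that assumes the default RangeIndex and that every "begin" has a matching "end"; `make_sample` and `groups` are hypothetical names used only for illustration.

```python
import numpy as np
import pandas as pd


def make_sample():
    """Hypothetical helper that rebuilds the question's sample data."""
    a = list("FFFHHHGHIGH" "OQPIORPO" "HFGIIIIIII")
    b = [np.nan] * len(a)
    c = [np.nan] * len(a)
    # (begin row, end row, label) for each interval in the sample
    for start, end, label in [(0, 10, "Relax"), (11, 18, "Cook"), (19, 28, "Relax")]:
        b[start] = b[end] = label
        c[start], c[end] = "begin", "end"
    return pd.DataFrame({"A": a, "B": b, "C": c})


df = make_sample()

# Labels of the marker rows, in order: begin, end, begin, end, ...
slices = df.index[df["C"].isin(["begin", "end"])].to_numpy()

# Pair each start with its end; assumes every "begin" has a matching "end"
starts, ends = slices[::2], slices[1::2]

# .loc label slices include both endpoints, so no "+ 1" is needed
groups = [df.loc[start:end] for start, end in zip(starts, ends)]

for g in groups:
    print(g, end="\n\n")
```

Collecting the groups in a list keeps them in their original order, which matches the numbering "dataframe 1/2/3" in the question.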