Split a pandas DataFrame based on row value intervals

Time:10-14

I have a small problem that I can't find a solution to. Here is an example dataset with columns [A, B, C]:

A,B,C
F,Relax,begin
F,,
F,,
H,,
H,,
H,,
G,,
H,,
I,,
G,,
H,Relax,end
O,Cook,begin
Q,,
P,,
I,,
O,,
R,,
P,,
O,Cook,end
H,Relax,begin
F,,
G,,
I,,
I,,
I,,
I,,
I,,
I,,
I,Relax,end
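To reproduce the example, the sample above can be loaded into a DataFrame. This is a minimal sketch, assuming the data is available as CSV text; the variable name `csv_text` is hypothetical. Empty fields are read as NaN, which matches the output shown in the answers below.

```python
import io

import pandas as pd

# The sample data from the question, as CSV text (empty fields become NaN)
csv_text = """A,B,C
F,Relax,begin
F,,
F,,
H,,
H,,
H,,
G,,
H,,
I,,
G,,
H,Relax,end
O,Cook,begin
Q,,
P,,
I,,
O,,
R,,
P,,
O,Cook,end
H,Relax,begin
F,,
G,,
I,,
I,,
I,,
I,,
I,,
I,,
I,Relax,end
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (29, 3)
```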

I want to split this DataFrame into several DataFrames, one per interval (each running from a "begin" row to the matching "end" row). The expected final DataFrames are:

dataframe 1
A,B,C
F,Relax,begin
F,,
F,,
H,,
H,,
H,,
G,,
H,,
I,,
G,,
H,Relax,end

dataframe 2
A,B,C
O,Cook,begin
Q,,
P,,
I,,
O,,
R,,
P,,
O,Cook,end

dataframe 3
A,B,C 
H,Relax,begin
F,,
G,,
I,,
I,,
I,,
I,,
I,,
I,,
I,Relax,end

Does anyone know how to solve this problem? Regards.

CodePudding user response:

You can group by the cumulative sum (cumsum) of a boolean mask marking the rows where column C is "begin", and store the groups in a dictionary:

dfs = {'dataframe_%s' % g: d for g,d in df.groupby(df['C'].eq('begin').cumsum())}
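To see why this works, it helps to look at the intermediate group labels. The sketch below uses a trimmed-down, illustrative version of the sample data: every "begin" row bumps the running count by one, so all rows of the same interval share a label.

```python
import io

import pandas as pd

# Trimmed-down sample with two intervals (illustrative only)
csv_text = """A,B,C
F,Relax,begin
F,,
H,Relax,end
O,Cook,begin
Q,,
O,Cook,end
"""
df = pd.read_csv(io.StringIO(csv_text))

# True on every row that opens an interval
starts = df['C'].eq('begin')
# Running count of "begin" rows seen so far: 1 for the first interval, 2 for the second
labels = starts.cumsum()
print(labels.tolist())  # [1, 1, 1, 2, 2, 2]

# Each distinct label becomes one group, i.e. one sub-DataFrame
dfs = {'dataframe_%s' % g: d for g, d in df.groupby(labels)}
print(list(dfs))  # ['dataframe_1', 'dataframe_2']
```

Note that any rows before the first "begin" would get label 0 and end up in their own group.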

Then access your sub-dataframes this way:

dfs['dataframe_1']

output:

    A      B      C
0   F  Relax  begin
1   F    NaN    NaN
2   F    NaN    NaN
3   H    NaN    NaN
4   H    NaN    NaN
5   H    NaN    NaN
6   G    NaN    NaN
7   H    NaN    NaN
8   I    NaN    NaN
9   G    NaN    NaN
10  H  Relax    end

NB: you can use any identifier you like as the dictionary key. Personally, I would probably use the plain integer (1, 2, 3).
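A minimal sketch of the integer-key variant, again on a small illustrative sample. Converting the group key with `int()` gives plain Python integers, and `reset_index(drop=True)` is an optional extra that renumbers each piece from 0 instead of keeping the original row labels.

```python
import io

import pandas as pd

# Small illustrative sample with two intervals
csv_text = """A,B,C
F,Relax,begin
H,Relax,end
O,Cook,begin
P,,
O,Cook,end
"""
df = pd.read_csv(io.StringIO(csv_text))

# Plain integer keys (1, 2, ...); each piece is renumbered from 0
dfs = {int(g): d.reset_index(drop=True)
       for g, d in df.groupby(df['C'].eq('begin').cumsum())}

print(sorted(dfs))           # [1, 2]
print(dfs[2]['A'].tolist())  # ['O', 'P', 'O']
```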

CodePudding user response:

This will give you the list of index end points for your desired groups:

import numpy as np
slices = np.array(df.index)[df.C.isin(['begin', 'end']).to_numpy()]  # labels are lowercase in the data

slices:

array([ 0, 10, 11, 18, 19, 28])

This lets you slice your df with df.loc[]:

# Assign the first group to its own DataFrame
df1 = df.loc[range(slices[0], slices[1] + 1)]

# or display every group: slices holds (begin, end) pairs
for x in range(0, len(slices), 2):
    print(df.loc[range(slices[x], slices[x + 1] + 1)])
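Because `.loc` slicing by label includes both endpoints, the begin and end positions can also be paired up and sliced directly, without building ranges or adding 1. This is a minimal sketch, assuming the default `RangeIndex` and that intervals are not nested (the n-th "begin" matches the n-th "end"); the names `begins`, `ends` and `parts` are hypothetical.

```python
import io

import pandas as pd

# Small illustrative sample with two intervals
csv_text = """A,B,C
F,Relax,begin
G,,
H,Relax,end
O,Cook,begin
O,Cook,end
"""
df = pd.read_csv(io.StringIO(csv_text))

# Row labels where an interval opens and where it closes
begins = df.index[df['C'].eq('begin')]
ends = df.index[df['C'].eq('end')]

# Pair the n-th begin with the n-th end; .loc label slices include both ends
parts = [df.loc[b:e] for b, e in zip(begins, ends)]

for part in parts:
    print(part, end='\n\n')
```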