Split a pandas dataframe when rows are blank into multiple dataframes

Time:11-03

I have the following dataframe:

[['M', 'A', '0', '0.2', '0.2', '0.2'],
 [nan, nan, nan, '0.3', '0.3', '1'],
 [nan, nan, nan, '1.4', '3.2', '32'],
 [nan, nan, nan, nan, nan, nan],
 [nan, nan, nan, nan, nan, nan],
 ['sex', 'test', 'conc', 'sugar', 'flour', 'yeast'],
 ['M', 'A', '3', '1.2', '1.2', '1.2'],
 [nan, nan, nan, '1.3', '1.3', '2'],
 [nan, nan, nan, '2.4', '4.2', '33'],
 [nan, nan, nan, nan, nan, nan],
 ['sex', 'test', 'conc', 'sugar', 'flour', 'yeast'],
 ['M', 'A', '6', '2.2', '2.2', '2.2'],
 [nan, nan, nan, '2.3', '2.3', '3'],
 [nan, nan, nan, '3.4', '5.2', '34']]

I'd like to split it into multiple dataframes wherever a row is all NaN. I've tried the following code from the link below. It seems to do what I want, but it returns a list of the splits. How do I get each split into its own dataframe, so that I end up with multiple dataframes?

SOF

import numpy as np

df_list = np.split(df, df[df.isnull().all(axis=1)].index)
for part in df_list:  # use a new name so the original df is not overwritten
    print(part, '\n')

CodePudding user response:

IIUC, you can use:

m = df.isna().all(axis=1)

dfs = [g for k,g in df[~m].groupby(m.cumsum())]

Output:

[     0    1    2    3    4    5
 0    M    A    0  0.2  0.2  0.2
 1  NaN  NaN  NaN  0.3  0.3    1
 2  NaN  NaN  NaN  1.4  3.2   32,
      0     1     2      3      4      5
 5  sex  test  conc  sugar  flour  yeast
 6    M     A     3    1.2    1.2    1.2
 7  NaN   NaN   NaN    1.3    1.3      2
 8  NaN   NaN   NaN    2.4    4.2     33,
       0     1     2      3      4      5
 10  sex  test  conc  sugar  flour  yeast
 11    M     A     6    2.2    2.2    2.2
 12  NaN   NaN   NaN    2.3    2.3      3
 13  NaN   NaN   NaN    3.4    5.2     34]

Getting individual dataframes:

dfs[0]

     0    1    2    3    4    5
0    M    A    0  0.2  0.2  0.2
1  NaN  NaN  NaN  0.3  0.3    1
2  NaN  NaN  NaN  1.4  3.2   32
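
To see why this works, here is a minimal sketch on a smaller frame (the sample data is made up for illustration and is not the frame from the question). The cumulative sum of the all-NaN mask goes up by one at every blank row, so the rows between blank rows share a label, and grouping the non-blank rows by that label yields one dataframe per block:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two blocks separated by two all-NaN rows
df = pd.DataFrame([
    ['M', 'A', '0.2'],
    [np.nan, np.nan, '0.3'],
    [np.nan, np.nan, np.nan],
    [np.nan, np.nan, np.nan],
    ['sex', 'test', 'sugar'],
    ['M', 'A', '1.2'],
])

m = df.isna().all(axis=1)   # True only for rows that are entirely NaN
labels = m.cumsum()         # increases by 1 at every blank row

# Drop the blank rows, then group the remaining rows by their label
dfs = [g for _, g in df[~m].groupby(labels)]

for part in dfs:
    print(part, '\n')
```

Because the blank rows are filtered out before grouping, consecutive blank rows do not produce empty dataframes.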

CodePudding user response:

Here is one way to go about it:

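import numpy as np

dfs = []  # list to hold the resulting DataFrames

# Code that you already have, which splits the DataFrame at each all-NaN row
df_list = np.split(df, df[df.isnull().all(axis=1)].index)

# Iterate over df_list, drop the blank separator rows, and keep the non-empty pieces
for data in df_list:
    data = data.dropna(how='all')
    if not data.empty:
        dfs.append(data)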

dfs[0]
     0    1    2    3    4    5
0    M    A    0  0.2  0.2  0.2
1  NaN  NaN  NaN  0.3  0.3    1
2  NaN  NaN  NaN  1.4  3.2   32
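
If you want each piece under its own name rather than by list position, a dictionary is usually easier to manage than a set of separate variables. A minimal sketch, where `split_on_blank_rows`, the `part_N` keys, and the sample data are all hypothetical names and values chosen for illustration:

```python
import numpy as np
import pandas as pd

def split_on_blank_rows(df):
    """Return a dict of sub-DataFrames, split wherever a row is all NaN."""
    blank = df.isna().all(axis=1)
    groups = df[~blank].groupby(blank.cumsum())
    # reset_index gives every piece its own 0-based index
    return {f"part_{i}": g.reset_index(drop=True)
            for i, (_, g) in enumerate(groups, start=1)}

# Hypothetical sample data: two blocks separated by one blank row
df = pd.DataFrame([
    ['M', 'A'],
    [np.nan, np.nan],
    ['F', 'B'],
    ['F', 'C'],
])

parts = split_on_blank_rows(df)
print(parts['part_1'])

# If the number of pieces is known, they can be unpacked into variables
first, second = parts.values()
```

Unpacking only works when the number of pieces is known in advance, which is why the dictionary is the safer default.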