I have the following dataframe:
[['M', 'A', '0', '0.2', '0.2', '0.2'],
[nan, nan, nan, '0.3', '0.3', '1'],
[nan, nan, nan, '1.4', '3.2', '32'],
[nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan],
['sex', 'test', 'conc', 'sugar', 'flour', 'yeast'],
['M', 'A', '3', '1.2', '1.2', '1.2'],
[nan, nan, nan, '1.3', '1.3', '2'],
[nan, nan, nan, '2.4', '4.2', '33'],
[nan, nan, nan, nan, nan, nan],
['sex', 'test', 'conc', 'sugar', 'flour', 'yeast'],
['M', 'A', '6', '2.2', '2.2', '2.2'],
[nan, nan, nan, '2.3', '2.3', '3'],
[nan, nan, nan, '3.4', '5.2', '34']]
I'd like to split it into multiple dataframes wherever a row is all NaN. I've tried the following code from the link below, and it seems to do what I want, but it returns a list of the splits. How do I get each split into its own dataframe, so that I end up with multiple dataframes?
import numpy as np
df_list = np.split(df, df[df.isnull().all(1)].index)
for part in df_list:  # separate loop name so the original df is not overwritten
    print(part, '\n')
CodePudding user response:
IIUC, you can flag the all-NaN rows, drop them, and group the remaining rows by the cumulative sum of that flag:
m = df.isna().all(axis=1)
dfs = [g for k,g in df[~m].groupby(m.cumsum())]
Output:
[     0    1    2    3    4    5
0    M    A    0  0.2  0.2  0.2
1  NaN  NaN  NaN  0.3  0.3    1
2  NaN  NaN  NaN  1.4  3.2   32,
     0     1     2      3      4      5
5  sex  test  conc  sugar  flour  yeast
6    M     A     3    1.2    1.2    1.2
7  NaN   NaN   NaN    1.3    1.3      2
8  NaN   NaN   NaN    2.4    4.2     33,
      0     1     2      3      4      5
10  sex  test  conc  sugar  flour  yeast
11    M     A     6    2.2    2.2    2.2
12  NaN   NaN   NaN    2.3    2.3      3
13  NaN   NaN   NaN    3.4    5.2     34]
Getting individual dataframes:
dfs[0]
     0    1    2    3    4    5
0    M    A    0  0.2  0.2  0.2
1  NaN  NaN  NaN  0.3  0.3    1
2  NaN  NaN  NaN  1.4  3.2   32
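To make the mask-and-cumsum idea above easier to try, here is a minimal self-contained sketch. It uses a shortened copy of the question's data (an assumption made for brevity), and the names block_id, first, second and third are only illustrative. The cumulative sum goes up by one at every all-NaN row, so every block of real rows shares one label, and two separator rows in a row do not produce an empty dataframe because the separators are filtered out before grouping.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Shortened version of the question's data: blocks separated by all-NaN rows.
df = pd.DataFrame([
    ['M', 'A', '0', '0.2', '0.2', '0.2'],
    [nan, nan, nan, '0.3', '0.3', '1'],
    [nan, nan, nan, nan, nan, nan],
    [nan, nan, nan, nan, nan, nan],
    ['sex', 'test', 'conc', 'sugar', 'flour', 'yeast'],
    ['M', 'A', '3', '1.2', '1.2', '1.2'],
    [nan, nan, nan, nan, nan, nan],
    ['sex', 'test', 'conc', 'sugar', 'flour', 'yeast'],
    ['M', 'A', '6', '2.2', '2.2', '2.2'],
])

m = df.isna().all(axis=1)   # True for the all-NaN separator rows
block_id = m.cumsum()       # increases by one at each separator row
dfs = [g for _, g in df[~m].groupby(block_id)]

# When the number of blocks is known, the list can be unpacked into names.
first, second, third = dfs
print(second)
```

If the number of blocks is not known in advance, keeping the list (or building a dict such as {i: g for i, g in enumerate(dfs)}) is usually easier than creating one variable per block.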
CodePudding user response:
Here is one way to go about it:
dfs = []  # list to hold the resulting dataframes
# Code that you already have, which splits the dataframe at the all-NaN rows
df_list = np.split(df, df[df.isnull().all(1)].index)
# Iterate over df_list and append each piece to dfs
for data in df_list:
    dfs.append(data)
dfs[0]
     0    1    2    3    4    5
0    M    A    0  0.2  0.2  0.2
1  NaN  NaN  NaN  0.3  0.3    1
2  NaN  NaN  NaN  1.4  3.2   32
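One thing to watch with a positional split like this: the cuts happen at the all-NaN rows, so as far as I can tell those rows stay inside the pieces (each later piece starts with one, and two consecutive separator rows give a piece that contains only NaN). Below is a rough sketch of the same positional idea that drops the separator rows and skips empty pieces. It uses iloc slicing instead of np.split, which is my substitution, and split_on_blank_rows is a hypothetical helper name, not a pandas function. The sample data is a shortened copy of the question's.

```python
import numpy as np
import pandas as pd

def split_on_blank_rows(frame):
    """Hypothetical helper: cut frame at its all-NaN rows and drop those rows."""
    blank = frame.isna().all(axis=1).to_numpy()
    cuts = np.flatnonzero(blank)            # positions of the separator rows
    bounds = [0, *cuts, len(frame)]         # slice boundaries, by position
    pieces = []
    for start, stop in zip(bounds[:-1], bounds[1:]):
        piece = frame.iloc[start:stop].dropna(how="all")
        if not piece.empty:                 # skip pieces that were only separators
            pieces.append(piece)
    return pieces

nan = np.nan
# Shortened version of the question's data.
df = pd.DataFrame([
    ['M', 'A', '0', '0.2', '0.2', '0.2'],
    [nan, nan, nan, '0.3', '0.3', '1'],
    [nan, nan, nan, nan, nan, nan],
    [nan, nan, nan, nan, nan, nan],
    ['sex', 'test', 'conc', 'sugar', 'flour', 'yeast'],
    ['M', 'A', '3', '1.2', '1.2', '1.2'],
    [nan, nan, nan, nan, nan, nan],
    ['sex', 'test', 'conc', 'sugar', 'flour', 'yeast'],
    ['M', 'A', '6', '2.2', '2.2', '2.2'],
])

pieces = split_on_blank_rows(df)
print(pieces[0])
```

Each element of pieces is already an individual dataframe, so pieces[0], pieces[1], and so on can be used directly.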