I'm trying to split a dataframe from excel into multiple dataframes.
The dataframe looks like this:
Name | Value | Unit | |
---|---|---|---|
0 | AA | 10 | mm |
1 | BDC | 20 | mm |
2 | NaN | NaN | NaN |
3 | AFD | 60 | mm |
4 | AKW | 18 | cm |
5 | TDF | 0,5 | mm |
6 | NaN | NaN | NaN |
7 | AA | 10 | mm |
8 | FB | 65 | l |
I already got the whole dataframe in python stored correctly, however I have no clue how I can split the dataframe into multiple dataframes when there is a NaN row. I tried to iterate over .iterrows() but it only gives me the rows until the first NaN row. And what is the best practise to merge the rows into an new dataframe? Any help is appreciated.
content = pd.read_excel(filepath, sheet_name='Parameter'))
right_tables = content[['Name', 'Value', 'Unit']]
for i, row in right_tables.head().iterrows():
print(row)
Output in console:
Name AA
Value 10
Unit mm
Name: 0, dtype: object
Name BDC
Value 20
Unit mm
Name: 1, dtype: object
Name NaN
Value NaN
Unit NaN
Name: 2, dtype: object
And the result I need should be like: | | Name | Value | Unit | |--|--|--|--| |0|AA|10|mm| |1|BDC|20|mm|
Name | Value | Unit | |
---|---|---|---|
0 | AFD | 60 | mm |
1 | AKW | 18 | cm |
2 | TDF | 0,5 | mm |
Name | Value | Unit | |
---|---|---|---|
0 | AA | 10 | mm |
1 | FB | 65 | l |
CodePudding user response:
Remove missing rows by DataFrame.dropna
and groupby by Series created by Series.isna
with Series.cumsum
:
for g, df in df.dropna().groupby(df['Name'].isna().cumsum()):
print (df.reset_index(drop=True))
Name Value Unit
0 AA 10 mm
1 BDC 20 mm
Name Value Unit
0 AFD 60 mm
1 AKW 18 cm
2 TDF 0,5 mm
Name Value Unit
0 AA 10 mm
1 FB 65 l
If need list of DataFrames:
dfs= [df.reset_index(drop=True) for g,df in df.dropna().groupby(df['Name'].isna().cumsum())]
print (dfs)
[ Name Value Unit
0 AA 10 mm
1 BDC 20 mm, Name Value Unit
0 AFD 60 mm
1 AKW 18 cm
2 TDF 0,5 mm, Name Value Unit
0 AA 10 mm
1 FB 65 l]