I have a Dataframe with approx. 70000 rows and 6 columns. The inputs are numbers expect of some hyphens. I want to split the dataframe (by rows) every time a hyphen appears. The row with the hyphen can be deleted.
Example of Dataframe:
Timestamp ; power(kW) ; ....
2020-01-01 17:50:10 ; 4.32 ; ...
2020-01-01 17:55:15 ; 4.30 ; ...
2020-01-01 18:00:20 ; 3.20 ; ...
2020-01-01 18:05:25 ; - ; ...
2020-01-03 12:00:20 ; 6.20 ; ...
Expectet Outcome: 2 Dataframe
CodePudding user response:
You can use:
# identify rows with "-"
m = df['power(kW)'].eq('-')
# or based on non-numbers:
# m = pd.to_numeric(df['power(kW)'], errors='coerce').isna()
# split per group while removing the invalid rows
subdfs = [d for _,d in df[~m].groupby(m.cumsum())]
output list:
[ Timestamp power(kW) ....
0 2020-01-01 17:50:10 4.32 ...
1 2020-01-01 17:55:15 4.30 ...
2 2020-01-01 18:00:20 3.20 ...,
Timestamp power(kW) ....
4 2020-01-03 12:00:20 6.20 ...]
Accessing subdataframes:
subdf[0]
Timestamp power(kW) ....
0 2020-01-01 17:50:10 4.32 ...
1 2020-01-01 17:55:15 4.30 ...
2 2020-01-01 18:00:20 3.20 ...
NB. because your initial data had strings, the dtype of the output will be string/object. You must convert the types if you plan to perform vectorial operations.
One option:
subdfs = [d for _,d in df[~m].astype({'Timestamp': 'datetime64',
'power(kW)': float})
.groupby(m.cumsum())]