I have a DataFrame that stores time-series data. Please see the code below:
import pandas as pd
from pprint import pprint

d = {
    't': [0, 1, 2, 0, 2, 0, 1],
    'input': [2, 2, 2, 2, 2, 2, 4],
    'type': ['A', 'A', 'A', 'B', 'B', 'B', 'A'],
    'value': [0.1, 0.2, 0.3, 1, 2, 3, 1],
}
df = pd.DataFrame(d)
pprint(df)
df>
   t  input type  value
0  0      2    A    0.1
1  1      2    A    0.2
2  2      2    A    0.3
3  0      2    B    1.0
4  2      2    B    2.0
5  0      2    B    3.0
6  1      4    A    1.0
When the first entry of the column t repeats, I would like to add an empty row.
Expected output:
df>
   t  input type  value
0  0      2    A    0.1
1  1      2    A    0.2
2  2      2    A    0.3

3  0      2    B    1.0
4  2      2    B    2.0

5  0      2    B    3.0
6  1      4    A    1.0
I am not sure how to do this. Suggestions will be really helpful.
EDIT:
dup = df['t'].eq(0).shift(-1, fill_value=False)
helps when the starting value in column t is 0.
But the starting value could also be non-zero, as in this additional example:
d = {
    't': [25, 35, 90, 25, 90, 25, 35],
    'input': [2, 2, 2, 2, 2, 2, 4],
    'type': ['A', 'A', 'A', 'B', 'B', 'B', 'A'],
    'value': [0.1, 0.2, 0.3, 1, 2, 3, 1],
}
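For what it's worth, a start-value-independent way to detect where each block begins is to look at where t stops increasing (a sketch using diff, not tied to any particular starting value):

```python
import pandas as pd

df = pd.DataFrame({
    't': [25, 35, 90, 25, 90, 25, 35],
    'input': [2, 2, 2, 2, 2, 2, 4],
    'type': ['A', 'A', 'A', 'B', 'B', 'B', 'A'],
    'value': [0.1, 0.2, 0.3, 1, 2, 3, 1],
})

# a new block starts wherever t does not exceed the previous t
# (diff() is NaN on the first row, which also counts as a start)
starts = ~df['t'].diff().gt(0)
print(starts.tolist())
# [True, False, False, True, False, True, False]
```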
CodePudding user response:
There are several ways to achieve this.
option 1
you can use groupby.apply:
(df.groupby(df['t'].eq(0).cumsum(), as_index=False, group_keys=False)
.apply(lambda d: pd.concat([d, pd.Series(index=d.columns, name='').to_frame().T]))
)
output:
t input type value
0 0.0 2.0 A 0.1
1 1.0 2.0 A 0.2
2 2.0 2.0 A 0.3
NaN NaN NaN NaN
3 0.0 2.0 B 1.0
4 2.0 2.0 B 2.0
NaN NaN NaN NaN
5 0.0 2.0 B 3.0
6 1.0 4.0 A 1.0
NaN NaN NaN NaN
option 2
An alternative if the index is already sorted:
dup = df['t'].eq(0).shift(-1, fill_value=False)
pd.concat([df, df.loc[dup].assign(**{c: '' for c in df})]).sort_index()
output:
t input type value
0 0 2 A 0.1
1 1 2 A 0.2
2 2 2 A 0.3
2
3 0 2 B 1.0
4 2 2 B 2.0
4
5 0 2 B 3.0
6 1 4 A 1.0
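For the non-zero sample from the edit, the same idea can be adapted by comparing against the first value of t instead of 0 (my adaptation, assuming each block restarts at the same value); kind='stable' keeps each blank row right after the block it closes:

```python
import pandas as pd

df = pd.DataFrame({
    't': [25, 35, 90, 25, 90, 25, 35],
    'input': [2, 2, 2, 2, 2, 2, 4],
    'type': ['A', 'A', 'A', 'B', 'B', 'B', 'A'],
    'value': [0.1, 0.2, 0.3, 1, 2, 3, 1],
})

# mark rows that are immediately followed by a restart of t
dup = df['t'].eq(df['t'].iat[0]).shift(-1, fill_value=False)

# duplicate those rows as blanks, then stable-sort so each blank
# lands directly after the row it was copied from
out = (pd.concat([df, df.loc[dup].assign(**{c: '' for c in df})])
         .sort_index(kind='stable'))
print(out)
```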
addendum on grouping
start a new group whenever the value of t decreases:
dup = df['t'].diff().lt(0).cumsum()
(df.groupby(dup, as_index=False, group_keys=False)
.apply(lambda d: pd.concat([d, pd.Series(index=d.columns, name='').to_frame().T]))
)
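Run against the second sample from the edit, the decrease-based grouping behaves as intended; here is a self-contained sketch (dtype=object is my addition so the empty row builds cleanly on recent pandas):

```python
import pandas as pd

df = pd.DataFrame({
    't': [25, 35, 90, 25, 90, 25, 35],
    'input': [2, 2, 2, 2, 2, 2, 4],
    'type': ['A', 'A', 'A', 'B', 'B', 'B', 'A'],
    'value': [0.1, 0.2, 0.3, 1, 2, 3, 1],
})

# new group whenever t decreases
grp = df['t'].diff().lt(0).cumsum()

# append one all-NaN row (blank index label) to each group
out = (df.groupby(grp, group_keys=False)
         .apply(lambda d: pd.concat(
             [d, pd.Series(index=d.columns, name='', dtype=object).to_frame().T])))
print(out)
```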
CodePudding user response:
Because groupby is generally slow, you can instead build a helper DataFrame indexed by the consecutive groups (each starting at a 0 in column t), join with concat, and sort:
#groups starting by 0
df.index = df['t'].eq(0).cumsum()
#or, more generally, groups starting wherever t stops increasing
df.index = (~df['t'].diff().gt(0)).cumsum()
df = (pd.concat([df, pd.DataFrame('', columns=df.columns, index=df.index.unique())])
.sort_index(kind='mergesort', ignore_index=True)
.iloc[:-1])
print (df)
t input type value
0 0 2 A 0.1
1 1 2 A 0.2
2 2 2 A 0.3
3
4 0 2 B 1.0
5 2 2 B 2.0
6
7 0 2 B 3.0
8 1 4 A 1.0
The same approach works for the second sample, where t starts at a non-zero value:
df.index = (~df['t'].diff().gt(0)).cumsum()
df = (pd.concat([df, pd.DataFrame(' ', columns=df.columns, index=df.index.unique())])
.sort_index(kind='mergesort', ignore_index=True)
.iloc[:-1])
print (df)
t input type value
0 25 2 A 0.1
1 35 2 A 0.2
2 90 2 A 0.3
3
4 25 2 B 1.0
5 90 2 B 2.0
6
7 25 2 B 3.0
8 35 4 A 1.0
CodePudding user response:
Here is my suggestion:
pd.concat([pd.DataFrame(index=df.index[df.t == df.t.iat[0]][1:]), df]).sort_index()
t input type value
0 25.0 2.0 A 0.1
1 35.0 2.0 A 0.2
2 90.0 2.0 A 0.3
3 NaN NaN NaN NaN
3 25.0 2.0 B 1.0
4 90.0 2.0 B 2.0
5 NaN NaN NaN NaN
5 25.0 2.0 B 3.0
6 35.0 4.0 A 1.0
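One caveat worth noting (my observation, not from the answer): sort_index defaults to an unstable quicksort, so the blank row and the data row that share an index label are not guaranteed to keep the order shown above. Passing kind='stable' pins the blank rows, which concat placed first, ahead of their same-label data rows:

```python
import pandas as pd

df = pd.DataFrame({
    't': [25, 35, 90, 25, 90, 25, 35],
    'input': [2, 2, 2, 2, 2, 2, 4],
    'type': ['A', 'A', 'A', 'B', 'B', 'B', 'A'],
    'value': [0.1, 0.2, 0.3, 1, 2, 3, 1],
})

# empty frame indexed at the rows where t repeats its first value
blanks = pd.DataFrame(index=df.index[df['t'] == df['t'].iat[0]][1:])

# blanks come first in the concat, and the stable sort keeps them
# ahead of the data rows that share their index label
out = pd.concat([blanks, df]).sort_index(kind='stable')
print(out)
```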