Add an empty row in a dataframe when the entries of a column repeats


I have a dataframe that stores time-series data.

Please find the code below:

import pandas as pd
from pprint import pprint

d = {
    't': [0, 1, 2, 0, 2, 0, 1],
    'input': [2, 2, 2, 2, 2, 2, 4],
    'type': ['A', 'A', 'A', 'B', 'B', 'B', 'A'],
    'value': [0.1, 0.2, 0.3, 1, 2, 3, 1],
}
df = pd.DataFrame(d)
pprint(df)

df>
   t  input type  value
0  0      2    A    0.1
1  1      2    A    0.2
2  2      2    A    0.3
3  0      2    B    1.0
4  2      2    B    2.0
5  0      2    B    3.0
6  1      4    A    1.0

Whenever the first entry of column t repeats (i.e. the series in t starts over), I would like to insert an empty row.

Expected output:

df>
   t  input type  value
0  0      2    A    0.1
1  1      2    A    0.2
2  2      2    A    0.3

3  0      2    B    1.0
4  2      2    B    2.0

5  0      2    B    3.0
6  1      4    A    1.0

I am not sure how to do this. Suggestions will be really helpful.

EDIT: dup = df['t'].eq(0).shift(-1, fill_value=False)

helps when the starting value in column t is 0.

But the starting value could also be non-zero, as in the example below. Additional example:

d = {
    't': [25, 35, 90, 25, 90, 25, 35],
    'input': [2, 2, 2, 2, 2, 2, 4],
    'type': ['A', 'A', 'A', 'B', 'B', 'B', 'A'],
    'value': [0.1, 0.2, 0.3, 1, 2, 3, 1],
}
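
For reference, a quick sanity check (a sketch, reusing the dict above) shows why the eq(0) mask finds nothing here: t never equals 0 in this data.

import pandas as pd

df = pd.DataFrame(d)

# every comparison is False, so no blank rows would be inserted
print(df['t'].eq(0).shift(-1, fill_value=False).any())   # False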

CodePudding user response:

There are several ways to achieve this.

option 1

you can use groupby.apply:

# start a new group each time t resets to 0,
# then append one all-NaN row at the end of each group
(df.groupby(df['t'].eq(0).cumsum(), as_index=False, group_keys=False)
   .apply(lambda d: pd.concat([d, pd.Series(index=d.columns, name='').to_frame().T]))
)

output:

     t  input type  value
0  0.0    2.0    A    0.1
1  1.0    2.0    A    0.2
2  2.0    2.0    A    0.3
   NaN    NaN  NaN    NaN
3  0.0    2.0    B    1.0
4  2.0    2.0    B    2.0
   NaN    NaN  NaN    NaN
5  0.0    2.0    B    3.0
6  1.0    4.0    A    1.0
   NaN    NaN  NaN    NaN
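
The groupby key is worth a closer look: df['t'].eq(0).cumsum() increments by one at every reset, so each block of rows gets its own label. A quick check on the sample data:

import pandas as pd

df = pd.DataFrame({'t': [0, 1, 2, 0, 2, 0, 1]})

key = df['t'].eq(0).cumsum()
print(key.tolist())   # [1, 1, 1, 2, 2, 3, 3] -> one label per block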

option 2

An alternative if the index is already sorted:

# flag the last row of each block (the row just before t resets to 0)
dup = df['t'].eq(0).shift(-1, fill_value=False)

# append blank copies of the flagged rows, then interleave them back by index
pd.concat([df, df.loc[dup].assign(**{c: '' for c in df})]).sort_index()

output:

   t input type value
0  0     2    A   0.1
1  1     2    A   0.2
2  2     2    A   0.3
2                    
3  0     2    B   1.0
4  2     2    B   2.0
4                    
5  0     2    B   3.0
6  1     4    A   1.0
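
The dup mask flags the last row of each block, i.e. the row immediately before t resets to 0. A quick check, assuming the sample data:

import pandas as pd

df = pd.DataFrame({'t': [0, 1, 2, 0, 2, 0, 1]})

dup = df['t'].eq(0).shift(-1, fill_value=False)
print(dup.tolist())   # [False, False, True, False, True, False, False]

One caveat: sort_index defaults to quicksort, which is not stable, so a blank row sharing an index label with a data row is not guaranteed to land after it; kind='mergesort' (used in the answer below) makes the order deterministic.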

addendum on grouping

start a new group whenever the value in t decreases, which also handles non-zero starting values:

group = df['t'].diff().lt(0).cumsum()

(df.groupby(group, as_index=False, group_keys=False)
   .apply(lambda d: pd.concat([d, pd.Series(index=d.columns, name='').to_frame().T]))
)
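
A quick check of this key on the non-zero sample (a sketch, using the t values from the EDIT):

import pandas as pd

df = pd.DataFrame({'t': [25, 35, 90, 25, 90, 25, 35]})

group = df['t'].diff().lt(0).cumsum()
print(group.tolist())   # [0, 0, 0, 1, 1, 2, 2] -> one label per block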

CodePudding user response:

Because groupby.apply is generally slow, you can instead create a helper DataFrame of blank rows keyed by the consecutive groups (each group starting where column t resets), join with concat, and sort:

# group labels: start a new group each time t equals 0
df.index = df['t'].eq(0).cumsum()

# alternative that also handles non-zero starting values:
# start a new group whenever t does not increase
# df.index = (~df['t'].diff().gt(0)).cumsum()

df = (pd.concat([df, pd.DataFrame('', columns=df.columns, index=df.index.unique())])
        .sort_index(kind='mergesort', ignore_index=True)
        .iloc[:-1])
print(df)
   t input type value
0  0     2    A   0.1
1  1     2    A   0.2
2  2     2    A   0.3
3                    
4  0     2    B   1.0
5  2     2    B   2.0
6                    
7  0     2    B   3.0
8  1     4    A   1.0
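
To see why the stable mergesort places each blank correctly, note that the helper frame contains exactly one all-empty row per group label (a sketch, rebuilding df from the first sample because the snippet above reassigns it):

import pandas as pd

d = {
    't': [0, 1, 2, 0, 2, 0, 1],
    'input': [2, 2, 2, 2, 2, 2, 4],
    'type': ['A', 'A', 'A', 'B', 'B', 'B', 'A'],
    'value': [0.1, 0.2, 0.3, 1, 2, 3, 1],
}
df = pd.DataFrame(d)

df.index = df['t'].eq(0).cumsum()   # group labels 1, 2, 3
blanks = pd.DataFrame('', columns=df.columns, index=df.index.unique())
print(blanks)   # one all-'' row per group; mergesort keeps it after its group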

The same pattern with the alternative grouping works for the second sample, where t starts at a non-zero value:

df.index = (~df['t'].diff().gt(0)).cumsum()

df = (pd.concat([df, pd.DataFrame(' ', columns=df.columns, index=df.index.unique())])
        .sort_index(kind='mergesort', ignore_index=True)
        .iloc[:-1])
print(df)
    t input type value
0  25     2    A   0.1
1  35     2    A   0.2
2  90     2    A   0.3
3                     
4  25     2    B   1.0
5  90     2    B   2.0
6                     
7  25     2    B   3.0
8  35     4    A   1.0

CodePudding user response:

Here is my suggestion:

# insert an empty (all-NaN) row at every index position where t repeats its
# first value, skipping the first occurrence
pd.concat([pd.DataFrame(index=df.index[df.t == df.t.iat[0]][1:]), df]).sort_index()

      t  input type  value
0  25.0    2.0    A    0.1
1  35.0    2.0    A    0.2
2  90.0    2.0    A    0.3
3   NaN    NaN  NaN    NaN
3  25.0    2.0    B    1.0
4  90.0    2.0    B    2.0
5   NaN    NaN  NaN    NaN
5  25.0    2.0    B    3.0
6  35.0    4.0    A    1.0
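
Since df.t.iat[0] reads the actual first value of t, this works for both the 0-based and the 25-based samples. A quick check of which positions get a blank row (a sketch, assuming the second sample data):

import pandas as pd

d = {
    't': [25, 35, 90, 25, 90, 25, 35],
    'input': [2, 2, 2, 2, 2, 2, 4],
    'type': ['A', 'A', 'A', 'B', 'B', 'B', 'A'],
    'value': [0.1, 0.2, 0.3, 1, 2, 3, 1],
}
df = pd.DataFrame(d)

# indices where t repeats its first value, skipping the first occurrence
print(df.index[df.t == df.t.iat[0]][1:].tolist())   # [3, 5]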