How to add multiple rows after each row in pandas that depends on the value of the row?

Time:04-16

What I currently have is data that looks like this:

date hour
21/06/2000 6
22/06/2000 6
23/06/2000 6
24/06/2000 6

What I would like to achieve:

date hour
21/06/2000 6
21/06/2000 10
21/06/2000 14
21/06/2000 18
22/06/2000 6
22/06/2000 10
22/06/2000 14
22/06/2000 18
... so on and so forth ...

I have not been able to find a solution for this online; I only found ways to add new EMPTY rows, which is not what I am after. Any solution would be appreciated. Thanks in advance!

UPDATE: To recreate the DataFrame:

import pandas as pd

date = ['21/06/2000', '22/06/2000', '23/06/2000', '24/06/2000']
hour = [6, 6, 6, 6]

df = pd.DataFrame(list(zip(date, hour)),
                  columns=['date', 'hour'])

Answer:

One option is to assign each row a list of hours in the "hour" column, then explode it so every list element gets its own row:

df = (df.assign(hour=[[h + 4*i for i in range(4)] for h in df['hour']])
        .explode('hour').reset_index(drop=True))

Output:

          date hour
0   21/06/2000    6
1   21/06/2000   10
2   21/06/2000   14
3   21/06/2000   18
4   22/06/2000    6
5   22/06/2000   10
6   22/06/2000   14
7   22/06/2000   18
8   23/06/2000    6
9   23/06/2000   10
10  23/06/2000   14
11  23/06/2000   18
12  24/06/2000    6
13  24/06/2000   10
14  24/06/2000   14
15  24/06/2000   18
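Because the title asks for new rows that depend on the value of each row, here is a minimal sketch of the same assign-then-explode idea on a made-up input where each date starts at a different hour (the 8 is hypothetical, purely for illustration). It also converts the exploded column back to integers, since explode() leaves it with object dtype.

```python
import pandas as pd

# Hypothetical input: each date has its own starting hour, so the
# generated rows really do depend on the value in that row.
df = pd.DataFrame({'date': ['21/06/2000', '22/06/2000'],
                   'hour': [6, 8]})

# One list per row: the row's own hour plus three 4-hour increments.
# explode() then gives each list element its own row (repeating 'date'),
# reset_index() renumbers the rows 0..n-1, and astype() restores an
# integer dtype, because explode() leaves the column as object.
out = (df.assign(hour=[[h + 4 * i for i in range(4)] for h in df['hour']])
         .explode('hour')
         .reset_index(drop=True)
         .astype({'hour': int}))
print(out)
```

Each date now gets four rows: 6, 10, 14, 18 for the first date and 8, 12, 16, 20 for the second.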

Answer:

A simple solution using Index.repeat and NumPy:

import numpy as np
df2 = (df.loc[df.index.repeat(4)]
         .reset_index(drop=True)
       )
df2['hour'] += np.arange(len(df2)) % 4 * 4  # adds offsets 0, 4, 8, 12 to each date's 4 copies

Or, perhaps more readable, use an explicit array of the hours to add together with numpy.tile:

import numpy as np
df2 = (df.loc[df.index.repeat(4)]
         .reset_index(drop=True)
       )
df2['hour'] += np.tile([0, 4, 8, 12], len(df))  # repeat the offset pattern once per original row

Output:

          date  hour
0   21/06/2000     6
1   21/06/2000    10
2   21/06/2000    14
3   21/06/2000    18
4   22/06/2000     6
5   22/06/2000    10
6   22/06/2000    14
7   22/06/2000    18
8   23/06/2000     6
9   23/06/2000    10
10  23/06/2000    14
11  23/06/2000    18
12  24/06/2000     6
13  24/06/2000    10
14  24/06/2000    14
15  24/06/2000    18
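The repeat/tile approach generalizes to any list of offsets. A minimal sketch, wrapped in a hypothetical helper add_hour_offsets (not a pandas function), assuming the frame has a unique index such as the default RangeIndex:

```python
import numpy as np
import pandas as pd


def add_hour_offsets(df: pd.DataFrame, offsets) -> pd.DataFrame:
    """Hypothetical helper: repeat every row once per offset and add the
    offsets to that row's own 'hour' value. Assumes a unique index."""
    offsets = np.asarray(offsets)
    # Repeat each index label len(offsets) times: 0,0,0,0,1,1,1,1,...
    out = df.loc[df.index.repeat(len(offsets))].reset_index(drop=True)
    # Tile the offsets so they line up with the repeated rows.
    out['hour'] = out['hour'] + np.tile(offsets, len(df))
    return out


df = pd.DataFrame({'date': ['21/06/2000', '22/06/2000', '23/06/2000', '24/06/2000'],
                   'hour': [6, 6, 6, 6]})
df2 = add_hour_offsets(df, [0, 4, 8, 12])
print(df2)
```

Since each row's own hour is the base, rows with different starting hours are handled without changes; passing [0, 12] instead would give two rows per date (6 and 18 here).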

Answer:

Since you are assigning the same hours [6, 10, 14, 18] to every date, you can use the complete function from pyjanitor to achieve this:

# the dev version has improvements
# pip install git+https://github.com/pyjanitor-devs/pyjanitor.git
import janitor
import pandas as pd

df.complete('date', {'hour': [6, 10, 14, 18]})

Output:

          date  hour
0   21/06/2000     6
1   21/06/2000    10
2   21/06/2000    14
3   21/06/2000    18
4   22/06/2000     6
5   22/06/2000    10
6   22/06/2000    14
7   22/06/2000    18
8   23/06/2000     6
9   23/06/2000    10
10  23/06/2000    14
11  23/06/2000    18
12  24/06/2000     6
13  24/06/2000    10
14  24/06/2000    14
15  24/06/2000    18
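If you would rather avoid the pyjanitor dependency, a similar result can be sketched in plain pandas with a cross join, merge(how='cross'), which is available from pandas 1.2. This is a swapped-in technique, not how complete works internally, and it assumes date is the only column you need to carry over:

```python
import pandas as pd

df = pd.DataFrame({'date': ['21/06/2000', '22/06/2000', '23/06/2000', '24/06/2000'],
                   'hour': [6, 6, 6, 6]})

# The fixed set of hours every date should get.
hours = pd.DataFrame({'hour': [6, 10, 14, 18]})

# Cartesian product: every unique date paired with every hour.
# A cross merge keeps the left frame's order, so the rows come out
# grouped by date with the hours in the order listed above.
out = df[['date']].drop_duplicates().merge(hours, how='cross')
print(out)
```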