How to add multiple rows after each row in pandas, depending on the value of that row?

What I currently have is data that looks like this:

date	hour
21/06/2000	6
22/06/2000	6
23/06/2000	6
24/06/2000	6

What I would like to achieve:

date	hour
21/06/2000	6
21/06/2000	10
21/06/2000	14
21/06/2000	18
22/06/2000	6
22/06/2000	10
22/06/2000	14
22/06/2000	18
... so on and so forth	...

I could not find a solution for this online; I only found ways to add new EMPTY rows, which is not what I am after. Any solution would be appreciated. Thanks in advance!

UPDATE: To recreate the dataframe:

import pandas as pd

date = ['21/06/2000', '22/06/2000', '23/06/2000', '24/06/2000']
hour = [6, 6, 6, 6]

df = pd.DataFrame(list(zip(date, hour)),
                  columns=['date', 'hour'])

CodePudding user response：

One option is to assign a list to the "hour" column, then explode it. Note that this takes the starting hour from the first row only:

df = df.assign(hour=[[df.at[0,'hour'] + 4*i for i in range(4)]]*len(df)).explode('hour').reset_index(drop=True)

Output:

          date hour
0   21/06/2000    6
1   21/06/2000   10
2   21/06/2000   14
3   21/06/2000   18
4   22/06/2000    6
5   22/06/2000   10
6   22/06/2000   14
7   22/06/2000   18
8   23/06/2000    6
9   23/06/2000   10
10  23/06/2000   14
11  23/06/2000   18
12  24/06/2000    6
13  24/06/2000   10
14  24/06/2000   14
15  24/06/2000   18
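
If the starting hour can differ from row to row (the question title suggests it might), the same explode idea can build one list per row instead of reusing the first row's value. This is only a sketch: the two-row frame and the differing start hours (6 and 7) are made up for illustration, and `expanded` is just a name I picked.

```python
import pandas as pd

# Hypothetical sample data with a different starting hour per row
df = pd.DataFrame({
    "date": ["21/06/2000", "22/06/2000"],
    "hour": [6, 7],
})

# Build one list of four hours per row, starting from that row's own value
expanded = (
    df.assign(hour=[[h + 4 * i for i in range(4)] for h in df["hour"]])
      .explode("hour")
      .reset_index(drop=True)
)

# explode leaves an object-dtype column, so cast back to integers
expanded["hour"] = expanded["hour"].astype(int)
print(expanded)
```

The cast at the end is optional, but without it the column stays object dtype, which tends to get in the way of later arithmetic.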

CodePudding user response：

Simple solution using repeat and numpy:

import numpy as np
df2 = (df.loc[df.index.repeat(4)]
         .reset_index(drop=True)
       )
df2['hour'] += np.arange(len(df2))%4*4

Or maybe even easier using an explicit array of the hours to add and numpy.tile:

import numpy as np
df2 = (df.loc[df.index.repeat(4)]
         .reset_index(drop=True)
       )
df2['hour'] += np.tile([0,4,8,12], len(df))

Output:

          date  hour
0   21/06/2000     6
1   21/06/2000    10
2   21/06/2000    14
3   21/06/2000    18
4   22/06/2000     6
5   22/06/2000    10
6   22/06/2000    14
7   22/06/2000    18
8   23/06/2000     6
9   23/06/2000    10
10  23/06/2000    14
11  23/06/2000    18
12  24/06/2000     6
13  24/06/2000    10
14  24/06/2000    14
15  24/06/2000    18
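
Because repeat copies each row's original hour before the offsets are added, this approach also works when the rows start at different hours. A possible way to wrap it up is sketched below; `expand_hours` is a hypothetical helper name, not something from pandas or numpy, and the sample hours are invented.

```python
import numpy as np
import pandas as pd

def expand_hours(df, offsets):
    """Repeat each row once per offset and add the offset to that row's hour."""
    # Repeat every row len(offsets) times, keeping the rows of one date together
    out = df.loc[df.index.repeat(len(offsets))].reset_index(drop=True)
    # Tile the offsets so each block of repeated rows gets the full set
    out["hour"] = out["hour"] + np.tile(offsets, len(df))
    return out

# Hypothetical sample data with a different starting hour per row
df = pd.DataFrame({"date": ["21/06/2000", "22/06/2000"], "hour": [6, 7]})
result = expand_hours(df, [0, 4, 8, 12])
print(result)
```

Passing the offsets in explicitly keeps the spacing easy to change, for example `[0, 6, 12]` for three rows per date.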

CodePudding user response：

Since you are assigning [6,10,14,18] to each row, you can use the complete function from pyjanitor to achieve this:

# the dev version has improvements
# pip install git+https://github.com/pyjanitor-devs/pyjanitor.git
import janitor
import pandas as pd

df.complete('date', {'hour':[6,10,14,18]})
          date  hour
0   21/06/2000     6
1   21/06/2000    10
2   21/06/2000    14
3   21/06/2000    18
4   22/06/2000     6
5   22/06/2000    10
6   22/06/2000    14
7   22/06/2000    18
8   23/06/2000     6
9   23/06/2000    10
10  23/06/2000    14
11  23/06/2000    18
12  24/06/2000     6
13  24/06/2000    10
14  24/06/2000    14
15  24/06/2000    18
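
If installing pyjanitor is not an option, a similar result can probably be had with plain pandas by cross-joining the dates with the fixed list of hours. This is a different technique from `complete`, shown only as a sketch; it assumes pandas 1.2 or newer (where `how="cross"` is available), and the names `hours` and `out` are mine.

```python
import pandas as pd

# Hypothetical sample data matching the shape in the question
df = pd.DataFrame({"date": ["21/06/2000", "22/06/2000"], "hour": [6, 6]})

# The fixed set of hours every date should get
hours = pd.DataFrame({"hour": [6, 10, 14, 18]})

# Cross join: every date paired with every hour (requires pandas >= 1.2)
out = df[["date"]].merge(hours, how="cross")
print(out)
```

Like the `complete` call, this ignores the original hour values and simply pairs each date with the same list of hours.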