What I currently have is data that looks like this:
date | hour |
---|---|
21/06/2000 | 6 |
22/06/2000 | 6 |
23/06/2000 | 6 |
24/06/2000 | 6 |
What I would like to achieve:
date | hour |
---|---|
21/06/2000 | 6 |
21/06/2000 | 10 |
21/06/2000 | 14 |
21/06/2000 | 18 |
22/06/2000 | 6 |
22/06/2000 | 10 |
22/06/2000 | 14 |
22/06/2000 | 18 |
... so on and so forth | ... |
I have not been able to find a solution for this online. I only found ways to add new EMPTY rows, which is not what I am after. Any solution would be appreciated. Thanks in advance!
UPDATE: To recreate the dataframe:
import pandas as pd
date = ['21/06/2000', '22/06/2000', '23/06/2000', '24/06/2000']
hour = [6, 6, 6, 6]
df = pd.DataFrame(list(zip(date, hour)),
                  columns=['date', 'hour'])
CodePudding user response:
One option is to assign a list to the "hour" column, then explode it:
# every row gets the list [6, 10, 14, 18], built from the first row's hour
df = (df.assign(hour=[[df.at[0, 'hour'] + 4*i for i in range(4)]] * len(df))
        .explode('hour')
        .reset_index(drop=True))
Output:
date hour
0 21/06/2000 6
1 21/06/2000 10
2 21/06/2000 14
3 21/06/2000 18
4 22/06/2000 6
5 22/06/2000 10
6 22/06/2000 14
7 22/06/2000 18
8 23/06/2000 6
9 23/06/2000 10
10 23/06/2000 14
11 23/06/2000 18
12 24/06/2000 6
13 24/06/2000 10
14 24/06/2000 14
15 24/06/2000 18
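If the starting hour is not the same on every row, a variation on the same idea is to build each row's list from its own hour instead of from df.at[0, 'hour']. A minimal sketch, assuming offsets of 0, 4, 8 and 12 hours (the names offsets, hour_lists and expanded are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["21/06/2000", "22/06/2000", "23/06/2000", "24/06/2000"],
    "hour": [6, 6, 6, 6],
})

# Assumed offsets to add to each row's starting hour.
offsets = [0, 4, 8, 12]

# One list of hours per row, held in an object Series aligned on df's index.
hour_lists = pd.Series(
    [[h + off for off in offsets] for h in df["hour"]],
    index=df.index,
)

expanded = (
    df.assign(hour=hour_lists)   # each cell now holds a list
      .explode("hour")           # one output row per list element
      .reset_index(drop=True)
      .astype({"hour": int})     # explode leaves an object column
)
print(expanded)
```

explode returns an object column, so the final astype is there only to get back to integers.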
CodePudding user response:
A simple solution using repeat and numpy:
import numpy as np
# repeat every row 4 times
df2 = (df.loc[df.index.repeat(4)]
         .reset_index(drop=True)
      )
# add 0, 4, 8, 12 to the existing hour within each group of 4 rows
df2['hour'] += np.arange(len(df2))%4*4
Or maybe even easier, using an explicit array of the hours to add and numpy.tile:
import numpy as np
# repeat every row 4 times
df2 = (df.loc[df.index.repeat(4)]
         .reset_index(drop=True)
      )
# add the tiled offsets to the existing hour
df2['hour'] += np.tile([0,4,8,12], len(df))
Output:
date hour
0 21/06/2000 6
1 21/06/2000 10
2 21/06/2000 14
3 21/06/2000 18
4 22/06/2000 6
5 22/06/2000 10
6 22/06/2000 14
7 22/06/2000 18
8 23/06/2000 6
9 23/06/2000 10
10 23/06/2000 14
11 23/06/2000 18
12 24/06/2000 6
13 24/06/2000 10
14 24/06/2000 14
15 24/06/2000 18
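For reuse, the repeat and tile steps can be wrapped in a small helper. This is only a sketch: expand_hours and its n and step parameters are hypothetical names, and it assumes the dataframe has a unique index so that df.loc with a repeated index behaves as in the snippets above.

```python
import numpy as np
import pandas as pd


def expand_hours(df, n=4, step=4):
    """Repeat each row n times and add 0, step, 2*step, ... to 'hour'.

    Assumes df has a unique index and an integer 'hour' column.
    """
    out = df.loc[df.index.repeat(n)].reset_index(drop=True)
    # np.tile repeats the whole offset array once per original row.
    out["hour"] = out["hour"] + np.tile(np.arange(n) * step, len(df))
    return out


df = pd.DataFrame({
    "date": ["21/06/2000", "22/06/2000", "23/06/2000", "24/06/2000"],
    "hour": [6, 6, 6, 6],
})

df2 = expand_hours(df)                 # hours 6, 10, 14, 18 per date
df3 = expand_hours(df, n=2, step=6)    # hours 6, 12 per date
print(df2)
```

Because the offsets are added to the existing column, rows that start at different hours keep their own starting point.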
CodePudding user response:
Since you are assigning [6,10,14,18] to each row, you can use the complete function from pyjanitor to achieve this:
# the dev version has improvements
# pip install git+https://github.com/pyjanitor-devs/pyjanitor.git
import janitor
import pandas as pd
df.complete('date', {'hour':[6,10,14,18]})
date hour
0 21/06/2000 6
1 21/06/2000 10
2 21/06/2000 14
3 21/06/2000 18
4 22/06/2000 6
5 22/06/2000 10
6 22/06/2000 14
7 22/06/2000 18
8 23/06/2000 6
9 23/06/2000 10
10 23/06/2000 14
11 23/06/2000 18
12 24/06/2000 6
13 24/06/2000 10
14 24/06/2000 14
15 24/06/2000 18
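If you would rather not add a dependency, a similar result can be had with plain pandas by cross-joining the dates with the list of hours. This is a different technique from pyjanitor's complete, not how it works internally, and it assumes pandas 1.2 or later, where merge accepts how="cross". The names hours and result are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["21/06/2000", "22/06/2000", "23/06/2000", "24/06/2000"],
    "hour": [6, 6, 6, 6],
})

# The hours every date should end up with.
hours = pd.DataFrame({"hour": [6, 10, 14, 18]})

# Cross join: every date is paired with every hour.
# Only the 'date' column is kept on the left so 'hour' is not duplicated.
result = df[["date"]].merge(hours, how="cross")
print(result)
```

Note that this replaces the original hour column with the fixed list rather than adding offsets to it, which is fine here because every row starts at 6.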