Home > OS >  Pandas Dataframe, add new column with range of data for each row in Python using Pandas
Pandas Dataframe, add new column with range of data for each row in Python using Pandas

Time:01-10

I have a dataframe with a single column and want to create a new column called 'Hour' for hours 0-23 but for each row.

Current:

AN_LOG_ID
00000001
00000002
00000003

Desired output: (0-23 for each hour of the day for each row)

AN_LOG_ID    HOUR
00000001      0
00000001      1
...          ...
00000001      23
00000002      0
00000002      1
...          ...
00000002      23
00000003      0
00000003      1
...          ...
00000003      23

CodePudding user response:

>>> df = df.assign(HOUR=[range(24)] * len(df)).explode("HOUR", ignore_index=True)
>>> df

   AN_LOG_ID HOUR
0   00000001    0
1   00000001    1
2   00000001    2
3   00000001    3
4   00000001    4
..       ...  ...
67  00000003   19
68  00000003   20
69  00000003   21
70  00000003   22
71  00000003   23

[72 rows x 2 columns]

Assign range(24) to each row as "HOUR" first, and then explode that "HOUR" column to spread hours on its own rows. (ignore_index=True makes the resultant index 0, 1, 2, ...)

CodePudding user response:

We can use Index.repeat then use groupby.cumcount to get your HOUR column:

df = df.loc[df.index.repeat(24)]
df = df.assign(HOUR=df.groupby(level=0).cumcount()).reset_index(drop=True)
    N_LOG_ID  HOUR
0   00000001     0
1   00000001     1
2   00000001     2
3   00000001     3
4   00000001     4
..       ...   ...
67  00000003    19
68  00000003    20
69  00000003    21
70  00000003    22
71  00000003    23

CodePudding user response:

Use a cross merge:

out = df.merge(pd.DataFrame({'HOUR': range(24)}), how='cross')

Output:

   AN_LOG_ID  HOUR
0   00000001     0
1   00000001     1
2   00000001     2
3   00000001     3
4   00000001     4
..       ...   ...
67  00000003    19
68  00000003    20
69  00000003    21
70  00000003    22
71  00000003    23

[72 rows x 2 columns]

CodePudding user response:

Another possible solution, based on numpy:

pd.DataFrame(
    np.concatenate(
        (np.repeat(df.values, 24).reshape(-1,1),
         np.tile(np.arange(24), len(df)).reshape(-1,1)), axis=1), 
    columns=['AN_LOG_ID', 'HOUR'])

Output:

   AN_LOG_ID HOUR
0   00000001    0
1   00000001    1
2   00000001    2
3   00000001    3
4   00000001    4
..       ...  ...
67  00000003   19
68  00000003   20
69  00000003   21
70  00000003   22
71  00000003   23
  • Related