get index per ID - from example data using text input to pandas dataframe? easily reproducible-CodePudding

Text can be used as input to pandas dataframes to make easily reproducible examples for testing solutions.1

txt= """
ID,datetime,value
1,07/10/2022 10:00:00,5 
1,07/10/2022 11:15:10,7 
1,09/10/2022 15:30:30,13 
2,06/10/2022 11:30:22,0 
2,10/10/2022 22:44:11,1 
2,10/10/2022 23:30:22,6 
"""

df = pd.read_csv(StringIO(txt), parse_dates=[1], dayfirst=True)

How do I index df such that there is an integer index according to each ID, keeping the ID variable?

imagined output

     ID            datetime value
0     1 07/10/2022 10:00:00     5 
0     1 07/10/2022 11:15:10     7 
0     1 09/10/2022 15:30:30    13 
1     2 06/10/2022 11:30:22     0 
1     2 10/10/2022 22:44:11     1 
1     2 10/10/2022 23:30:22     6

CodePudding user response：

You could try:

(
    pd.read_csv(StringIO(txt), 
                 parse_dates=[1], 
                 dayfirst=True)
    .assign(id_col= lambda x: x.ID - 1)
    .set_index("id_col")
    .rename_axis(index=None)
)

Output

 ---- ------ --------------------- --------- 
|    |   ID | datetime            |   value |
|---- ------ --------------------- ---------|
|  0 |    1 | 2022-10-07 10:00:00 |       5 |
|  0 |    1 | 2022-10-07 11:15:10 |       7 |
|  0 |    1 | 2022-10-09 15:30:30 |      13 |
|  1 |    2 | 2022-10-06 11:30:22 |       0 |
|  1 |    2 | 2022-10-10 22:44:11 |       1 |
|  1 |    2 | 2022-10-10 23:30:22 |       6 |
 ---- ------ --------------------- ---------

References:

indexing, please have a look at the official reference. This is a way better explanation as I could do.
index=None to remove the index name (try it without this statement)
lambda is an anonymous function. You could also do something like df["id_col"] = df["ID"] - 1. lambda makes it easier and faster.
the brackets are so that you can do line breaks without \ for linebreak (so it's just for styling)