Text can be used as input to pandas dataframes to make easily reproducible examples for testing solutions.1
txt= """
ID,datetime,value
1,07/10/2022 10:00:00,5
1,07/10/2022 11:15:10,7
1,09/10/2022 15:30:30,13
2,06/10/2022 11:30:22,0
2,10/10/2022 22:44:11,1
2,10/10/2022 23:30:22,6
"""
df = pd.read_csv(StringIO(txt), parse_dates=[1], dayfirst=True)
How do I index df
such that there is an integer index according to each ID, keeping the ID variable?
imagined output
ID datetime value
0 1 07/10/2022 10:00:00 5
0 1 07/10/2022 11:15:10 7
0 1 09/10/2022 15:30:30 13
1 2 06/10/2022 11:30:22 0
1 2 10/10/2022 22:44:11 1
1 2 10/10/2022 23:30:22 6
CodePudding user response:
You could try:
(
pd.read_csv(StringIO(txt),
parse_dates=[1],
dayfirst=True)
.assign(id_col= lambda x: x.ID - 1)
.set_index("id_col")
.rename_axis(index=None)
)
Output
---- ------ --------------------- ---------
| | ID | datetime | value |
|---- ------ --------------------- ---------|
| 0 | 1 | 2022-10-07 10:00:00 | 5 |
| 0 | 1 | 2022-10-07 11:15:10 | 7 |
| 0 | 1 | 2022-10-09 15:30:30 | 13 |
| 1 | 2 | 2022-10-06 11:30:22 | 0 |
| 1 | 2 | 2022-10-10 22:44:11 | 1 |
| 1 | 2 | 2022-10-10 23:30:22 | 6 |
---- ------ --------------------- ---------
References:
- indexing, please have a look at the official reference. This is a way better explanation as I could do.
index=None
to remove the index name (try it without this statement)- lambda is an anonymous function. You could also do something like
df["id_col"] = df["ID"] - 1
.lambda
makes it easier and faster. - the brackets are so that you can do line breaks without
\
for linebreak (so it's just for styling)