I have a usage data per customer, collected per months during several years, shaped as ~(6000, 60).
Sample dataframe:
df = pd.DataFrame({'id': ['user_1', 'user_2'], 'access_type': ['mobile', 'desktop'], '2018-09-01 00:00:00': [7,5], '2018-10-01 00:00:00':[1,3], '2018-11-01 00:00:00':[0,10]})
id access_type 2018-09-01 00:00:00 2018-10-01 00:00:00 2018-11-01 00:00:00
0 user_1 mobile 7 1 0
1 user_2 desktop 5 3 10
How do I change 40 selected date-columns to a datetime index (?) format, or other format that will allow selecting/slicing required periods of time as date?
CodePudding user response:
Use DataFrame.melt
with DataFrame.set_index
:
df2 = (df.melt(['id','access_type'], var_name='date')
.assign(date = lambda x: pd.to_datetime(x['date']))
.set_index('date'))
print (df2)
id access_type value
date
2018-09-01 user_1 mobile 7
2018-09-01 user_2 desktop 5
2018-10-01 user_1 mobile 1
2018-10-01 user_2 desktop 3
2018-11-01 user_1 mobile 0
2018-11-01 user_2 desktop 10
If need MultiIndex
use set_index
with DataFrame.stack
:
s = (df.set_index(['id','access_type'])
.stack()
.rename(index = lambda x: pd.to_datetime(x), level=2))
print (s)
Or:
s = (df.melt(['id','access_type'], var_name='date')
.assign(date = lambda x: pd.to_datetime(x['date']))
.set_index(['id','access_type','date'])['value'])
print (s)