I have transformed the below dict into the Dataframe below,
xxx = {'Jack': ['EU','950','13000','16','20 km',[0]],
'John': ['EU','440','550','12','50m',[0]],
'Tim': ['US','750','2200','12','25 m',[0]],
'Tom': ['US','850','6000','16','20 km',['2022-04-10','2022-04-11','2022-04-12','2022-06-13','2022-06-14','2022-06-15','2022-07-21','2022-07-22','2022-07-23','2022-07-24','2022-07-25','2022-09-12','2022-09-13','2022-09-16','2022-09-26','2022-09-27']],
'Sarah': ['US','900','11000','33','17 km',['2022-01-01','2022-01-02','2022-03-26','2022-03-27','2022-03-28','2022-04-10','2022-04-11','2022-04-12','2022-04-13','2022-04-14','2022-06-03','2022-06-04','2022-06-05','2022-06-11','2022-06-19']]}
xxx = pd.DataFrame.from_dict(xxx, 'index').explode(5).reset_index()
xxx[5] = pd.to_datetime(xxx[5], format='%Y-%m-%d').dt.to_period('M')
xxx = xxx.groupby(['index',0,1,2,3,4,5]).size().unstack(fill_value=0)
del xxx['1970-01']
xxx.reset_index(inplace=True)
print(xxx)
5 index 0 1 2 3 4 2022-01 2022-03 2022-04 2022-06 \
0 Jack EU 950 13000 16 20 km 0 0 0 0
1 John EU 440 550 12 50m 0 0 0 0
2 Sarah US 900 11000 33 17 km 2 3 5 5
3 Tim US 750 2200 12 25 m 0 0 0 0
4 Tom US 850 6000 16 20 km 0 0 3 3
5 2022-07 2022-09
0 0 0
1 0 0
2 0 0
3 0 0
4 5 5
I would like to insert a Sum column that will sum the counts for the date columns
My code:
xxx['sum'] = sum(xxx.iloc[:,-6:])
TypeError: unsupported operand type(s) for : 'Period' and 'Period'
- Please, help to insert the Sum column that will sum the counts of the last 6 columns
- Why does my index column have header 5?
CodePudding user response:
Use sum
method of dataframe rather than sum
function:
xxx['sum'] = xxx.iloc[:,-6:].sum(axis=1)
print(xxx.iloc[:,-7:])
# Output
5 2022-01 2022-03 2022-04 2022-06 2022-07 2022-09 sum
0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0
2 2 3 5 5 0 0 15
3 0 0 0 0 0 0 0
4 0 0 3 3 5 5 16