I have this dataframe:
I want to sum all the distances that have been done each day, giving the next dataframe:
distances = [
(32.2,1),
(40.2,1),
(22.5,2),
(37.6,2),
(5.6,2),
(5.8,3),
(9.7,3),
(10.2,3),
(12.3,4),
(15.2,4),
]
expected_result = [
(72.5,1),
(65.5,2),
(25.7,3),
(27.5,4),
]
distances = pd.DataFrame(distances, columns = ['distance','day'])
expected_result = pd.DataFrame(expected_result, columns = ['distance','day'])
I'm new to pandas so I don't know exactly how to do it.
CodePudding user response:
You can group the data by "day" then sum it
distances = distances.groupby('day').sum()
If you want to sort the data according to distance you can use this
distances = distances.sort_values(by=['distance'], ascending=False)
CodePudding user response:
use groupby
data="""distance day
32.2 1
40.2 1
22.5 2
37.6 2
5.6 2
5.8 3
9.7 3
10.2 3
12.3 4
15.2 4
"""
print("Sum by day")
import pandas as pd
df = pd.read_csv(StringIO(data), sep='\t')
df.groupby('day').sum()
output:
distance
day
1 72.4
2 65.7
3 25.7
4 27.5