I am tryin to use python and pandas to simplify a task that we normally do in excel. So far I have combined two csv's into a single df and need some help going further. Anyways without going into to much detail on the specifics of my project using the following data frame I want to calculate the minimum elevation for a given month:
Start Minimum Z Month
0 2024-10-09 12:00:00 6056.483651 10/24
1 2029-10-08 00:00:00 5450.000000 10/29
2 2029-10-15 00:00:00 5450.000000 10/29
3 2029-10-25 00:00:00 5435.000000 10/29
4 2029-11-01 00:00:00 5450.000000 11/29
Now my goal is to take this data frame and calculate the min of the "Minimum Z" column for each month. So the output data frame would be this:
Elevation Month
0 6056.483651 10/24
1 5435.000000 10/29
2 5450.000000 11/29
So the goal is to get the lowest elevation for each month in the data frame and populate a new data frame. I have searched for a while to see if someone has done something similar but I can't seem to find anything. Any help would be appreciated!
CodePudding user response:
You can use .groupby()
and .min()
.
df[['Month','Minimum Z']].groupby('Month').min()
This first selects just the two columns in the dataframe. Then the .groupby()
groups things where we have a row/record for each unique value in the column in the groupby (so 'Month' here), and then it does the .min()
on the other column (in this case, the 'Minimum Z'). If you wanted a sum of the aggregate column, you'd do .sum()
, but that of course makes no sense with elevations.
If you want to rename the column then, just df = df.rename(columns={'Minimum Z':'Elevation'})
CodePudding user response:
hope it helps. here you get a data frame with the minimum values for the month
df.merge(df.groupby('Month')['Z'].min().reset_index(), how='inner')
Start Minimum Z Month
0 2024-10-09 12:00:00 6056.483651 10/24
1 2029-10-25 00:00:00 5435.000000 10/29
2 2029-11-01 00:00:00 5450.000000 11/29
PS: in my copying of the dataframe, time end up being under column 'minimum',