Pandas (python) conditional row selection-CodePudding

I searched this on stackoverflow and couldn't see a proper answer from some of the results I got. I am trying to learn pandas and goal is to find a particular value

import pandas as pd

df_data = {
    "age": [24, 9, 20, 24],
    "weight": [170, 106, 201, 216],
    "dob": ["1/1990", "1/2000", "2/1994", "1/1990"]
}

df = pd.DataFrame(df_data)

now I want to pick the rows with dob >= "1/1990" and <= "1/2020". Then I have to pick the max weight and then print the corresponding age. This is what I am doing

weightMax = df[(df["dob"] >= "1/1990") & (df["dob"] <= "1/2020")]
index = weightMax["weight"].idxmax()

if I print weightMax[index] for the whole row, it throws index outofbound error. where am I going wrong here? Also, to get the corresponding age, how do I get it?

Thanks a lot for the help

CodePudding user response：

Convert your dob column to be a proper datetime so your comparison will work reliably:

df['dob'] = pd.to_datetime(df['dob'], format='%m/%Y')

Then you can subset that using .between and take the .nlargest based on the weight and squeeze it to a Series:

heaviest = df[df['dob'].between('1991-01-01', '2020-01-01')].nlargest(1, 'weight').squeeze()

This'll give you:

age                        20
weight                    201
dob       1994-02-01 00:00:00
Name: 2, dtype: object

If you want to access the original index then you can do so by using accessing heaviest.name.

Example print:

print('The heaviest was {0.weight}kg and was {0.age} years old. They were originally at index {0.name}.'.format(heaviest))

Gives you:

'The heaviest was 201kg and was 20 years old. They were originally at index 2.'

CodePudding user response：

Please try to use loc

weightMax.loc[index]
age            9
weight       106
dob       1/2000
Name: 1, dtype: object

CodePudding user response：

You can use filter to deal with it. You gonna write more code, but it will be a more legible code.

Implements your logical in a function and use the filter on the df_data['dob'] array.

https://www.programiz.com/python-programming/methods/built-in/filter

Edit: Oh, crap. I don't use pandas frequently, but i found this article on the documentation. It's about filtering the dataframe. It's easier than the solution above. You also can use regex.

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.filter.html

CodePudding user response：

Convert all the dates to Unix times which is stored as integer then you can easily apply any filters to dates.

import pandas as pd
import datetime
import time

df_data = {
    "age": [24, 9, 20, 24],
    "weight": [170, 106, 201, 216],
    "dob": ["1/1990", "1/2000", "2/1994", "1/1990"]
}

df = pd.DataFrame(df_data) 
def dates(x):
    return time.mktime(datetime.datetime.strptime(DOB, "%m/%Y").timetuple())

df["dob_unix"]=df["dob"].map(dates)
weightMax = df[(df["dob_unix"] >= 662706000.0) & (df["dob_unix"] <= 1577854800.0)]
index = weightMax["weight"].idxmax()