Loop through certain columns of a dataframe

Time:09-15

I am trying to define a function with a for loop that iterates over the Weight column and returns a list of the names of patients whose weight is >= 150. I'm honestly just confused about how I should go about this. Any help will be much appreciated.

df:   Patient          Weight   LDL 
0      Rob              200      100
1      Bob              150      150
2      John             184      102
3      Phil             120      200
4      Jessica          100      143

import pandas as pd

# List of tuples: (patient, weight, LDL)
Patients = [('Rob', 200, 100),
            ('Bob', 150, 150),
            ('John', 184, 102),
            ('Phil', 120, 200),
            ('Jessica', 100, 143)]

# Create a DataFrame object
df = pd.DataFrame(Patients, columns=['Patient', 'Weight', 'LDL'],
                  index=['0', '1', '2', '3', '4'])

df

def greater_150(df, outcome = 'Weight'):
    new_list = []
    patient = df['Patient']
    for column in df[['Patient', 'Weight']]:
        if outcome <= 150:
           new_list.append(patient)
    return new_list

Ideally, the output would be:

['Rob', 'Bob', 'John']

Instead, I get this TypeError:

'<=' not supported between instances of 'str' and 'int'

CodePudding user response:

Here's a simple approach that avoids iteration (as is typically ideal when pandas is involved).

df[df["Weight"] >= 150].Patient

returns the following pandas Series:

0     Rob
1     Bob
2    John
Name: Patient, dtype: object

If you want, you can make this into a list with df[df["Weight"] >= 150].Patient.tolist(), which yields ['Rob', 'Bob', 'John'].
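
To see why this works without a loop, the filter can be split into its two steps. This is only a sketch: it rebuilds the question's data with a default integer index (the question used string labels), and the names mask and heavy are mine, not part of the answer above.

```python
import pandas as pd

# Same data as in the question, rebuilt here so the snippet runs on its own.
df = pd.DataFrame(
    {
        "Patient": ["Rob", "Bob", "John", "Phil", "Jessica"],
        "Weight": [200, 150, 184, 120, 100],
        "LDL": [100, 150, 102, 200, 143],
    }
)

# Step 1: comparing a column to a number gives one True/False per row.
mask = df["Weight"] >= 150

# Step 2: indexing with that boolean Series keeps only the True rows;
# .loc also lets you pick the column in the same step.
heavy = df.loc[mask, "Patient"].tolist()

print(mask.tolist())  # [True, True, True, False, False]
print(heavy)          # ['Rob', 'Bob', 'John']
```

Writing df.loc[mask, "Patient"] selects the rows and the column in a single indexing call, which is equivalent here to df[mask].Patient.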

CodePudding user response:

Generally, avoid iteration, as the answer by Ben points out. But if you want to learn how to do it, here's your function modified to iterate through the rows (not the columns!):

def greater_150(df, outcome='Weight'):
    new_list = []
    for index, data in df.iterrows():
        if data[outcome] >= 150:
            new_list.append(data["Patient"])
    return new_list
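
For what it's worth, the TypeError in the question comes from the "columns, not rows" point: looping over a DataFrame hands you the column labels, and outcome is itself just the string 'Weight'. The small sketch below shows this on a cut-down copy of the data (my own three-row frame, not the full table) and also shows zip as one lighter-weight way to loop, which is my suggestion rather than something from the answers here.

```python
import pandas as pd

# Cut-down, illustrative copy of the question's data.
df = pd.DataFrame(
    {"Patient": ["Rob", "Bob", "Phil"], "Weight": [200, 150, 120]}
)

# Iterating over a DataFrame yields its column labels, not its rows.
labels = [column for column in df[["Patient", "Weight"]]]
print(labels)  # ['Patient', 'Weight']

# The original code compared the string 'Weight' with the int 150,
# which is exactly what raises the TypeError from the question.
try:
    "Weight" <= 150
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# If you do want an explicit loop, zipping the two columns walks the
# rows without building a Series per row the way iterrows() does.
names = [
    name
    for name, weight in zip(df["Patient"], df["Weight"])
    if weight >= 150
]
print(names)  # ['Rob', 'Bob']
```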

CodePudding user response:

Try the following:

def greater_150(df):
    return df.loc[df["Weight"] >= 150].Patient.tolist()
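
If you want to keep the outcome argument from the original function, something along these lines might do. Treat it as a sketch: the function name patients_at_or_above and the threshold parameter are hypothetical additions of mine, and the "Patient" column name is assumed from the question's data.

```python
import pandas as pd


def patients_at_or_above(df, outcome="Weight", threshold=150):
    """Return the names of patients whose `outcome` value is >= threshold.

    Hypothetical helper: assumes `df` has a "Patient" column and that
    `outcome` names a numeric column.
    """
    return df.loc[df[outcome] >= threshold, "Patient"].tolist()


# Same data as in the question, rebuilt so the snippet runs on its own.
df = pd.DataFrame(
    {
        "Patient": ["Rob", "Bob", "John", "Phil", "Jessica"],
        "Weight": [200, 150, 184, 120, 100],
        "LDL": [100, 150, 102, 200, 143],
    }
)

print(patients_at_or_above(df))
# ['Rob', 'Bob', 'John']
print(patients_at_or_above(df, outcome="LDL", threshold=143))
# ['Bob', 'Phil', 'Jessica']
```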