Select all rows in a Pandas DataFrame with loc by passing in a string or other object

Time:09-30

I am writing a function that selects a subset of rows from a pandas DataFrame.

The function looks like this:

def get_predictions(df: pd.DataFrame, subset: str) -> pd.DataFrame:
    # A list of column names (double brackets) selects several columns.
    return df[['properties', 'prediction']].loc[subset]

I would like this function to be able to handle the case where I want to select all of the rows in the DataFrame. One solution to this is to make the subset argument default to None and return the entire DataFrame if the subset argument is set to None.

def get_predictions(df: pd.DataFrame, subset: str = None) -> pd.DataFrame:
    if subset is None:
        return df[['properties', 'prediction']]
    else:
        return df[['properties', 'prediction']].loc[subset]

I don't like this solution because I am duplicating a lot of code. Is there a better solution that does not involve duplication? Specifically, is there an object that I could pass into .loc[] that would return all of the rows in the DataFrame?

This is the ideal solution that I am looking for:

def get_predictions(df: pd.DataFrame, subset=MysteryObject) -> pd.DataFrame:
    return df[['properties', 'prediction']].loc[subset]

Is there a MysteryObject that could achieve this desired behavior?

CodePudding user response:

Just pass in the DataFrame's own index:

subset = df.index

Also, it is better practice to select both the rows and the columns in a single .loc call. That avoids chained indexing, which first builds an intermediate DataFrame of the columns and then indexes into it. So just do:

df.loc[subset, ['properties', 'prediction']]
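
As a minimal sketch of this suggestion: the frame `demo_df` and its values below are made up for illustration, and the sketch assumes the index labels are unique. Passing the frame's own index to .loc selects every row, so a caller can use it as the "select everything" value:

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
demo_df = pd.DataFrame(
    {'properties': [1, 2, 3], 'prediction': [4, 5, 6], 'other': [7, 8, 9]},
    index=['a', 'b', 'c'],
)


def get_predictions(df: pd.DataFrame, subset) -> pd.DataFrame:
    # Select rows and columns in a single .loc call.
    return df.loc[subset, ['properties', 'prediction']]


# Passing the full index returns every row of the two columns.
all_rows = get_predictions(demo_df, demo_df.index)
print(all_rows)
```

The caller still has to supply `df.index` explicitly here, since a default argument cannot refer to another parameter of the same function.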

CodePudding user response:

Let's try setting the default to slice(None) instead of just None:

def get_predictions(
        df: pd.DataFrame, subset: str = slice(None)
) -> pd.DataFrame:
    return df[['properties', 'prediction']].loc[subset]

Although it would be even better practice to subset both axes in one step:

def get_predictions(
        df: pd.DataFrame, subset: str = slice(None)
) -> pd.DataFrame:
    return df.loc[subset, ['properties', 'prediction']]

slice(None) is equivalent to : except that it is an ordinary object, so it can be assigned to a variable or used as a default argument.

df.loc[:, 'col'] == df.loc[slice(None), 'col']
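
One way to check this equivalence, sketched on a hypothetical frame (`demo_df` and its values are illustrative). Because == on two Series compares element by element, Series.equals is used here to get a single boolean:

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
demo_df = pd.DataFrame({'col': [10, 20, 30]}, index=['a', 'b', 'c'])

with_colon = demo_df.loc[:, 'col']
with_slice = demo_df.loc[slice(None), 'col']

# True when both selections hold the same labels and values.
same = with_colon.equals(with_slice)
print(same)

# Unlike the bare colon, the slice object can be stored and reused.
everything = slice(None)
reused = demo_df.loc[everything, 'col']
```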

Test Code:

test_df = pd.DataFrame({'properties': [1, 2, 3],
                        'prediction': [4, 5, 6],
                        'other': [7, 8, 9]},
                       index=['a', 'b', 'c'])

print('Subset \'a\'')
print(get_predictions(test_df, 'a'))
print('No Subset')
print(get_predictions(test_df))

Output:

Subset 'a'
properties    1
prediction    4
Name: a, dtype: int64

No Subset
   properties  prediction
a           1           4
b           2           5
c           3           6
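
Note that in the output above the single label 'a' yields a Series rather than a DataFrame. If a DataFrame is wanted in both cases, one option is to pass the label inside a list. This is a sketch, not part of the original answer, and the names `cols`, `as_series` and `as_frame` are illustrative:

```python
import pandas as pd

test_df = pd.DataFrame({'properties': [1, 2, 3],
                        'prediction': [4, 5, 6],
                        'other': [7, 8, 9]},
                       index=['a', 'b', 'c'])

cols = ['properties', 'prediction']

as_series = test_df.loc['a', cols]    # scalar label: the row comes back as a Series
as_frame = test_df.loc[['a'], cols]   # list of labels: a one-row DataFrame

print(type(as_series).__name__)
print(type(as_frame).__name__)
```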