Select cells in a pandas DataFrame by a Series of its column labels

Time:03-31

Say we have a DataFrame and a Series of its column labels, both (almost) sharing a common index:

df = pd.DataFrame(...)
s = df.idxmax(axis=1).shift(1)

Given such a Series of column labels, how can I pick one cell per row, taking each row's value from the column named by the corresponding entry in the Series? I'd imagine it would be something like:

values = df[s]  # either
values = df.loc[s]  # or

In my example I want, for each row, the value in the column that held the previous row's maximum, i.e. the cell directly below each row's biggest value (I'm doing a poor man's ML :) )
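To make the goal concrete, here is a small sketch with made-up data (the frame and its values are illustrative assumptions, not from my real dataset):

```python
import pandas as pd

# Hypothetical sample data, only to illustrate the question.
df = pd.DataFrame({"a": [1, 9, 2, 4], "b": [5, 3, 8, 6], "c": [7, 0, 1, 5]})

# Label of each row's maximum, shifted down by one row.
s = df.idxmax(axis=1).shift(1)

# s now holds: row 0 -> NaN, row 1 -> "c", row 2 -> "a", row 3 -> "b"
# Wanted: df.at[1, "c"], df.at[2, "a"], df.at[3, "b"], i.e. 0, 2, 6
wanted = [df.at[i, s[i]] for i in s.dropna().index]
```

The list comprehension spells out the intended result cell by cell; what I am after is a vectorized way to get the same thing.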

However, I cannot find any interface for selecting cells by a Series of column labels. Any ideas, folks?

Meanwhile, I use this monstrous snippet:

def get_by_idxs(df: pd.DataFrame, idxs: pd.Series) -> pd.Series:
    ts_v_pairs = [
        (ts, row[row['idx']])
        for ts, row in df.join(idxs.rename('idx'), how='inner').iterrows()
        if isinstance(row['idx'], str)
    ]
    
    return pd.Series([v for ts, v in ts_v_pairs], index=[ts for ts, v in ts_v_pairs])

CodePudding user response:

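I think you need a DataFrame lookup. DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0, so index into the underlying NumPy array by row and column position instead: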

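v = s.dropna()
rows = df.index.get_indexer(v.index)  # row positions of the labels that survived dropna (assumes a unique index)
cols = df.columns.get_indexer_for(v)  # column positions of the chosen labels
v = pd.Series(df.to_numpy()[rows, cols], index=v.index)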
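A self-contained sketch of this positional-indexing approach follows. It looks up row positions from the index labels, so rows dropped by dropna() stay aligned with the right rows of df. The function name select_by_labels and the sample data are hypothetical, and the sketch assumes that both df.index and df.columns are unique:

```python
import pandas as pd


def select_by_labels(df: pd.DataFrame, labels: pd.Series) -> pd.Series:
    """For each row label i, return the cell df.at[i, labels[i]].

    Rows whose label is missing (NaN) or whose index is absent from df are skipped.
    """
    labels = labels.dropna()
    # Keep only row labels present in both objects, like an inner join.
    common = df.index.intersection(labels.index)
    labels = labels.loc[common]
    # Translate labels into integer positions for NumPy fancy indexing.
    rows = df.index.get_indexer(common)
    cols = df.columns.get_indexer(labels.to_numpy())
    if (cols == -1).any():
        raise KeyError("labels contains names that are not columns of df")
    return pd.Series(df.to_numpy()[rows, cols], index=common)


# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"a": [1, 9, 2, 4], "b": [5, 3, 8, 6], "c": [7, 0, 1, 5]})
s = df.idxmax(axis=1).shift(1)
values = select_by_labels(df, s)
print(values.tolist())  # [0, 2, 6]
```

Note that df.to_numpy() upcasts to a common dtype, so a frame with mixed column types yields an object-dtype result.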