Home > Mobile >  How to select dataframe rows based on index value and columns value criteria?
How to select dataframe rows based on index value and columns value criteria?

Time:04-04

I have a dataframe that is a stacked version of n=9 dataframes:

df
        f     a   config
0   0.491 0.368  old.000
1   0.369 0.333  old.000
2   0.372 0.276  old.000
3   0.346 0.300  old.000
4   0.213 0.161  old.000
..    ...   ...      ...
212 1.000 1.000  fin.111
213 1.000 1.000  fin.111
214 1.000 1.000  fin.111
215 1.000 1.000  fin.111
216 1.000 1.000  fin.111

[1953 rows x 3 columns]

Each "stacked" sub-dataframe corresponds to a different value of config, ie:

df['config'].unique()
array(['old.000', 'fin.000', 'fin.001', 'fin.010', 'fin.011', 'fin.100',
       'fin.101', 'fin.110', 'fin.111'], dtype=object)

I want to "filter" this dataframe by a criteria (composed of index and config value) given by a pd.Series:

ser_criteria
0      old.001
1      fin.101
2      fin.100
3      fin.101
4      fin.101
        ...   
212    fin.000
213    old.000
214    old.000
215    old.000
216    old.000
Length: 217, dtype: object

So, I would need my output to be given by:

df_filtered
        f     a   config
0   0.481 0.368  old.001
1   0.569 0.333  fin.101
2   0.672 0.276  fin.100
3   0.378 0.111  fin.101
4   0.987 0.213  fin.101
..    ...   ...   ...   
212 0.500 0.111  fin.000
213 1.000 1.000  old.000
214 0.765 0.123  old.000
215 0.000 1.000  old.000
216 0.333 0.123  old.000

[217 rows x 3 columns]

What is the more efficient way to do this? The only way I could find was to do this element by element (from index)...

CodePudding user response:

Convert index to index columns in both DataFrames and use right join:

df = (ser_criteria.rename('config')
                  .reset_index()
                  .merge(df.reset_index(), on=['index','config'],how='right')
                  .drop('index', axis=1))

CodePudding user response:

If the index values in your filter criteria are not important, you can do the following:

import pandas as pd

# Assign your dataframe to df (example)
df = pd.read_csv("Documents/dataframe.tsv", sep="\t")
df

f   a   config
0   0.491   0.368   old.000
1   0.369   0.333   old.000
2   0.372   0.276   old.000
3   0.346   0.300   old.000
4   0.213   0.161   old.000
212 1.000   1.000   fin.111
213 1.000   1.000   fin.111
214 1.000   1.000   fin.111
215 1.000   1.000   fin.111
216 1.000   1.000   fin.111


# Assign your filter series to filter (example)
filter = pd.Series(('old.001', 'fin.101', 'fin.100', 'fin.101', 'fin.101', 'fin.000', 'old.000', 'old.000', 'old.000', 'old.000'))
filter

0    old.001
1    fin.101
2    fin.100
3    fin.101
4    fin.101
5    fin.000
6    old.000
7    old.000
8    old.000
9    old.000
dtype: object

# Then subset the dataframe by which rows have a config value in your filter
df[df['config'].isin(filter)]
  • Related