Home > Mobile >  Removing values from DataFrame columns based on list of values
Removing values from DataFrame columns based on list of values

Time:11-17

I have a slightly specific problem.

A pandas DataFrame with let's say 6 columns. Each column has a unique set of values, which need to be looked for and removed / updated with a new value.

The lookup list would look like this:

lookup_values_per_column = [[-99999], [9999], [99, 98],[9],[99],[996, 997, 998, 999]]

Now, what I want to do is: Look at column 1 of the dataframe and check if -99999 is present, if yes, remove / update each instance with a fixed value (lets say NA's) Then we move to the next column and then check for all 9999 and also update them with NA's. If we don't find a match, we just leave the column as it is.

I couldn't find a solution and I guess it's not so hard anyways.

CodePudding user response:

We can use DataFrame.replace with a dictionary built from the list and columns:

df = df.replace(
    to_replace=dict(zip(df.columns, lookup_values_per_column)),
    value=np.NAN
)

Sample output:

     A    B    C    D    E    F
0  4.0  1.0  NaN  2.0  3.0  NaN
1  NaN  3.0  2.0  NaN  4.0  NaN
2  1.0  4.0  NaN  1.0  1.0  NaN
3  2.0  2.0  3.0  3.0  2.0  1.0
4  3.0  NaN  1.0  4.0  NaN  NaN

Setup Used:

from random import sample, seed
from string import ascii_uppercase

import numpy as np
import pandas as pd

lookup_values_per_column = [
    [-99999], [9999], [99, 98], [9], [99], [996, 997, 998, 999]
]
df_len = max(map(len, lookup_values_per_column))   1
seed(10)
df = pd.DataFrame({
    k: sample(v   list(range(1, df_len   1 - len(v))), df_len)
    for k, v in
    zip(ascii_uppercase, lookup_values_per_column)
})

df:

       A     B   C  D   E    F
0      4     1  99  2   3  997
1 -99999     3   2  9   4  998
2      1     4  98  1   1  999
3      2     2   3  3   2    1
4      3  9999   1  4  99  996
  • Related