I have a slightly specific problem.
A pandas DataFrame with let's say 6 columns. Each column has a unique set of values, which need to be looked for and removed / updated with a new value.
The lookup list would look like this:
lookup_values_per_column = [[-99999], [9999], [99, 98],[9],[99],[996, 997, 998, 999]]
Now, what I want to do is: Look at column 1 of the dataframe and check if -99999 is present, if yes, remove / update each instance with a fixed value (lets say NA's) Then we move to the next column and then check for all 9999 and also update them with NA's. If we don't find a match, we just leave the column as it is.
I couldn't find a solution and I guess it's not so hard anyways.
CodePudding user response:
We can use DataFrame.replace
with a dictionary built from the list
and columns
:
df = df.replace(
to_replace=dict(zip(df.columns, lookup_values_per_column)),
value=np.NAN
)
Sample output:
A B C D E F
0 4.0 1.0 NaN 2.0 3.0 NaN
1 NaN 3.0 2.0 NaN 4.0 NaN
2 1.0 4.0 NaN 1.0 1.0 NaN
3 2.0 2.0 3.0 3.0 2.0 1.0
4 3.0 NaN 1.0 4.0 NaN NaN
Setup Used:
from random import sample, seed
from string import ascii_uppercase
import numpy as np
import pandas as pd
lookup_values_per_column = [
[-99999], [9999], [99, 98], [9], [99], [996, 997, 998, 999]
]
df_len = max(map(len, lookup_values_per_column)) 1
seed(10)
df = pd.DataFrame({
k: sample(v list(range(1, df_len 1 - len(v))), df_len)
for k, v in
zip(ascii_uppercase, lookup_values_per_column)
})
df
:
A B C D E F
0 4 1 99 2 3 997
1 -99999 3 2 9 4 998
2 1 4 98 1 1 999
3 2 2 3 3 2 1
4 3 9999 1 4 99 996