Home > Blockchain >  Replacing the first occurrence of a value in each row of a pandas DataFrame
Replacing the first occurrence of a value in each row of a pandas DataFrame

Time:03-24

I have a DataFrame like this:

col1 col2 col3 col4
5 7 12 9
0 9 9 1
9 9 1 1
10 5 2 9
9 3 0 18

Each row has at least one 9 and for each row, I want to replace the first instance of it with 90.

Currently, I'm doing:

out = df.mask(df.eq(9) & df.apply(lambda x: ~x.duplicated(), axis=1), 90)

Is there any better/faster way than this?

Expected output:

   col1  col2  col3  col4
0     5     7    12    90
1     0    90     9     1
2    90     9     1     1
3    10     5     2    90
4    90     3     0    18

Constructor:

data = {'col1': [5, 0, 9, 10, 9],
        'col2': [7, 9, 9, 5, 3],
        'col3': [12, 9, 1, 2, 0],
        'col4': [9, 1, 1, 9, 18]}
df = pd.DataFrame(data)

CodePudding user response:

One way using idxmax:

s = df.eq(9).idxmax(axis=1)
s = s.apply(df.columns.get_loc)
df.values[s.index, s.values] = 90

Output:

   col1  col2  col3  col4
0     5     7    12    90
1     0    90     9     1
2    90     9     1     1
3    10     5     2    90
4    90     3     0    18

This is about 2.5x faster than the original code:

%timeit df.mask(df.eq(9) & df.apply(lambda x: ~x.duplicated(), axis=1), 90)
# 2.59 ms ± 80.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit    
s = df.eq(9).idxmax(axis=1)
s = s.apply(df.columns.get_loc)
df.copy().values[s.index, s.values] = 90 # Note the copy is to keep the df same over the `timeit`

# 1.07 ms ± 31.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

CodePudding user response:

You may check with numpy assign

df.values[df.index, np.argmax(df.values==9,1)] = 90 
df
Out[56]: 
   col1  col2  col3  col4
0     5     7    12    90
1     0    90     9     1
2    90     9     1     1
3    10     5     2    90
4    90     3     0    18

CodePudding user response:

Here is another way:

df.mask(df.eq(9)).T.fillna(90,limit=1).T.fillna(9)

Output:

   col1  col2  col3  col4
0   5.0   7.0  12.0  90.0
1   0.0  90.0   9.0   1.0
2  90.0   9.0   1.0   1.0
3  10.0   5.0   2.0  90.0
4  90.0   3.0   0.0  18.0
  • Related