Iterating over rows and columns and replacing values based on a condition

Time:06-26

How do I divide by 10 every numeric value in a pandas DataFrame that lies between 10 and 100?

Conditions:

  1. The time column, and any other non-numeric column, must be ignored.
  2. Matching numbers can appear in any row or column.

time   n1  n2  n3   n4
11:50   1   2   3   40
12:50   5   6  70    8
13:50  80   7   6  500

Use this code if need be:


import pandas as pd
import numpy as np

time = ['11:50', '12:50', '13:50']
data_1 = {'time': time,
          'n1': [1, 5, 80],
          'n2': [2, 6, 7],
          'n3': [3, 70, 6],
          'n4': [40, 8, 500]}

df1 = pd.DataFrame(data=data_1)
df1

Attempt 1 (it doesn't work):

j = 0
k = 0
for i in df:
    if df[j][k] > 10 and df[j][k] < 100:
        df[j][k] = df[j][k] / 10
        j = j + 1
    else:
        pass
    k = k + 1

Expected result:

  1. 80, 70 and 40 are the only numbers lying between 10 and 100, so each is replaced by x/10 in the same DataFrame:
     • 80 --> 80/10 = 8
     • 70 --> 70/10 = 7
     • 40 --> 40/10 = 4
  2. The entire time column is ignored because it is non-numeric.
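Attempt 1 fails for a few reasons: `df` is never defined (the frame is called `df1`), `for i in df` iterates over column labels rather than cells, and even with `df1` in its place, `df1[0]` looks up a column labelled `0`, which raises `KeyError`. If an explicit loop is really wanted, a minimal sketch is to walk the numeric columns and the row index and use `DataFrame.at` for scalar access. Casting the numeric columns to float first is an assumption made here so that true division (`x / 10`) can be stored cleanly. This is much slower than the vectorized approaches in the answers below.

```python
import pandas as pd

df1 = pd.DataFrame({'time': ['11:50', '12:50', '13:50'],
                    'n1': [1, 5, 80],
                    'n2': [2, 6, 7],
                    'n3': [3, 70, 6],
                    'n4': [40, 8, 500]})

# Only the numeric columns are visited, so 'time' is skipped automatically.
num_cols = df1.select_dtypes(include="number").columns

# Assumption: cast to float first so that true division (x / 10) can be
# stored without mixing floats into integer columns.
df1[num_cols] = df1[num_cols].astype(float)

for col in num_cols:              # walk the columns...
    for idx in df1.index:         # ...and every row label within each column
        value = df1.at[idx, col]  # scalar access by row label and column name
        if 10 < value < 100:      # strictly between 10 and 100
            df1.at[idx, col] = value / 10

print(df1)
```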

Answer:

DataFrame.applymap is slow on large data sets and doesn't scale well, because it calls a Python function once for every element. Prefer a vectorized solution whenever possible.

In this case, you can mask the values between 10 and 100 and perform the conditional replacement using DataFrame.mask (or DataFrame.where if you negate the condition).

# select the numeric columns
num_cols = df1.select_dtypes(include="number").columns

# When DataFrame.mask receives callables, each one is called with the
# calling DataFrame (here df1[num_cols]) as its argument `df`.
# Note: `//` is floor division, which keeps the columns as integers.
df1[num_cols] = (
    df1[num_cols].mask(lambda df: (df > 10) & (df < 100),
                       lambda df: df // 10)
)

Output:

>>> df1

    time  n1  n2  n3   n4
0  11:50   1   2   3    4
1  12:50   5   6   7    8
2  13:50   8   7   6  500
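The `DataFrame.where` alternative mentioned above works the same way with the condition negated, because `where` keeps values where the condition is True and replaces the rest. A minimal sketch on the same sample data, keeping the answer's floor division:

```python
import pandas as pd

df1 = pd.DataFrame({'time': ['11:50', '12:50', '13:50'],
                    'n1': [1, 5, 80],
                    'n2': [2, 6, 7],
                    'n3': [3, 70, 6],
                    'n4': [40, 8, 500]})

num_cols = df1.select_dtypes(include="number").columns
nums = df1[num_cols]

# mask() replaces values where the condition is True;
# where() keeps values where the condition is True, so the condition is negated.
in_range = (nums > 10) & (nums < 100)
df1[num_cols] = nums.where(~in_range, nums // 10)

print(df1)
```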


Answer:

Does this work?

df1[['n1', 'n2', 'n3', 'n4']].applymap(lambda x: x / 10 if 10 < x < 100 else x)

    n1  n2   n3     n4
0  1.0   2  3.0    4.0
1  5.0   6  7.0    8.0
2  8.0   7  6.0  500.0
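Two things worth knowing about this approach: `applymap` was deprecated in pandas 2.1 in favor of the identically behaving `DataFrame.map`, and the expression above only returns a new DataFrame, so `df1` is unchanged until the result is assigned back. A small sketch covering both points; `shrink` is a hypothetical helper name, and the `hasattr` check is just one way to keep older pandas versions working:

```python
import pandas as pd

df1 = pd.DataFrame({'time': ['11:50', '12:50', '13:50'],
                    'n1': [1, 5, 80],
                    'n2': [2, 6, 7],
                    'n3': [3, 70, 6],
                    'n4': [40, 8, 500]})
cols = ['n1', 'n2', 'n3', 'n4']


def shrink(x):
    """Hypothetical helper: divide by 10 if strictly between 10 and 100."""
    return x / 10 if 10 < x < 100 else x


subset = df1[cols]
if hasattr(subset, "map"):           # pandas >= 2.1: DataFrame.map
    result = subset.map(shrink)
else:                                # older pandas: DataFrame.applymap
    result = subset.applymap(shrink)

# The element-wise call returns a new DataFrame; assign it back to update df1.
df1[cols] = result
print(df1)
```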

Answer:

You can select the columns that have numeric dtypes, use .applymap() to perform the division, and then assign the result back to the original DataFrame. Notably, this doesn't require hard-coding the columns you want to transform:

numerics = df1.select_dtypes(include="number")
numerics = numerics.applymap(lambda x: x // 10 if 10 < x < 100 else x)
df1[numerics.columns] = numerics

This outputs:

    time  n1  n2  n3   n4
0  11:50   1   2   3    4
1  12:50   5   6   7    8
2  13:50   8   7   6  500
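Note that `x // 10` is floor division, whereas the question asks for `x / 10`. The two agree for 80, 70 and 40, but differ for values that aren't multiples of 10. The series below uses 45 as such a value; it is made-up illustration data, not part of the question. `//` keeps integer columns as integers, while `/` turns them into floats:

```python
import pandas as pd

# Hypothetical data: 45 is not a multiple of 10, unlike the question's values.
s = pd.Series([5, 45, 80, 500])
in_range = (s > 10) & (s < 100)

floored = s.mask(in_range, s // 10)   # stays integer: 45 -> 4
true_div = s.mask(in_range, s / 10)   # becomes float: 45 -> 4.5

print(floored.tolist())   # [5, 4, 8, 500]
print(true_div.tolist())  # [5.0, 4.5, 8.0, 500.0]
```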

Answer:

Try the following:

def repl(df, cols):
    # Note: this modifies `df` in place and returns the same object.
    for col in cols:
        df[col] = df[col].apply(lambda x: x // 10 if 10 <= x <= 100 else x)
    return df

new_df = repl(df1, ['n1', 'n2', 'n3', 'n4'])
new_df

Output:

    time  n1  n2  n3   n4
0  11:50   1   2   3    4
1  12:50   5   6   7    8
2  13:50   8   7   6  500
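This answer uses inclusive bounds (10 and 100 themselves are also divided), unlike the strict comparisons in the other answers; the question doesn't say which it wants. One way to make that choice explicit is `Series.between`, whose `inclusive` argument accepts `"both"`, `"neither"`, `"left"` or `"right"` (string values require pandas 1.3 or later). A minimal sketch with a hypothetical helper, `divide_in_range`, that also works on a copy so the original frame is left untouched:

```python
import pandas as pd

df1 = pd.DataFrame({'time': ['11:50', '12:50', '13:50'],
                    'n1': [1, 5, 80],
                    'n2': [2, 6, 7],
                    'n3': [3, 70, 6],
                    'n4': [40, 8, 500]})


def divide_in_range(df, low=10, high=100, inclusive="neither"):
    """Hypothetical helper: floor-divide numeric values between low and high by 10.

    `inclusive` is passed to Series.between: "neither" means low < x < high,
    "both" means low <= x <= high. Works on a copy, so `df` is not modified.
    """
    out = df.copy()
    for col in out.select_dtypes(include="number").columns:
        in_range = out[col].between(low, high, inclusive=inclusive)
        out[col] = out[col].mask(in_range, out[col] // 10)
    return out


strict = divide_in_range(df1)  # same result as the answers above for this data

# Edge values show where the two choices differ.
edge = pd.DataFrame({'n': [10, 50, 100]})
print(divide_in_range(edge)['n'].tolist())                    # [10, 5, 100]
print(divide_in_range(edge, inclusive="both")['n'].tolist())  # [1, 5, 10]
```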