- How do we divide by 10 every numeric value between 10 and 100 in an entire pandas DataFrame?
Conditions:
- The time column, or any other non-numeric column, should be ignored.
- The numbers can lie in any row or column.
time | n1 | n2 | n3 | n4 |
---|---|---|---|---|
11:50 | 1 | 2 | 3 | 40 |
12:50 | 5 | 6 | 70 | 8 |
13:50 | 80 | 7 | 6 | 500 |
Use this code to reproduce the DataFrame if need be:
import pandas as pd
import numpy as np

time = ['11:50', '12:50', '13:50']
data_1 = {'time': time,
          'n1': [1, 5, 80],
          'n2': [2, 6, 7],
          'n3': [3, 70, 6],
          'n4': [40, 8, 500],
          }
df1 = pd.DataFrame(data=data_1)
df1
Try 1 (it doesn't seem to work):
j = 0
k = 0
for i in df:
    if df[j][k] > 10 and df[j][k] < 100:
        df[j][k] = df[j][k] / 10
        j = j + 1
    else:
        pass;
    k = k + 1
Expected Result:
- Since 80, 70, 40 are the numbers lying between 10 and 100, they are all replaced by x/10 in the same dataframe.
- 80 --> 80/10 = 8
- 70 --> 70/10 = 7
- 40 --> 40/10 = 4
- The entire time column is ignored because it is non-numeric.
CodePudding user response:
Using DataFrame.applymap is pretty slow when working with a big data set; it doesn't scale well. You should always look for a vectorized solution if possible.
In this case, you can mask the values between 10 and 100 and perform the conditional replacement using DataFrame.mask (or DataFrame.where if you negate the condition). The code uses floor division (//) so the integer columns keep their integer dtype; use / instead if you need true division.
# select the numeric columns
num_cols = df1.select_dtypes(include="number").columns
# In DataFrame.mask, the argument `df` of each lambda is the calling DataFrame,
# in this case df = df1[num_cols]
df1[num_cols] = (
    df1[num_cols].mask(lambda df: (df > 10) & (df < 100),
                       lambda df: df // 10)
)
Output:
>>> df1
    time  n1  n2  n3   n4
0  11:50   1   2   3    4
1  12:50   5   6   7    8
2  13:50   8   7   6  500
Setup:
time = ['11:50', '12:50', '13:50']
data_1 = {'time': time,
'n1': [1, 5, 80],
'n2': [2, 6 ,7],
'n3': [3, 70 ,6],
'n4': [40, 8, 500],
}
df1 = pd.DataFrame(data = data_1)
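The DataFrame.where variant mentioned above can be sketched as follows. This is only an illustration under the same setup; the names `nums` and `in_range` are introduced here and are not part of the original answer. DataFrame.where keeps the values where the condition is True, so the range test is negated.

```python
import pandas as pd

df1 = pd.DataFrame({
    "time": ["11:50", "12:50", "13:50"],
    "n1": [1, 5, 80],
    "n2": [2, 6, 7],
    "n3": [3, 70, 6],
    "n4": [40, 8, 500],
})

# Select the numeric columns only; the time column is left untouched
num_cols = df1.select_dtypes(include="number").columns
nums = df1[num_cols]

# True for values strictly between 10 and 100
in_range = (nums > 10) & (nums < 100)

# where() keeps values where the condition is True and replaces the rest,
# so we keep the out-of-range values and replace the in-range ones
df1[num_cols] = nums.where(~in_range, nums // 10)
```

Both forms should give the same result; which one reads better is mostly a matter of whether the condition describes the values to change (mask) or the values to keep (where).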
CodePudding user response:
Does this work?
df1[['n1','n2','n3','n4']].applymap(lambda x : x/10 if 10 < x < 100 else x)
    n1  n2   n3     n4
0  1.0   2  3.0    4.0
1  5.0   6  7.0    8.0
2  8.0   7  6.0  500.0
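Two caveats about the snippet above: the expression returns a new DataFrame and does not modify df1 unless the result is assigned back, and DataFrame.applymap is deprecated since pandas 2.1 in favour of DataFrame.map. A minimal sketch that handles both, assuming the same hardcoded column list (the names `numeric` and `elementwise` are illustrative only):

```python
import pandas as pd

df1 = pd.DataFrame({
    "time": ["11:50", "12:50", "13:50"],
    "n1": [1, 5, 80],
    "n2": [2, 6, 7],
    "n3": [3, 70, 6],
    "n4": [40, 8, 500],
})

num_cols = ["n1", "n2", "n3", "n4"]
numeric = df1[num_cols]

# Assumption: DataFrame.map is available on pandas >= 2.1; on older
# versions fall back to DataFrame.applymap, which does the same job.
elementwise = getattr(numeric, "map", None) or numeric.applymap

# Assign the result back so that df1 itself is updated.
# True division (/) turns every column containing a changed value into float.
df1[num_cols] = elementwise(lambda x: x / 10 if 10 < x < 100 else x)
```

This is still an element-by-element Python call, so it is likely to be slower than a vectorized approach on large frames.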
CodePudding user response:
You can select the columns which have numeric datatypes, use .applymap() to perform the division operation, and then reassign back to the original dataframe. Notably, this doesn't require hardcoding the columns you want to transform in advance:
numerics = df1.select_dtypes(include="number")
numerics = numerics.applymap(lambda x: x // 10 if 10 < x < 100 else x)
df1[numerics.columns] = numerics
This outputs:
    time  n1  n2  n3   n4
0  11:50   1   2   3    4
1  12:50   5   6   7    8
2  13:50   8   7   6  500
CodePudding user response:
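Try the following:
def repl(df, cols):
    # Replace values column by column (note: this modifies df in place)
    for col in cols:
        # Strict bounds, so 10 and 100 themselves are left unchanged
        df[col] = df[col].apply(lambda x: x // 10 if 10 < x < 100 else x)
    return df

new_df = repl(df1, ['n1', 'n2', 'n3', 'n4'])
new_df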
Output:
    time  n1  n2  n3   n4
0  11:50   1   2   3    4
1  12:50   5   6   7    8
2  13:50   8   7   6  500
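Whether 10 and 100 themselves count as "between" depends on how the comparison is written, and the sample data contains neither value, so the outputs above would not reveal a difference. As a rough sketch, the per-column loop can also be written without a Python-level lambda by using Series.between, whose `inclusive` argument makes the boundary choice explicit. The `boundary_demo` series is made up purely for illustration.

```python
import pandas as pd

# Illustrative values sitting exactly on and just inside the boundaries
boundary_demo = pd.Series([10, 11, 99, 100])
strict = boundary_demo.between(10, 100, inclusive="neither")  # 10 < x < 100
closed = boundary_demo.between(10, 100, inclusive="both")     # 10 <= x <= 100

df1 = pd.DataFrame({
    "time": ["11:50", "12:50", "13:50"],
    "n1": [1, 5, 80],
    "n2": [2, 6, 7],
    "n3": [3, 70, 6],
    "n4": [40, 8, 500],
})

# Loop over the numeric columns only; each column is handled vectorized
for col in df1.select_dtypes(include="number").columns:
    in_range = df1[col].between(10, 100, inclusive="neither")
    df1[col] = df1[col].mask(in_range, df1[col] // 10)
```

Here the strict form is used to match the asker's own attempt (`> 10 and < 100`); switch to `inclusive="both"` if the boundary values should be divided as well.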