How to clip specific columns of a dataframe to a specified range?

Time:06-08

I need to clip the three columns c1, c2, and c3 of df to the range [-1, 1].

That is, values greater than 1 should be set to 1, and values less than -1 should be set to -1.

My code is as follows:

import numpy as np
import pandas as pd

rand = np.random.default_rng(seed=0)

df = pd.DataFrame(rand.uniform(-2, 2, 50).reshape(10, 5), columns=['a', 'b', 'c1', 'c2', 'c3'])
print(df)
          a         b        c1        c2        c3
0  0.547847 -0.920853 -1.836106 -1.933889  1.253081
1  1.651022  0.426543  0.917986  0.174500  1.740290
2  1.263414 -1.989046  1.429617 -1.865658  0.918622
3 -1.297378  1.452716  0.165845 -0.801152 -0.309251
4 -1.886721 -1.502867  0.682498  0.588758  0.461540
5 -0.465290  1.988840  1.923341  0.742168  0.601837
6  0.753787 -0.444314 -1.459614  0.885953  0.101417
7 -0.759032 -0.056659  1.557951  1.736174 -0.568819
8  0.286119 -0.712522  0.377200 -0.648355 -0.433524
9  1.561097 -1.091370  0.492749 -1.663939  1.330577

What I want to achieve:

          a         b        c1        c2        c3
0  0.547847 -0.920853 -1.       -1.        1.      
1  1.651022  0.426543  0.917986  0.174500  1.      
2  1.263414 -1.989046  1.       -1.        0.918622
3 -1.297378  1.452716  0.165845 -0.801152 -0.309251
4 -1.886721 -1.502867  0.682498  0.588758  0.461540
5 -0.465290  1.988840  1.        0.742168  0.601837
6  0.753787 -0.444314 -1.        0.885953  0.101417
7 -0.759032 -0.056659  1.        1.       -0.568819
8  0.286119 -0.712522  0.377200 -0.648355 -0.433524
9  1.561097 -1.091370  0.492749 -1.        1.      

How can I do this?

CodePudding user response:

Just use pandas' built-in clip:

df[['c1','c2','c3']] = df[['c1','c2','c3']].clip(-1,1)
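
As a minimal, self-contained sketch of that one-liner, the snippet below rebuilds the frame from the question and checks that only the selected columns change. The names `cols` and `original` are introduced here for illustration and are not part of the question.

```python
import numpy as np
import pandas as pd

# Same construction as in the question
rand = np.random.default_rng(seed=0)
df = pd.DataFrame(rand.uniform(-2, 2, 50).reshape(10, 5),
                  columns=['a', 'b', 'c1', 'c2', 'c3'])

cols = ['c1', 'c2', 'c3']   # illustrative name: the columns to clip
original = df.copy()        # illustrative name: kept only for comparison

# DataFrame.clip returns a new frame; assigning it back
# replaces just the selected columns and leaves 'a' and 'b' alone
df[cols] = df[cols].clip(lower=-1, upper=1)

print(df)
```

Passing the bounds by keyword (`lower`, `upper`) is optional, but it makes the call harder to misread than the positional form.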

CodePudding user response:

While @ti7's suggestion comes pretty close, it doesn't quite handle the case presented. This is how I would approach the problem:

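def clip(arr, clip_left, clip_right):
    # Work on a copy so the caller's Series is not modified in place.
    arr = arr.copy()
    # Iterate over (label, value) pairs so this also works with a non-default index.
    for idx, val in arr.items():
        if val > clip_right:
            arr[idx] = clip_right   # use the bound passed in, not a hard-coded 1.0
        elif val < clip_left:
            arr[idx] = clip_left    # use the bound passed in, not a hard-coded -1.0

    return arr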

Then running:

cols = ['c1', 'c2', 'c3']
for c in cols:
    df[c] = clip(df[c], -1.0,  1.0)
print(df)  

Yields:

          a         b        c1        c2        c3
0  0.547847 -0.920853 -1.000000 -1.000000  1.000000
1  1.651022  0.426543  0.917986  0.174500  1.000000
2  1.263414 -1.989046  1.000000 -1.000000  0.918622
3 -1.297378  1.452716  0.165845 -0.801152 -0.309251
4 -1.886721 -1.502867  0.682498  0.588758  0.461540
5 -0.465290  1.988840  1.000000  0.742168  0.601837
6  0.753787 -0.444314 -1.000000  0.885953  0.101417
7 -0.759032 -0.056659  1.000000  1.000000 -0.568819
8  0.286119 -0.712522  0.377200 -0.648355 -0.433524
9  1.561097 -1.091370  0.492749 -1.000000  1.000000

CodePudding user response:

import pandas as pd
import numpy as np

df = pd.read_csv('sample_stack.csv')
# Cap values above 1 at 1, then raise values below -1 to -1.
for i in ['c1','c2','c3']:
    df[i] = np.where(df[i]>1,1.,df[i])
    df[i] = np.where(df[i]<-1,-1.,df[i])

# Convert to strings so the clipped bounds can be shown as '1.' and '-1.'.
df[['c1','c2','c3']] = df[['c1','c2','c3']].astype(str)
for i in ['c1','c2','c3']:
    df[i] = np.where(df[i]=='1.0','1.',df[i])
    df[i] = np.where(df[i]=='-1.0','-1.',df[i])
df

This gives the display you asked for, but only as strings. If you need to do any arithmetic on these columns, you will have to convert them back to a numeric type, and the values will then print in full decimal format again.

Knowing your actual objective would have helped in suggesting a better solution.
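
If the short `1.` / `-1.` form is only needed for display, one possible alternative is to leave the columns as floats and format them at print time. The sketch below assumes a small made-up sample rather than the question's data, and `short_float` is a hypothetical helper name; it uses the `formatters` argument of `DataFrame.to_string`.

```python
import pandas as pd

# Small made-up sample of already-clipped values (not the question's data)
df = pd.DataFrame({'c1': [-1.0, 0.917986, 1.0]})

def short_float(x):
    # Hypothetical helper: show the exact bounds as '1.' / '-1.',
    # and everything else with six decimals
    return f"{x:.0f}." if x in (-1.0, 1.0) else f"{x:.6f}"

# Formatting affects only the rendered text; the column stays float64
text = df.to_string(formatters={'c1': short_float})
print(text)
```

Because the data itself is untouched, later arithmetic on `c1` works without any conversion back from strings.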
