Modifying Pandas Dataframes

Time:11-20

I am trying to write a function X(df) that replaces the values in the FIRST column of the dataframe according to the following criteria:

  1. If the value is a number between 0 and 0.5 (so 0 <= value <= 0.5), replace this value with the sum of the values of all columns in this row.
  2. If the value is between 1.0 and 2.0 (so 1.0 <= value <= 2.0), replace this value with -99. This rule also applies to the result of rule 1: if the original value is 0.1 and the sum of all columns in that row is 1.5, the value is then replaced by -99.
Original df:

idx      A       B
0      0.4     1.0
1      0.0     0.5
2     10.0     0.0
3      1.5  -100.0
4      0.1     0.1
5      0.5   -10.0


I have this so far:

def X(df):
   for i in df.iloc[:, 0]:
       if (i >= 0) and (i <= 0.5):
           df.iloc[:,0] = df.sum(axis=1)
       elif (i>=1) and (i<=2):
           df.iloc[:,0] = int(-99)
       else:
           continue

   return df


I got: 

     A      B
idx              
0      3.4    1.0
1      1.5    0.5
2     10.0    0.0
3   -298.5 -100.0
4      0.4    0.1
5    -29.5  -10.0


I was expecting:
       A      B
idx             
0     0.5    1.0
1     0.5    0.5
2    10.0    0.0
3     -99 -100.0
4     0.2    0.1
5     -9.5  -10.0

CodePudding user response:

Example

import pandas as pd

data = {'A': {0: 0.4, 1: 0.0, 2: 10.0, 3: 1.5, 4: 0.1, 5: 0.5},
        'B': {0: 1.0, 1: 0.5, 2: 0.0, 3: -100.0, 4: 0.1, 5: -10.0}}
df = pd.DataFrame(data)

Output (df):

    A    B
0   0.4  1.0
1   0.0  0.5
2   10.0 0.0
3   1.5  -100.0
4   0.1  0.1
5   0.5 -10.0



Code

Use np.select:

import numpy as np
cond1 = (df['A'] >= 0) & (df['A'] <= 0.5)
cond2 = (df['A'] >= 1) & (df['A'] <= 2)
np.select([cond1, cond2], [df.sum(axis=1), -99], df['A'])

result:

array([  1.4,   0.5,  10. , -99. ,   0.2,  -9.5])
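A note on how np.select picks values may help here: for each element it takes the choice belonging to the first condition that is true, and falls back to the default when no condition matches. Both conditions are evaluated on the original column, so the row sum of 1.4 in row 0 is not checked again against the 1.0 to 2.0 range. The small sketch below uses made-up values (values, in_low, in_mid are illustrative names, not from the question) to show that behaviour:

```python
import numpy as np

values = np.array([0.25, 1.5, 7.0])
in_low = (values >= 0) & (values <= 0.5)
in_mid = (values >= 1) & (values <= 2)

# Conditions are checked in list order; elements matching none keep the default.
picked = np.select([in_low, in_mid], [values * 10, -99], default=values)
print(picked)
```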



Final

Assign the result to column A (df.assign returns a new DataFrame):

df.assign(A=np.select([cond1, cond2], [df.sum(axis=1), -99], df['A']))

desired output:

    A     B
0   1.4   1.0
1   0.5   0.5
2   10.0  0.0
3   -99.0 -100.0
4   0.2   0.1
5   -9.5  -10.0
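If the parenthetical in the question is read literally (a row sum that lands between 1.0 and 2.0 should itself become -99), the two rules have to be applied one after the other rather than in a single np.select call. The following is only a sketch of that reading, assuming all columns are numeric; it wraps the logic in the function X(df) the question asks for and uses Series.between and Series.where instead of np.select:

```python
import numpy as np
import pandas as pd


def X(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the first column rewritten by the two rules."""
    out = df.copy()
    first = out.columns[0]
    col = out[first]

    # Rule 1: values in [0, 0.5] become the row sum of the original data.
    stage1 = col.where(~col.between(0, 0.5), out.sum(axis=1))
    # Rule 2: applied to the result of rule 1, so a row sum in [1, 2] also becomes -99.
    out[first] = stage1.where(~stage1.between(1.0, 2.0), -99)
    return out


df = pd.DataFrame({"A": [0.4, 0.0, 10.0, 1.5, 0.1, 0.5],
                   "B": [1.0, 0.5, 0.0, -100.0, 0.1, -10.0]})
print(X(df))
```

With this reading, row 0 is the only row that differs from the output above: its sum of 1.4 falls in the second range and is replaced by -99.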

CodePudding user response:

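def X(df):
    first = df.columns[0]
    for idx, row in df.iterrows():
        value = row.iloc[0]
        total = row.sum()
        if 1.0 <= value <= 2.0:
            df.loc[idx, first] = -99
        elif 0 <= value <= 0.5:
            # iterrows() yields copies of the rows, so write back through df.loc.
            # A row sum that falls in [1, 2] is replaced by -99 as well.
            df.loc[idx, first] = -99 if 1.0 <= total <= 2.0 else total
    return df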