Implement specific function in Python

I am trying to implement my own function. Below you can see my code and data:

import pandas as pd
import numpy as np

data = {'type_sale':[100,0,24,0,0,20,0,0,0,0],
         'salary':[0,0,24,80,20,20,60,20,20,20],
        }
df1 = pd.DataFrame(data, columns = ['type_sale',
                                      'salary',])



def cal_fun(type_sale,salary):       
    if type_sale > 1:
        new_sale = 0
    elif (type_sale==0) and (salary >1):
         new_sale = (np.random.choice, 10, p=[0.5,0.05]))/2 ###<-This line here  
    return  (new_sale)

df1['new_sale']=cal_fun(type_sale,salary)

With this function, I want to randomly select 50 percent of the rows (using np.random) from the salary column. The selected rows must also have zero in the type_sale column, and I then want to divide their salary values by 2.

I tried the function above, but I am not sure I did it correctly. Can anybody help me solve this problem?

In the end, I expect to get a table like the one shown below.

Please implement your ideas in the same function format as above.

CodePudding user response：

To get a 50% choice you only need to choose 1 of 2 options. If I understand your issue then:

import pandas as pd
import random

data = {'type_sale':[100,0,24,0,0,20,0,0,0,0],
         'salary':[0,0,24,80,20,20,60,20,20,20],
        }
df1 = pd.DataFrame(data, columns = ['type_sale',
                                      'salary',])
def cal_fun(row):
    t = row['type_sale']
    s = row['salary']
    if (t==0) and (s > 0):
        select = random.choice([0, 1])
        if select:
            return s/2
        else:
            return s
    else:
        return 0

df1['new_sale'] = df1.apply(cal_fun, axis=1)

print(df1)

which gives, for example (the choice is random, so the result varies between runs):

   type_sale  salary  new_sale
0        100       0       0.0
1          0       0       0.0
2         24      24       0.0
3          0      80      40.0
4          0      20      20.0
5         20      20       0.0
6          0      60      30.0
7          0      20      20.0
8          0      20      20.0
9          0      20      10.0
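
Note that the coin flip in cal_fun above halves each eligible row with probability 0.5, so the number of halved rows changes from run to run. If exactly half of the eligible rows should be halved, one possible approach is to sample the row labels first. This is only a sketch: the names eligible and chosen are made up for illustration, and random_state=0 is there purely to make the run repeatable.

```python
import pandas as pd

data = {'type_sale': [100, 0, 24, 0, 0, 20, 0, 0, 0, 0],
        'salary': [0, 0, 24, 80, 20, 20, 60, 20, 20, 20]}
df1 = pd.DataFrame(data)

# rows that qualify: type_sale is zero and salary is positive
eligible = df1[(df1['type_sale'] == 0) & (df1['salary'] > 0)]

# pick exactly half of the eligible rows (random_state is an assumption, for repeatability)
chosen = eligible.sample(frac=0.5, random_state=0).index

# same convention as cal_fun above: 0 for non-eligible rows, salary otherwise
df1['new_sale'] = 0.0
df1.loc[eligible.index, 'new_sale'] = eligible['salary']

# halve only the sampled rows
df1.loc[chosen, 'new_sale'] /= 2

print(df1)
```

With this data there are six eligible rows, so three of them are halved on every run; which three depends on random_state.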

CodePudding user response：

import pandas as pd
import numpy as np

data = {'type_sale':[100,0,24,0,0,20,0,0,0,0],
     'salary':[0,0,24,80,20,20,60,20,20,20],
    }
df1 = pd.DataFrame(data, columns = ['type_sale',
                                  'salary',])



def cal_fun(type_sale, salary):
    # pick a random 50 % of the row labels, without repeats
    random_indexes = np.random.choice(type_sale.index, size=len(type_sale) // 2, replace=False)

    # of those, keep only the rows where type_sale == 0
    picked = type_sale.loc[random_indexes]
    random_rows = picked[picked == 0].index

    new_sale = salary.astype(float)  # float copy, so halving does not clash with the int dtype
    new_sale.loc[random_rows] /= 2
    return new_sale

df1['new_sale']=cal_fun(df1["type_sale"],df1["salary"])
print(df1)

Here I took half the number of rows, picked that many rows at random, kept only the ones with type_sale == 0, and divided their salary by 2 to create the new_sale column. That is how I understood the problem, but I may have misunderstood it. If you instead want 5 random rows chosen from those with type_sale == 0, update the following lines of code:

random_indexes = np.random.choice(type_sale.index, 5, replace=False)
df1['new_sale'] = cal_fun(df1.loc[df1["type_sale"] == 0, "type_sale"], df1["salary"])

You can also use the apply function:

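def cal_fun(row):
    # halve the salary only when type_sale is zero
    if row["type_sale"] == 0:
        return row["salary"] / 2
    return row["salary"]

df1["new_salary"] = df1["salary"].astype(float)
# 5 distinct row labels among the rows with type_sale == 0
random_indexes = np.random.choice(df1[df1["type_sale"] == 0].index, 5, replace=False)
df1.loc[random_indexes, "new_salary"] = df1.loc[random_indexes].apply(cal_fun, axis=1)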
print(df1)
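
The variants above draw from NumPy's global random state, so the result changes on every run. If you need repeatable results while keeping the cal_fun(type_sale, salary) format from the question, a seeded generator is one option. This is a rough sketch, not a definitive implementation: the parameters n and seed are hypothetical additions that do not appear in the question.

```python
import numpy as np
import pandas as pd

def cal_fun(type_sale, salary, n=5, seed=None):
    """Halve salary for n randomly chosen rows where type_sale is zero.

    n and seed are illustrative parameters, not part of the original question.
    """
    rng = np.random.default_rng(seed)

    # labels of the rows where type_sale is zero
    zero_rows = type_sale[type_sale == 0].index.to_numpy()

    # never ask for more rows than are available; no repeats
    picked = rng.choice(zero_rows, size=min(n, len(zero_rows)), replace=False)

    new_sale = salary.astype(float)  # float copy of the salary column
    new_sale.loc[picked] /= 2
    return new_sale

data = {'type_sale': [100, 0, 24, 0, 0, 20, 0, 0, 0, 0],
        'salary': [0, 0, 24, 80, 20, 20, 60, 20, 20, 20]}
df1 = pd.DataFrame(data)

df1['new_sale'] = cal_fun(df1['type_sale'], df1['salary'], n=5, seed=42)
print(df1)
```

Calling the function twice with the same seed returns the same column; leaving seed=None gives a fresh random pick each time. Rows with a non-zero type_sale keep their salary unchanged here, as in the second answer.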