Convert the -ve values into 1 and reduce those values from ve values of same variable in pandas-CodePudding

I have a dataframe

df_in = pd.DataFrame([["A",-2],["B",23],["A",-4],["A",14],["B",12],["A",34],["B",-4],["C",-1],["A",-5],["B",21],["C",4],["B",-6]], columns=['var', 'val'])

var val
A   -2
B   23
A   -4
A   14
B   12
A   34
B   -4
C   -1
A   -5
B   21
C    4
B   -6

I want to convert all -ve values to 1. find the difference between the value and 1, sum it at var level. Divide the sum by no. of ve value of var and reduce that value from the ve values of var. Ex: var A has 3 -ve value and 2 ve values. Covert all -ve values to 1 and find the difference like diff between -2 and 1 is 3, -4 and 1 is 5 -5 and 1 is 6. Sum 3 5 6=14. A has 2 values. So divide 14 by 2 which is 7. Now reduce 7 from ve values(23,34) of A. Similarly repeat for other var also either by doing groupby.

Expected Output is:

df_out = pd.DataFrame([["A",1],["B",19],["A",1],["A",7],["B",8],["A",27],["B",1],["C",1],["A",1],["B",17],["C",2],["B",1]], columns=['var', 'val'])

var val
A   1
B   19
A   1
A   7
B   8
A   27
B   1
C   1
A   1
B   17
C   2
B   1

How to do it?

CodePudding user response：

Use:

#Trues for positive
m = df_in['val'].gt(0)
#substract 1 form right side and aggregate negative values
neg = df_in.loc[~m, 'val'].rsub(1).groupby(df_in['var']).sum()
#count positive values
pos = df_in.loc[m, 'var'].value_counts()
#divide both
diff = neg.div(pos)
#mapping difference with subtract `val` for positive else set 1
df_in['val'] = np.where(m, df_in['val'].sub(df_in['var'].map(diff)), 1)
print (df_in)
   var   val
0    A   1.0
1    B  19.0
2    A   1.0
3    A   7.0
4    B   8.0
5    A  27.0
6    B   1.0
7    C   1.0
8    A   1.0
9    B  17.0
10   C   2.0
11   B   1.0

CodePudding user response：

def modify(avg, num):
    if num < 0:
        return 1
    if num > 0:
        return num-avg


# create a new df
df_new = pd.DataFrame(columns=["var", "val"])

for var in df_in.groupby("var"):
    cur_df = var[1]
    # calculate -ve count
    neg_count = cur_df[cur_df["val"] < 0]["val"].count()
    # calculate -ve sum
    neg_sum = cur_df[cur_df["val"] < 0]["val"].sum()
    # calculate  ve count
    pos_count = cur_df[cur_df["val"] > 0]["val"].count()
    # calculate avg
    avg = (neg_count - neg_sum)/pos_count
    cur_df["val"] = cur_df["val"].apply(lambda x: modify(avg, x))
    df_new = pd.concat([df_new, cur_df], axis=0)

df_new = df_new.sort_index()
print(df_new)

   var   val
0    A   1.0
1    B  19.0
2    A   1.0
3    A   7.0
4    B   8.0
5    A  27.0
6    B   1.0
7    C   1.0
8    A   1.0
9    B  17.0
10   C   2.0
11   B   1.0