Replace zero values on all columns if sum of that columns is greater than 1?-CodePudding

I have CSV files like following

ID Result1  Result2  Result3

0  1        1        0

1  0        1        0

2  0        0        0

3  0        0        0

4  1        0        0

5  0        0        0

I want to replace Result1, Result2, and Result3 columns by all zero if the sum of their column values is greater than 1. I tried this script

new = []
df1 =  pd.read_csv(r"C:\Users\test.csv")
df= df1.loc[:, df1.columns != 'ID']
l = list(df.columns.values)

for l in l:
    if sum(df[l]) > 1:
        df[l].loc[(df[l] > 0)] = 0
        new.append(l)
df2 = df[new]

When I do this operation Result3 column is dropped since it has all zero values. How do I do this operation only on the columns which satisfy the condition without affecting another column which didn't satisfy the condition?

CodePudding user response：

Pandas supports numpy-like slicing, so one way would be to use the .loc method of pandas.DataFrame:

df1.loc[:, df1.sum() > 1] = 0

where df1.sum() > 1 returns a list of columns that satisfy your condition.

CodePudding user response：

data = {'Result1': {0: 1, 1: 0, 2: 0, 3: 0, 4: 1, 5: 0},
        'Result2': {0: 1, 1: 1, 2: 0, 3: 0, 4: 0, 5: 0},
        'Result3': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0}}

df = pd.DataFrame.from_dict(data)

for col in df.columns:
    if df[col].sum() > 1:
        df[col] = 0

print(df)

gives

   Result1  Result2  Result3
0        0        0        0
1        0        0        0
2        0        0        0
3        0        0        0
4        0        0        0
5        0        0        0

In your example dataframe, this makes all columns zero. To see that this is not always the case, put a 1 in the last column and verify that it survives the operation.