Replace values with zero in every column whose sum is greater than 1?

Time:06-21

I have CSV files like the following:

ID Result1  Result2  Result3
0  1        1        0
1  0        1        0
2  0        0        0
3  0        0        0
4  1        0        0
5  0        0        0

I want to set every value in the Result1, Result2, and Result3 columns to zero whenever that column's sum is greater than 1. I tried this script:

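import pandas as pd

new = []
df1 = pd.read_csv(r"C:\Users\test.csv")
# Work on every column except ID
df = df1.loc[:, df1.columns != 'ID']
cols = list(df.columns.values)

for col in cols:
    if sum(df[col]) > 1:
        df[col].loc[(df[col] > 0)] = 0
        new.append(col)
# Only the columns collected in `new` are kept here
df2 = df[new]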

When I do this, the Result3 column is dropped because it contains only zeros. How can I apply the operation only to the columns that satisfy the condition, without affecting the columns that don't?

CodePudding user response:

pandas supports NumPy-like boolean indexing, so one way is to use the .loc indexer of pandas.DataFrame:

df1.loc[:, df1.sum() > 1] = 0

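where df1.sum() > 1 returns a boolean Series, indexed by column name, that is True for the columns satisfying your condition. Note that df1 still contains the ID column, whose sum (15 in your sample) also exceeds 1, so ID should be excluded from the mask to keep the IDs intact.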
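A minimal sketch of the same idea with the ID column left out of the mask. The frame is rebuilt inline from the sample in the question rather than read from the CSV, and the names result_cols, col_sums, and to_zero are only illustrative:

```python
import pandas as pd

# Sample data from the question, built inline instead of read from the CSV
df1 = pd.DataFrame({
    "ID": [0, 1, 2, 3, 4, 5],
    "Result1": [1, 0, 0, 0, 1, 0],
    "Result2": [1, 1, 0, 0, 0, 0],
    "Result3": [0, 0, 0, 0, 0, 0],
})

# Every column label except ID (illustrative helper name)
result_cols = df1.columns.drop("ID")

# Per-column sums, then the labels of the columns whose sum exceeds 1
col_sums = df1[result_cols].sum()
to_zero = col_sums[col_sums > 1].index

# Zero out only those columns; ID and the remaining columns are untouched
df1.loc[:, to_zero] = 0

print(df1)
```

Selecting by label this way keeps all columns in the frame, so nothing is dropped and the IDs survive.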

CodePudding user response:

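import pandas as pd

# Sample data from the question, without the ID column
data = {'Result1': {0: 1, 1: 0, 2: 0, 3: 0, 4: 1, 5: 0},
        'Result2': {0: 1, 1: 1, 2: 0, 3: 0, 4: 0, 5: 0},
        'Result3': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0}}

df = pd.DataFrame.from_dict(data)

# Zero out every column whose sum exceeds 1; other columns are left untouched
for col in df.columns:
    if df[col].sum() > 1:
        df[col] = 0

print(df)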

gives

   Result1  Result2  Result3
0        0        0        0
1        0        0        0
2        0        0        0
3        0        0        0
4        0        0        0
5        0        0        0

In your example dataframe, this makes all columns zero. To see that this is not always the case, put a 1 in the last column and verify that it survives the operation.
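The check suggested above could look roughly like this. The only change from the question's sample is an assumed single 1 placed in the last row of Result3:

```python
import pandas as pd

# Same sample as above, but with a single 1 added to Result3 (an assumption
# made purely to test the behaviour)
df = pd.DataFrame({
    "Result1": [1, 0, 0, 0, 1, 0],
    "Result2": [1, 1, 0, 0, 0, 0],
    "Result3": [0, 0, 0, 0, 0, 1],
})

# Zero out only the columns whose sum exceeds 1
for col in df.columns:
    if df[col].sum() > 1:
        df[col] = 0

print(df)
```

Result3 now sums to exactly 1, which is not greater than 1, so its single 1 is left in place while Result1 and Result2 are zeroed.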
