I have CSV files like following
ID Result1 Result2 Result3
0 1 1 0
1 0 1 0
2 0 0 0
3 0 0 0
4 1 0 0
5 0 0 0
I want to replace Result1, Result2, and Result3 columns by all zero if the sum of their column values is greater than 1. I tried this script
new = []
df1 = pd.read_csv(r"C:\Users\test.csv")
df= df1.loc[:, df1.columns != 'ID']
l = list(df.columns.values)
for l in l:
if sum(df[l]) > 1:
df[l].loc[(df[l] > 0)] = 0
new.append(l)
df2 = df[new]
When I do this operation Result3 column is dropped since it has all zero values. How do I do this operation only on the columns which satisfy the condition without affecting another column which didn't satisfy the condition?
CodePudding user response:
Pandas supports numpy-like slicing, so one way would be to use the .loc
method of pandas.DataFrame
:
df1.loc[:, df1.sum() > 1] = 0
where df1.sum() > 1
returns a list of columns that satisfy your condition.
CodePudding user response:
data = {'Result1': {0: 1, 1: 0, 2: 0, 3: 0, 4: 1, 5: 0},
'Result2': {0: 1, 1: 1, 2: 0, 3: 0, 4: 0, 5: 0},
'Result3': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0}}
df = pd.DataFrame.from_dict(data)
for col in df.columns:
if df[col].sum() > 1:
df[col] = 0
print(df)
gives
Result1 Result2 Result3
0 0 0 0
1 0 0 0
2 0 0 0
3 0 0 0
4 0 0 0
5 0 0 0
In your example dataframe, this makes all columns zero. To see that this is not always the case, put a 1 in the last column and verify that it survives the operation.