Merging two identically-named columns in a dataframe-CodePudding

I have a dataframe that looks like this:

df = pd.DataFrame({'a':[1,0,1],'b':[0,1,0],'b1':[1,0,0],'c':[0,1,1]})
df.columns = ['a','b','b','c']

>>> df
   a  b  b  c
0  1  0  1  0
1  0  1  0  1
2  1  0  0  1

I want to merge those two different b columns together, like this:

I understand that I could use | (OR) in a bitwise context to combine them, e.g. with a and c:

>>> df['a'] | df['c']
0    1
1    1
2    1
dtype: int64

But I'm having trouble selecting the two individual b columns, because of this:

>>> df['b']
   b  b
0  0  1
1  1  0
2  0  0

>>> df['b']['b']
   b  b
0  0  1
1  1  0
2  0  0

>>> df['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']['b']
   b  b
0  0  1
1  1  0
2  0  0

CodePudding user response：

Try with sum and clip:

df["b"] = df["b"].sum(axis=1).clip(0, 1)

#remove duplicate column
df = df.loc[:, ~df.columns.duplicated()]

CodePudding user response：

Assuming that you have multiple groups of repeated columns, you can apply the same logic of not_speshal's solution to each group using DataFrame.groupby.

# group the columns (axis=1) by their labels (level=0) and apply the logic to each group
df = df.groupby(level=0, axis=1).sum().clip(0, 1)

CodePudding user response：

Beside the answer suggested by not_speshal, you could also access the columns by index as follows:

df.iloc[:, 1] | df.iloc[:, 2]