How can i search for duplicate columns in a dataframe and then create a new column with same name. the new column is result of 'OR' operator of these columns. Then drop old duplicated columns.
Example:
For that, I tried to create a unique column 'job' that is the result of 'OR' operator of the two 'job' columns in the table bellow.
There is my table look like:
name | job | maried | children | job |
---|---|---|---|---|
John | True | True | True | True |
Peter | True | False | True | True |
Karl | False | True | True | True |
jack | False | False | False | False |
the result that I want is:
name | job | maried | children |
---|---|---|---|
John | True | True | True |
Peter | True | False | True |
Karl | True | True | True |
jack | False | False | False |
I tried to do this (df1 is my table):
df_join = pd.DataFrame()
df1_dulp = pd.DataFrame()
df_tmp = pd.DataFrame()
for column in df1.columns:
df1_dulp = df1.filter(like=str(column))
if df1_dulp.shape[1] >= 2:
for i in range(0, df1_dulp.shape[1]):
df_tmp = df1_dulp.iloc[:,i]
if column in df1_dulp.columns:
df1_dulp.drop(column, axis=1, inplace=True)
df_join = df_join.join(df1_dulp, how = 'left', lsuffix='left', rsuffix='right')
The result is an empty table (df_join).
CodePudding user response:
You can select the boolean columns with select_dtypes
, then aggregate as OR with groupby.any
on columns:
out = (df
.select_dtypes(exclude='bool')
.join(df.select_dtypes('bool')
.groupby(level=0, axis=1, sort=False).any()
)
)
output:
name job maried children
0 John True True True
1 Peter True False True
2 Karl True True True
3 jack False False False