Home > OS >  Calculate sum of columns of same name pandas
Calculate sum of columns of same name pandas

Time:08-11

How can i search for duplicate columns in a dataframe and then create a new column with same name. the new column is result of 'OR' operator of these columns. Then drop old duplicated columns.

Example:

For that, I tried to create a unique column 'job' that is the result of 'OR' operator of the two 'job' columns in the table bellow.

There is my table look like:

name job maried children job
John True True True True
Peter True False True True
Karl False True True True
jack False False False False

the result that I want is:

name job maried children
John True True True
Peter True False True
Karl True True True
jack False False False

I tried to do this (df1 is my table):

df_join = pd.DataFrame()
df1_dulp = pd.DataFrame()
df_tmp = pd.DataFrame()

for column in df1.columns:
    df1_dulp = df1.filter(like=str(column))
    if df1_dulp.shape[1] >= 2:
        for i in range(0, df1_dulp.shape[1]):
            df_tmp  = df1_dulp.iloc[:,i]
        if column in df1_dulp.columns:
            df1_dulp.drop(column, axis=1, inplace=True)
        df_join = df_join.join(df1_dulp, how = 'left', lsuffix='left', rsuffix='right')

The result is an empty table (df_join).

CodePudding user response:

You can select the boolean columns with select_dtypes, then aggregate as OR with groupby.any on columns:

out = (df
   .select_dtypes(exclude='bool')
   .join(df.select_dtypes('bool')
           .groupby(level=0, axis=1, sort=False).any()
        )
)

output:

    name    job  maried  children
0   John   True    True      True
1  Peter   True   False      True
2   Karl   True    True      True
3   jack  False   False     False
  • Related