Multiple similar columns with similar values-CodePudding

The dataframe looks like:

name	education	education 2	education 3
name 1	NaN	some college	NaN
name 2	NaN	NaN	graduate degree
name 3	high school	NaN	NaN

I just want to keep one education column. I tried to use the conditional statement and compared to each other, I got nothing but error though. I also looked through the merge solution, but in vain. Does anyone know how to deal with it using Python or pandas? Thank you in advance.

name	education
name 1	some college
name 2	graduate degree
name 3	high school

CodePudding user response：

use bfill to fill the empty (NaN) values

df.bfill(axis=1).drop(columns=['education 2','education 3'])

    name    education
0   name 1  some college
1   name 2  graduate degree
2   name 3  high school

if there are other columns in between then choose the columns to apply bfill In essence, if you have multiple columns for education that you need to consolidate under a single column then choose the columns to which you apply the bfill. subsequently, you can delete those columns from which you back filled.

df[['education','education 2','education 3']].bfill(axis=1).drop(columns=['education 2','education 3'])

CodePudding user response：

One day I hope they'll have better functions for String type rows, rather than the limited support for columns currently available:

df['education'] = (df.filter(like='education') # Filters to only Education columns.
                     .T                        # Transposes
                     .convert_dtypes()         # Converts to pandas dtypes, still somewhat in beta.
                     .max()                    # Gets the max value from the column, which will be the not-null one.
                  )
df = df[['name', 'education']]
print(df)

Output:

     name        education
0  name 1     some college
1  name 2  graduate degree
2  name 3      high school

Looping this wouldn't be too hard e.g.:

cols = ['education', 'age', 'income']
for col in cols:
    df[col] = df.filter(like=col).bfill(axis=1)[col]

df = df[['name']   cols]

CodePudding user response：

You can use df.fillna to do so.

df['combine'] = df[['education','education2','education3']].fillna('').sum(axis=1)

df
    name     education    education2       education3          combine
0  name1          NaN  some college              NaN     some college
1  name2          NaN           NaN  graduate degree  graduate degree
2  name3  high school           NaN              NaN      high school

If you have a lot of columns to combine, you can try this.

df['combine'] = df[df.columns[1:]].fillna('').sum(axis=1)