Keep first instance of duplicate column name, unless empty then keep second instance of column-CodePudding

I have a dataframe with multiple columns, two of these columns have the same name ('mobilephone'), some values are empty, and some aren't but there will always be one of the two columns populated with a phone number:

             mobilephone          mobilephone 
0              999000111            999000111       
1              999000222               
2                                   999000333    
3              999000444            999000444

How would I keep only one of these columns but populate the empty values in the first column with the values in the second column?

CodePudding user response：

First, find the index of the second duplicate column. Then change the name of this column and use .loc[] to fill in the gaps:

col_name = 'mobilephone'
index_second_column = np.where(df.columns.duplicated())[0][0]
df.columns.values[index_second_column]  = '_1'

df.loc[df[col_name].isnull(), colname] = df[col_name   '_1']
df.drop(columns = col_name   '_1', inplace=True)

CodePudding user response：

Your Problem can easily solved using below code:

# renaming the columns
df.columns = ['mobilephone1', 'mobilephone2']
# filling blank cell with second column data
df['mobilephone1']=df['mobilephone1'].fillna(df['mobilephone2'])
# if you want you canrenam the columns again
df.columns = ['mobilephone', 'mobilephone']

Let me know if this helps you