Home > Enterprise >  How to reshape a pandas data frame which has duplicate columns to a required format?
How to reshape a pandas data frame which has duplicate columns to a required format?

Time:11-30

I have a list of elements extracted from a xml file, they are passed to a pandas dataframe and assigned columns as below.

#dataframe created with a list of lists
df = pd.DataFrame([
['2201 W WILLOW'], 
['2201 W WILLOW'], 
['ENID'], 
['ENID, OK 73073'], 
['73073'],
['2201 W WILLOW'], 
['2201 W WILLOW'], 
['ENID'], 
['ENID, OK 73073'], 
['73073'],['12345678']]).T

# column
cols= ['AddressLine1', 'AddressLine123', 'City', 'CityStateZip', 'PostalCode',
'AddressLine1', 'AddressLine123', 'City', 'CityStateZip', 'PostalCode','SSN']

# assign columns to pandas data frame
df.columns = cols

enter image description here

Expected output after reshaping would be:

enter image description here

enter image description here

CodePudding user response:

You can use a MultiIndex:

(df.set_axis(pd.MultiIndex
               .from_arrays([df.columns,
                             df.groupby(df.columns, axis=1)
                               .cumcount()]),
             axis=1)
   .loc[0].unstack().add_prefix('value_')
)

Output:

                       value_0         value_1
AddressLine1     2201 W WILLOW   2201 W WILLOW
AddressLine123   2201 W WILLOW   2201 W WILLOW
City                      ENID            ENID
CityStateZip    ENID, OK 73073  ENID, OK 73073
PostalCode               73073           73073
SSN                   12345678             NaN

If you have several rows in the input and want to keep them:

(df.set_axis(pd.MultiIndex
               .from_arrays([df.columns,
                             df.groupby(df.columns, axis=1)
                               .cumcount()]),
             axis=1)
   .stack(0).add_prefix('value_')
)

Output:

                         value_0         value_1
0 AddressLine1     2201 W WILLOW   2201 W WILLOW
  AddressLine123   2201 W WILLOW   2201 W WILLOW
  City                      ENID            ENID
  CityStateZip    ENID, OK 73073  ENID, OK 73073
  PostalCode               73073           73073
  SSN                   12345678             NaN
  • Related