I have a list of elements extracted from a xml file, they are passed to a pandas dataframe and assigned columns as below.
#dataframe created with a list of lists
df = pd.DataFrame([
['2201 W WILLOW'],
['2201 W WILLOW'],
['ENID'],
['ENID, OK 73073'],
['73073'],
['2201 W WILLOW'],
['2201 W WILLOW'],
['ENID'],
['ENID, OK 73073'],
['73073'],['12345678']]).T
# column
cols= ['AddressLine1', 'AddressLine123', 'City', 'CityStateZip', 'PostalCode',
'AddressLine1', 'AddressLine123', 'City', 'CityStateZip', 'PostalCode','SSN']
# assign columns to pandas data frame
df.columns = cols
Expected output after reshaping would be:
CodePudding user response:
You can use a MultiIndex
:
(df.set_axis(pd.MultiIndex
.from_arrays([df.columns,
df.groupby(df.columns, axis=1)
.cumcount()]),
axis=1)
.loc[0].unstack().add_prefix('value_')
)
Output:
value_0 value_1
AddressLine1 2201 W WILLOW 2201 W WILLOW
AddressLine123 2201 W WILLOW 2201 W WILLOW
City ENID ENID
CityStateZip ENID, OK 73073 ENID, OK 73073
PostalCode 73073 73073
SSN 12345678 NaN
If you have several rows in the input and want to keep them:
(df.set_axis(pd.MultiIndex
.from_arrays([df.columns,
df.groupby(df.columns, axis=1)
.cumcount()]),
axis=1)
.stack(0).add_prefix('value_')
)
Output:
value_0 value_1
0 AddressLine1 2201 W WILLOW 2201 W WILLOW
AddressLine123 2201 W WILLOW 2201 W WILLOW
City ENID ENID
CityStateZip ENID, OK 73073 ENID, OK 73073
PostalCode 73073 73073
SSN 12345678 NaN