Data Set
result = [['Nikolas', 'Webber', 63, 'Male', 'Options', 'Option - 1'],
['Sanaa', 'Kirkpat', 53, 'Male', 'Options', 'Option - 3'],
['Laaibah', 'Andrson', 52, 'Male', 'Options', 'Option - 1'],
['Beck', 'Villanu', 61, 'Male', 'Options', 'Option - 2'],
['Blair', 'Hook', 67, 'Male', 'Options', 'Option - 1'],
['Kavan', 'Hodges', 47, 'Male', 'Options', 'Option - 3'],
['Kajus', 'Oco', 54, 'Male', 'Options', 'Option - 1']]
pandas DataFrame created
df = pandas.DataFrame(result, columns=['First Name', 'Last Name', 'Age', 'Gender', 'field_label', 'field_value'])
Values in column field_label has to be transposed as column header replacing column header name - field_value to Options
data = df.pivot_table(index=['First Name', 'Last Name', 'Age', 'Gender'], columns='field_label', values='field_value', fill_value='', aggfunc='first')
data = data.reset_index().rename_axis(None, axis=1)
Tried serveral other ways to achieve this using pivot_table
and aggfunc='first'
but some of the records are getting omitted when result set is too large to create dataframe.
Input
First Name Last Name Age Gender field_label field_value
Nikolas Webber 63 Male Options Option - 1
Sanaa Kirkpat 53 Male Options Option - 3
Laaibah Andrson 52 Male Options Option - 1
Beck Villanu 61 Male Options Option - 2
Blair Hook 67 Male Options Option - 1
Kavan Hodges 47 Male Options Option - 3
Kajus Oco 54 Male Options Option - 1
Any pointer to get below Output using pandas pivot_table or any other way would be really appreciated.
Output
First Name Last Name Age Gender Options
Nikolas Webber 63 Male Option - 1
Sanaa Kirkpat 53 Male Option - 3
Laaibah Andrson 52 Male Option - 1
Beck Villanu 61 Male Option - 2
Blair Hook 67 Male Option - 1
Kavan Hodges 47 Male Option - 3
Kajus Oco 54 Male Option - 1
CodePudding user response:
I think code below will help you:
# create your processing function
def handle_function(row: pd.Series) -> pd.Series:
"""
What you want to do here...
"""
# get label and value
label = row["field_label"]
value = row["field_value"]
# create column for label if it exists no create
row[label] = value
# return new row
return row
# apply on your dataframe
df.apply(handle_function, axis=1)
# remove columns for label and values
df.drop(["field_label", "field_value", axis=1, inplace=True]