I have a dataframe that looks something like this:
id name last attribute_1_name attribute_1_rating attribute_2_name attribute_2_rating
1 Linda Smith Age 23 Hair Brown
3 Brian Lin Hair Black Job Barista
Essentially I'd like to transform this table to look like so:
id name last attribute_name attribute_rating
1 Linda Smith Age 23
1 Linda Smith Hair Brown
3 Brian Lin Hair Black
3 Brian Lin Job Barista
What's the most elegant and efficient way to perform this transformation in Python? Assuming there are many more rows and the attribute numbers go up to 13.
CodePudding user response:
Assuming attribute columns are named coherently, you can do this:
result = pd.DataFrame()
for i in [1, 2]:
attribute_name_col = f'attribute_{i}_name'
attribute_rating_col = f'attribute_{i}_rating'
melted = pd.melt(
df,
id_vars=['id', 'name', 'last', attribute_name_col],
value_vars=[attribute_rating_col]
)
melted = melted.rename(
columns={attribute_name_col: 'attribute_name',
'value': 'attribute_rating'}
)
melted = melted.drop('variable', axis=1)
result = pd.concat([result, melted])
where df
is your original dataframe. Then printing result
gives
id name last attribute_name attribute_rating
1 Linda Smith Age 23
3 Brian Lin Hair Black
1 Linda Smith Hair Brown
3 Brian Lin Job Barista
CodePudding user response:
This isolates attributes/ratings one by one, renaming the columns, and finally concatenating the list of dataframes for each name:
N_ATTRIBUTES = 2 # change using the desired number of outputs
df1 = \
pd.concat(\
[df_name.loc[:,list(compress(list(df_name.columns), [x in ["name", "last",
f'attribute_{i}_name', f'attribute_{i}_rating'] for x in df_name.columns]))]\
.rename(columns=dict([[f'attribute_{i}_name',"attribute_name"],
[f'attribute_{i}_rating', "attribute_rating"]])) for i in range(1,N_ATTRIBUTES 1)]
)