Most elegant way to transform this type of table?-CodePudding

I have a dataframe that looks something like this:

id   name   last   attribute_1_name  attribute_1_rating  attribute_2_name  attribute_2_rating
 1   Linda  Smith  Age                23                  Hair              Brown
 3   Brian  Lin    Hair               Black               Job               Barista

Essentially I'd like to transform this table to look like so:

id   name   last   attribute_name  attribute_rating
 1   Linda  Smith  Age                23
 1   Linda  Smith  Hair               Brown
 3   Brian  Lin    Hair               Black
 3   Brian  Lin    Job                Barista

What's the most elegant and efficient way to perform this transformation in Python? Assuming there are many more rows and the attribute numbers go up to 13.

CodePudding user response：

Assuming attribute columns are named coherently, you can do this:

result = pd.DataFrame()
for i in [1, 2]:
    attribute_name_col = f'attribute_{i}_name'
    attribute_rating_col = f'attribute_{i}_rating'
    melted = pd.melt(
        df, 
        id_vars=['id', 'name', 'last', attribute_name_col], 
        value_vars=[attribute_rating_col]
    )
    melted = melted.rename(
        columns={attribute_name_col: 'attribute_name', 
                 'value': 'attribute_rating'}
    )
    melted = melted.drop('variable', axis=1)
    result = pd.concat([result, melted])

where df is your original dataframe. Then printing result gives

id  name    last      attribute_name    attribute_rating
1   Linda   Smith     Age               23
3   Brian   Lin       Hair              Black
1   Linda   Smith     Hair              Brown
3   Brian   Lin       Job               Barista

CodePudding user response：

This isolates attributes/ratings one by one, renaming the columns, and finally concatenating the list of dataframes for each name:

N_ATTRIBUTES = 2 # change using the desired number of outputs
df1 = \
pd.concat(\
    [df_name.loc[:,list(compress(list(df_name.columns), [x in ["name", "last", 
    f'attribute_{i}_name', f'attribute_{i}_rating'] for x in df_name.columns]))]\
    .rename(columns=dict([[f'attribute_{i}_name',"attribute_name"], 
    [f'attribute_{i}_rating', "attribute_rating"]])) for i in range(1,N_ATTRIBUTES 1)]
)