Combine dataframe column names and row values to a single string

Time:04-28

I want to combine some information from a pandas dataframe into a string with HTML syntax.

This is a demo dataframe for the problem (it is created in the code further down).

For the target string I want to combine some columns with their values, separated by the HTML tag <br>. So, if the selected columns are vehicle, owner and mileage, the result for the first index should be the string:

vehicle: Ford<br>owner: Sandy<br>mileage: 53647
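To make the target format concrete, here is a minimal plain-Python sketch for a single row. The names row, selected and target are made up for illustration, and the dict simply mirrors the first row of the demo data:

```python
# Hypothetical stand-in for the first row of the demo dataframe.
row = {'vehicle': 'Ford', 'owner': 'Sandy', 'mileage': 53647, 'some random ratio': 0.3}

# Only these columns should end up in the string.
selected = ['vehicle', 'owner', 'mileage']

# "column: value" pairs, joined by the HTML line-break tag.
target = '<br>'.join(f'{col}: {row[col]}' for col in selected)
print(target)  # vehicle: Ford<br>owner: Sandy<br>mileage: 53647
```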

I came up with a solution, but I think there must be an easier way to do this. Here is what I did:

import pandas as pd

# %% create some data

demo = {'vehicle': ['Ford', 'VW', 'Mercedes', 'Dodge'],
        'owner': ['Sandy', 'Brutus', 'Driver5', 'Al'],
        'mileage': [53647, 12564, 24852, 1000000],
        'some random ratio': [0.3, 1.8, 66.6, 18.0]}
df_demo = pd.DataFrame(demo)

# %% create tooltip string

# select columns
tt_cols = ['vehicle','owner','mileage']
# creates a list of (column, value) tuples for each row
df_demo['tooltip'] = df_demo[tt_cols].apply(lambda row: list(zip(tt_cols, row.values.astype(str))), axis=1)
# strings from tuples
df_demo['tooltip'] = df_demo['tooltip'].apply(lambda val: [': '.join(x) for x in val])
# list of strings to string with separator
df_demo['tooltip'] = df_demo['tooltip'].apply(lambda val: '<br>'.join(val))

This works fine and creates a new column tooltip with the string for each row. But, in my opinion, it is not very "elegant" to iterate three times through the whole dataframe to create this string.

I know that I can combine/nest the last lines, but I think this is unreadable:

df_demo['tooltip'] = df_demo[tt_cols].apply(lambda row: '<br>'.join([': '.join(x) for x in list(zip(tt_cols, row.values.astype(str)))]), axis=1)

Any suggestions on how to improve this, to make it shorter or more readable?

Thanks for your help!

CodePudding user response:

You can use to_dict('records') to convert the rows into a list of dicts, and then use a list comprehension to format them:

df_demo['tooltip'] = ['<br>'.join(f'{k}: {v}' for k, v in i.items()) for i in df_demo[tt_cols].to_dict('records')]

Output:

>>> df_demo
    vehicle    owner  mileage  some random ratio                                                tooltip
0  Ford      Sandy    53647    0.3                vehicle: Ford<br>owner: Sandy<br>mileage: 53647      
1  VW        Brutus   12564    1.8                vehicle: VW<br>owner: Brutus<br>mileage: 12564       
2  Mercedes  Driver5  24852    66.6               vehicle: Mercedes<br>owner: Driver5<br>mileage: 24852
3  Dodge     Al       1000000  18.0               vehicle: Dodge<br>owner: Al<br>mileage: 1000000
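
If the tooltip is needed in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: the function name add_tooltip and its sep parameter are made up for illustration, and it returns a copy instead of modifying the dataframe in place.

```python
import pandas as pd


def add_tooltip(df, cols, sep='<br>'):
    """Return a copy of df with a 'tooltip' column built from cols.

    Hypothetical helper, not part of pandas.
    """
    # One {column: value} dict per row, restricted to the selected columns.
    records = df[cols].to_dict('records')
    out = df.copy()
    # Format each row as "column: value" pairs joined by the separator.
    out['tooltip'] = [sep.join(f'{k}: {v}' for k, v in rec.items()) for rec in records]
    return out


demo = {'vehicle': ['Ford', 'VW', 'Mercedes', 'Dodge'],
        'owner': ['Sandy', 'Brutus', 'Driver5', 'Al'],
        'mileage': [53647, 12564, 24852, 1000000],
        'some random ratio': [0.3, 1.8, 66.6, 18.0]}
df_demo = pd.DataFrame(demo)

result = add_tooltip(df_demo, ['vehicle', 'owner', 'mileage'])
print(result.loc[0, 'tooltip'])  # vehicle: Ford<br>owner: Sandy<br>mileage: 53647
```

Passing a different sep (for example '\n') gives plain-text tooltips with the same column selection.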