pandas single column value to multiple column headers with formatted values

Time:03-26

I am trying to split a single column, extra, into three new columns. Each value of extra is a string formatted as <column name>: <column value(s)>, ..., <column name>: <column value(s)>, where each column name becomes a new column header and the column value(s) can be of any type, such as a list, float, or string.

I am working with the following dataframe:

import pandas as pd
 
df = pd.DataFrame(
    {
        "subject": [1,1],
        "extra": ["category: app, datasets: [\"X\", \"Y\"], acc: [0.8, 0.9]",
                  "category: dev, datasets: [\"Z\", \"Y\"], acc: [0.7, 0.95]"],
    }
)

Desired output:

   subject category datasets          acc
0        1      app   [X, Y]   [0.8, 0.9]
1        1      dev   [Z, Y]  [0.7, 0.95]

Calling df.explode(["acc", "datasets"]) on that frame should then give the final desired result:

   subject category datasets   acc
0        1      app        X   0.8
0        1      app        Y   0.9
1        1      dev        Z   0.7
1        1      dev        Y  0.95

Answer:

You can use pyyaml:

import re
import yaml

# Put each "name:" token on its own line so every string parses as a YAML mapping.
extracted_df = pd.json_normalize(df['extra'].apply(lambda x: yaml.load(re.sub(r',\s*(\w+:)', '\n\\1', x), Loader=yaml.SafeLoader)))
new_df = pd.concat([df.drop('extra', axis=1), extracted_df], axis=1)

Output:

>>> new_df
   subject category datasets          acc
0        1      app   [X, Y]   [0.8, 0.9]
1        1      dev   [Z, Y]  [0.7, 0.95]
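
If installing pyyaml is not an option, a rough alternative is to split the string yourself and parse each value with the standard-library json module. This is only a sketch under a few assumptions: parse_extra is a hypothetical helper name, the values are either JSON-compatible (lists, numbers) or bare strings, and no string value contains a ", name:" sequence of its own. It also runs the explode step from the question, which needs pandas 1.3 or later for multi-column explode.

```python
import json
import re

import pandas as pd

df = pd.DataFrame(
    {
        "subject": [1, 1],
        "extra": [
            'category: app, datasets: ["X", "Y"], acc: [0.8, 0.9]',
            'category: dev, datasets: ["Z", "Y"], acc: [0.7, 0.95]',
        ],
    }
)


def parse_extra(text):
    """Hypothetical helper: turn 'name: value, name: value' into a dict."""
    result = {}
    # Split only at commas that are followed by a new "name:" token,
    # so commas inside [...] lists are left alone.
    for pair in re.split(r",\s*(?=\w+:)", text):
        key, _, raw = pair.partition(":")
        raw = raw.strip()
        try:
            value = json.loads(raw)  # lists and numbers
        except json.JSONDecodeError:
            value = raw  # bare strings such as app
        result[key.strip()] = value
    return result


# One dict per row becomes one row of new columns, aligned on the original index.
extracted_df = pd.DataFrame(df["extra"].map(parse_extra).tolist(), index=df.index)
new_df = pd.concat([df.drop(columns="extra"), extracted_df], axis=1)

# Explode both list columns together so their elements stay paired.
final_df = new_df.explode(["datasets", "acc"])
print(final_df)
```

Note that json.loads will also convert bare values such as 123 or true into an int or bool, which may or may not be what you want for a column like category.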