Home > Net >  Move values from column to column names in a pandas DataFrame
Move values from column to column names in a pandas DataFrame

Time:02-25

I have a dataframe like the one that follows:

import pandas as pd
import numpy as np

filename = [
    "f1",
    "f1",
    "f1",
    "f1",
    "f2",
    "f2",
    "f2",
    "f2",
    "f2",
    "f2"
]
parts = ["p1", "p1", "p2", "p2", "p1", "p1", "p2", "p2", "p3", "p3"]
groups = ["g1", "g2"] * 5
values = [1, 2, 3, 4, 10, 20, 30, 40, 50, 60]
values2 = [i   1 for i in values]
df = pd.DataFrame.from_records({"filename": filename, "part":parts, "f1": values, "group": groups, "f2":values2})
df = df.set_index(["filename", "part"])
print(df)
               f1  f2 group
filename part              
f1       p1     1   2    g1
         p1     2   3    g2
         p2     3   4    g1
         p2     4   5    g2
f2       p1    10  11    g1
         p1    20  21    g2
         p2    30  31    g1
         p2    40  41    g2
         p3    50  51    g1
         p3    60  61    g2

Each file is subdivided into more parts, in this case two parts for f1 and three for f2. For each part, values f1 and f2 are extracted for a given number of features group.

I would like to obtain a new dataframe with: filename, part, f1_g1, f2_g2 i.e. I would like to remove the column group and move its values (g1 and g2) in the column names.

The solution I have implemented is:

import pandas as pd
import numpy as np

filename = [
    "f1",
    "f1",
    "f1",
    "f1",
    "f2",
    "f2",
    "f2",
    "f2",
    "f2",
    "f2"
]
parts = ["p1", "p1", "p2", "p2", "p1", "p1", "p2", "p2", "p3", "p3"]
groups = ["g1", "g2"] * 5
values = [1, 2, 3, 4, 10, 20, 30, 40, 50, 60]
values2 = [i   1 for i in values]
df = pd.DataFrame.from_records({"filename": filename, "part":parts, "f1": values, "group": groups, "f2":values2})
df = df.set_index(["filename", "part"])
print(df)

df = df.reset_index()
groups = df.group.unique()
cols = [c for c in df.columns if (c not in ['filename', 'part', 'group'])]
records = []

for fn in df.filename.unique():
    parts = df.loc[df.filename == fn, "part"].unique()
    for p in parts:
        d = dict()
        for g in groups:
            d['filename'] = fn
            d['part'] = p
            for c in cols:
                value = df.loc[(df.filename == fn) & (df.part == p) & (df.group==g), c].iloc[0]
                d[f"{c}_{g}"] = value
        records.append(d)
df2 = pd.DataFrame.from_records(records).set_index(["filename", "part"])
print(df2)

that builds a DataFrame df2:

               f1_g1  f2_g1  f1_g2  f2_g2
filename part                            
f1       p1        1      2      2      3
         p2        3      4      4      5
f2       p1       10     11     20     21
         p2       30     31     40     41
         p3       50     51     60     61

My actual data has plenty more columns. This solution works, but I do not think it is elegant and the best one might think of.

Is there a better way to do it?

CodePudding user response:

you get most of the way there by adding 'group' to the index and then unstacking:

df2 = df.reset_index().set_index(["filename", "part",'group']).unstack()

To get the column labels into the shape you want, you can do

df2.columns = ['_'.join(c) for c in df2.columns.to_flat_index()]

output:

                f1_g1 f1_g2 f2_g1   f2_g2
filename    part                
f1          p1  1     2     2       3
            p2  3     4     4       5
f2          p1  10    20    11      21
            p2  30    40    31      41
            p3  50    60    51      61
  • Related