Home > Back-end >  How to flatten a multi column dataframe into 2 columns
How to flatten a multi column dataframe into 2 columns

Time:06-09

Given the following table:

group_a = {'ba':[2.0,9.4,10.8],
          'bb':[4.2,7.1,3],
          'bc':[8.1,9.5,6.1]}

A = pd.DataFrame(group_a, index=['aa','ab','ac'])

That looks like this:

    ba   bb   bc
aa  2.0  4.2  8.1
ab  9.4  7.1  9.5
ac  10.8 3.0  6.1

How can I flatten this table so that it looks like this:

       Values
aa_ba  2.0
aa_bb  4.2
aa_bc  8.1
ab_ba  9.4
ab_bb  7.1
ab_bc  9.5
ac_ba  10.8
ac_bb  3.0
ac_bc  6.1

CodePudding user response:

You can use stack and rework the index:

B = A.stack()
B.index = B.index.map('_'.join)
out = B.to_frame('Values')

output:

       Values
aa_ba     2.0
aa_bb     4.2
aa_bc     8.1
ab_ba     9.4
ab_bb     7.1
ab_bc     9.5
ac_ba    10.8
ac_bb     3.0
ac_bc     6.1

CodePudding user response:

Since you have your indexes set, you can do this most easily with a .stack operation. This results in a pd.Series with a MultiIndex, we can use a "_".join to join each level of the MultiIndex by an underscore and create a flat Index. Lastly, since you wanted a single column DataFrame you can use .to_frame() to convert the Series into a DataFrame

out = A.stack()
out.index = out.index.map("_".join)
out = out.to_frame("values")

print(out)
       values
aa_ba     2.0
aa_bb     4.2
aa_bc     8.1
ab_ba     9.4
ab_bb     7.1
ab_bc     9.5
ac_ba    10.8
ac_bb     3.0
ac_bc     6.1

You can also use a method chained approach- just need to use .pipe to access the stacked index:

out = (
    A.stack()
    .pipe(lambda s: s.set_axis(s.index.map("_".join)))
    .to_frame("values")
)

print(out)
       values
aa_ba     2.0
aa_bb     4.2
aa_bc     8.1
ab_ba     9.4
ab_bb     7.1
ab_bc     9.5
ac_ba    10.8
ac_bb     3.0
ac_bc     6.1
  • Related