Splitting a cell into columns and transposing by populating them in python-CodePudding

I have a sample data: sample data

I want to convert it by splitting the column header into two separate columns and transpose the values. The result should look something like this: result

In what way can I split one cell into two column headers with the string and numbers and populate the rows by transposing. Is there an efficient way to do this?

CodePudding user response：

As commented earlier, you should not provide sample data/code/output in picture. You should also provide MWE

If I understand your question correctly, you just need to use melt(). Illustration:

import pandas as pd

Create dummy dataframe:

df = pd.DataFrame({
  'name': ['United States'],
  'POP2010': [1234567],
  'POP2011': [1234980],
  'POP2012': [2234980],
  'POP2013': [4367767],
})

This will give us:

    name            POP2010 POP2011 POP2012 POP2013
  0 United States   1234567 1234980 2234980 4367767

Melt the df above:

df = df.melt(id_vars='name')

This will give us:

    name            variable    value
0   United States   POP2010     1234567
1   United States   POP2011     1234980
2   United States   POP2012     2234980
3   United States   POP2013     4367767

Now, we're close. We just need to split POPxxxx. Assuming that the format is consistent across the variable series, then:

df['MEASURE'] = df['variable'].str[:3]
df['KEY'] = df['variable'].str[3:].astype('int64')

Now, we just need to rearrange the columns order:

cols = ['KEY', 'MEASURE', 'value', 'name', 'variable']
df = df[cols].drop(columns='variable')

And, voila, we got this:

  >>> df
    KEY     MEASURE  value      name
  0 2010    POP      1234567    United States
  1 2011    POP      1234980    United States
  2 2012    POP      2234980    United States
  3 2013    POP      4367767    United States

CodePudding user response：

melt is probably what you are looking for:

df = pd.DataFrame({"Name":['US', 'FR'],
     "POP20": [10000, 5000],
     "POP21": [10500, 5500]})

df.melt("Name")

OUT:

    Name  variable  value
0   US    POP20     10000
1   FR    POP20     5000
2   US    POP21     10500
3   FR    POP21     5500