Home > Net >  Splitting a cell into columns and transposing by populating them in python
Splitting a cell into columns and transposing by populating them in python

Time:10-04

I have a sample data: sample data

I want to convert it by splitting the column header into two separate columns and transpose the values. The result should look something like this: result

In what way can I split one cell into two column headers with the string and numbers and populate the rows by transposing. Is there an efficient way to do this?

CodePudding user response:

As commented earlier, you should not provide sample data/code/output in picture. You should also provide MWE

If I understand your question correctly, you just need to use melt(). Illustration:

import pandas as pd

Create dummy dataframe:

df = pd.DataFrame({
  'name': ['United States'],
  'POP2010': [1234567],
  'POP2011': [1234980],
  'POP2012': [2234980],
  'POP2013': [4367767],
})

This will give us:

    name            POP2010 POP2011 POP2012 POP2013
  0 United States   1234567 1234980 2234980 4367767

Melt the df above:

df = df.melt(id_vars='name')

This will give us:

    name            variable    value
0   United States   POP2010     1234567
1   United States   POP2011     1234980
2   United States   POP2012     2234980
3   United States   POP2013     4367767

Now, we're close. We just need to split POPxxxx. Assuming that the format is consistent across the variable series, then:

df['MEASURE'] = df['variable'].str[:3]
df['KEY'] = df['variable'].str[3:].astype('int64')

Now, we just need to rearrange the columns order:

cols = ['KEY', 'MEASURE', 'value', 'name', 'variable']
df = df[cols].drop(columns='variable')

And, voila, we got this:

  >>> df
    KEY     MEASURE  value      name
  0 2010    POP      1234567    United States
  1 2011    POP      1234980    United States
  2 2012    POP      2234980    United States
  3 2013    POP      4367767    United States

CodePudding user response:

melt is probably what you are looking for:

df = pd.DataFrame({"Name":['US', 'FR'],
     "POP20": [10000, 5000],
     "POP21": [10500, 5500]})

df.melt("Name")

OUT:

    Name  variable  value
0   US    POP20     10000
1   FR    POP20     5000
2   US    POP21     10500
3   FR    POP21     5500
  • Related