I have a sample data: sample data
I want to convert it by splitting the column header into two separate columns and transpose the values. The result should look something like this: result
In what way can I split one cell into two column headers with the string and numbers and populate the rows by transposing. Is there an efficient way to do this?
CodePudding user response:
As commented earlier, you should not provide sample data/code/output in picture. You should also provide MWE
If I understand your question correctly, you just need to use melt()
. Illustration:
import pandas as pd
Create dummy dataframe:
df = pd.DataFrame({
'name': ['United States'],
'POP2010': [1234567],
'POP2011': [1234980],
'POP2012': [2234980],
'POP2013': [4367767],
})
This will give us:
name POP2010 POP2011 POP2012 POP2013
0 United States 1234567 1234980 2234980 4367767
Melt the df
above:
df = df.melt(id_vars='name')
This will give us:
name variable value
0 United States POP2010 1234567
1 United States POP2011 1234980
2 United States POP2012 2234980
3 United States POP2013 4367767
Now, we're close. We just need to split POPxxxx. Assuming that the format is consistent across the variable
series, then:
df['MEASURE'] = df['variable'].str[:3]
df['KEY'] = df['variable'].str[3:].astype('int64')
Now, we just need to rearrange the columns order:
cols = ['KEY', 'MEASURE', 'value', 'name', 'variable']
df = df[cols].drop(columns='variable')
And, voila, we got this:
>>> df
KEY MEASURE value name
0 2010 POP 1234567 United States
1 2011 POP 1234980 United States
2 2012 POP 2234980 United States
3 2013 POP 4367767 United States
CodePudding user response:
melt is probably what you are looking for:
df = pd.DataFrame({"Name":['US', 'FR'],
"POP20": [10000, 5000],
"POP21": [10500, 5500]})
df.melt("Name")
OUT:
Name variable value
0 US POP20 10000
1 FR POP20 5000
2 US POP21 10500
3 FR POP21 5500