I have a pandas DataFrame like this:
Year1 Year2 Total
0 2010 2011 2500
1 2012 2013 3000
2 2014 2015 4000
I want to grab the data in the Year2
column and merge it with the Year1
column, and keep the Total
value associated with it, which should look like:
Year1 Total
0 2010 2500
1 2011 2500
2 2012 3000
3 2013 3000
4 2014 4000
5 2015 4000
I have considered first of all duplicating the df
so that I get the second 'Total' value for the 2011, 2013 and 2015
df = pd.DataFrame(np.repeat(df.values, 2, axis=0))
df.columns = ['Year1', 'Year2', 'Total']
but I'm still unsure of the steps to merge the column data from Year2
to Year1
.
CodePudding user response:
You could melt
it:
out = (pd.melt(df, id_vars=['Total']).rename(columns={'value':'Year1'})
.drop(columns='variable')[['Year1', 'Total']]
.sort_values(by='Year1').reset_index(drop=True))
or set_index
with "Total" unstack
:
out = (df.set_index('Total').unstack().droplevel(0)
.reset_index(name='Year1')[['Year1', 'Total']]
.sort_values(by='Year1').reset_index(drop=True))
Output:
Year1 Total
0 2010 2500
1 2011 2500
2 2012 3000
3 2013 3000
4 2014 4000
5 2015 4000
CodePudding user response:
You can achieve the desired output using append
function but with a few steps before:
import pandas as pd
df = pd.read_csv('df.txt')
newDf = df[["Year2", "Total"]].rename(columns={"Year2":"Year1"})
df.drop(columns=["Year2"], inplace=True)
resultDf = df.append(newDf)
resultDf.sort_values("Year1")
Output
Year1 | Total |
---|---|
2010 | 2500 |
2011 | 2500 |
2012 | 3000 |
2013 | 3000 |
2014 | 4000 |
2015 | 4000 |