Home > Blockchain >  Join data from one column in to another column as a separate row
Join data from one column in to another column as a separate row

Time:03-20

I have a pandas DataFrame like this:

    Year1   Year2   Total
0   2010    2011    2500
1   2012    2013    3000
2   2014    2015    4000

I want to grab the data in the Year2 column and merge it with the Year1 column, and keep the Total value associated with it, which should look like:

    Year1   Total
0   2010    2500
1   2011    2500
2   2012    3000
3   2013    3000
4   2014    4000
5   2015    4000

I have considered first of all duplicating the df so that I get the second 'Total' value for the 2011, 2013 and 2015

df = pd.DataFrame(np.repeat(df.values, 2, axis=0))
df.columns = ['Year1', 'Year2', 'Total']

but I'm still unsure of the steps to merge the column data from Year2 to Year1.

CodePudding user response:

You could melt it:

out = (pd.melt(df, id_vars=['Total']).rename(columns={'value':'Year1'})
       .drop(columns='variable')[['Year1', 'Total']]
       .sort_values(by='Year1').reset_index(drop=True))

or set_index with "Total" unstack:

out = (df.set_index('Total').unstack().droplevel(0)
       .reset_index(name='Year1')[['Year1', 'Total']]
       .sort_values(by='Year1').reset_index(drop=True))

Output:

   Year1  Total
0   2010   2500
1   2011   2500
2   2012   3000
3   2013   3000
4   2014   4000
5   2015   4000

CodePudding user response:

You can achieve the desired output using append function but with a few steps before:

import pandas as pd
df = pd.read_csv('df.txt')
newDf = df[["Year2", "Total"]].rename(columns={"Year2":"Year1"})
df.drop(columns=["Year2"], inplace=True)
resultDf = df.append(newDf)
resultDf.sort_values("Year1")

Output

Year1 Total
2010 2500
2011 2500
2012 3000
2013 3000
2014 4000
2015 4000
  • Related