Combine a row with column in dataFrame and show the corresponding values-CodePudding

So I want to show this data in just two columns. For example, I want to turn this data

  Year  Jan     Feb      Mar    Apr      May    Jun 
  1997  3.45    2.15    1.89    2.03    2.25    2.20
  1998  2.09    2.23    2.24    2.43    2.14    2.17
  1999  1.85    1.77    1.79    2.15    2.26    2.30
  2000  2.42    2.66    2.79    3.04    3.59    4.29

into this

  Date        Price
Jan-1977       3.45
Feb-1977       2.15
Mar-1977       1.89
Apr-1977       2.03
....
Jan-2000       2.42
Feb-2000       2.66

So far, I have read about how to combine two columns into another dataframe using .apply() .agg(), but no info how to combine them as I showed above.

import pandas as pd

df = pd.read_csv('matrix-A.csv', index_col =0 )

matrix_b = ({})
new = pd.DataFrame(matrix_b)

new["Date"] = df['Year'].astype(float)   "-"   df["Dec"]
print(new)

I have tried this way, but it of course does not work. I have also tried using pd.Series() but no success

I want to ask whether there is any site where I can learn how to do this, or does anybody know correct way to solve this?

CodePudding user response：

You can first convert it to long-form using melt. Then, create a new column for Date by combining two columns.

long_df = pd.melt(df, id_vars=['Year'], var_name='Month', value_name="Price")
long_df['Date'] = long_df['Month']   "-"   long_df['Year'].astype('str')
long_df[['Date', 'Price']]

If you want to sort your date column, here is a good resource. Follow those instructions after melting and before creating the Date column.

CodePudding user response：

You can use pandas.DataFrame.melt :

out = (
        df
          .melt(id_vars="Year", var_name="Month", value_name="Price")
          .assign(month_num= lambda x: pd.to_datetime(x["Month"] , format="%b").dt.month)
          .sort_values(by=["Year", "month_num"])
          .assign(Date= lambda x: x.pop("Month")   "-"   x.pop("Year").astype(str))
          .loc[:, ["Date", "Price"]]
      )

# Output :

print(out)
    
        Date  Price
0   Jan-1997   3.45
4   Feb-1997   2.15
8   Mar-1997   1.89
12  Apr-1997   2.03
16  May-1997   2.25
..       ...    ...
7   Feb-2000   2.66
11  Mar-2000   2.79
15  Apr-2000   3.04
19  May-2000   3.59
23  Jun-2000   4.29

[24 rows x 2 columns]

CodePudding user response：

Another possible solution, which is based on pandas.DataFrame.stack:

out = df.set_index('Year').stack()
out.index = ['{}_{}'.format(j, i) for i, j in out.index]
out = out.reset_index()
out.columns = ['Date', 'Value']

Output:

        Date  Value
0   Jan_1997   3.45
1   Feb_1997   2.15
2   Mar_1997   1.89
3   Apr_1997   2.03
4   May_1997   2.25
....
19  Feb_2000   2.66
20  Mar_2000   2.79
21  Apr_2000   3.04
22  May_2000   3.59
23  Jun_2000   4.29