So I want to show this data in just two columns. For example, I want to turn this data
Year Jan Feb Mar Apr May Jun
1997 3.45 2.15 1.89 2.03 2.25 2.20
1998 2.09 2.23 2.24 2.43 2.14 2.17
1999 1.85 1.77 1.79 2.15 2.26 2.30
2000 2.42 2.66 2.79 3.04 3.59 4.29
into this
Date Price
Jan-1977 3.45
Feb-1977 2.15
Mar-1977 1.89
Apr-1977 2.03
....
Jan-2000 2.42
Feb-2000 2.66
So far, I have read about how to combine two columns into another dataframe using .apply()
.agg()
, but no info how to combine them as I showed above.
import pandas as pd
df = pd.read_csv('matrix-A.csv', index_col =0 )
matrix_b = ({})
new = pd.DataFrame(matrix_b)
new["Date"] = df['Year'].astype(float) "-" df["Dec"]
print(new)
I have tried this way, but it of course does not work. I have also tried using pd.Series()
but no success
I want to ask whether there is any site where I can learn how to do this, or does anybody know correct way to solve this?
CodePudding user response:
You can first convert it to long-form using melt
. Then, create a new column for Date
by combining two columns.
long_df = pd.melt(df, id_vars=['Year'], var_name='Month', value_name="Price")
long_df['Date'] = long_df['Month'] "-" long_df['Year'].astype('str')
long_df[['Date', 'Price']]
If you want to sort your date column, here is a good resource. Follow those instructions after melt
ing and before creating the Date
column.
CodePudding user response:
You can use pandas.DataFrame.melt
:
out = (
df
.melt(id_vars="Year", var_name="Month", value_name="Price")
.assign(month_num= lambda x: pd.to_datetime(x["Month"] , format="%b").dt.month)
.sort_values(by=["Year", "month_num"])
.assign(Date= lambda x: x.pop("Month") "-" x.pop("Year").astype(str))
.loc[:, ["Date", "Price"]]
)
# Output :
print(out)
Date Price
0 Jan-1997 3.45
4 Feb-1997 2.15
8 Mar-1997 1.89
12 Apr-1997 2.03
16 May-1997 2.25
.. ... ...
7 Feb-2000 2.66
11 Mar-2000 2.79
15 Apr-2000 3.04
19 May-2000 3.59
23 Jun-2000 4.29
[24 rows x 2 columns]
CodePudding user response:
Another possible solution, which is based on pandas.DataFrame.stack
:
out = df.set_index('Year').stack()
out.index = ['{}_{}'.format(j, i) for i, j in out.index]
out = out.reset_index()
out.columns = ['Date', 'Value']
Output:
Date Value
0 Jan_1997 3.45
1 Feb_1997 2.15
2 Mar_1997 1.89
3 Apr_1997 2.03
4 May_1997 2.25
....
19 Feb_2000 2.66
20 Mar_2000 2.79
21 Apr_2000 3.04
22 May_2000 3.59
23 Jun_2000 4.29