I'm doing a program that has two dataframes from the following dataset: kaggle.com/varpit94/ethereum-data
One of them has a format for dates (YYYY-MM-DD), what I want to do is include the elements of the column Close2018-19 in the dataframe df2 to a new column with its respective months. For instance, if in the Date column I have 2018-01-16 then I want it's respective value from the column Close2018-19 to be saved in a new column called January2018-2019. If I'm not explaining myself too well I could redact again the problem, thanks in advance, here's the code:
import pandas as pd
import numpy as np
import copy
import matplotlib.pyplot as plt
df=pd.read_csv('ETH-USD.csv')
df['Average-H-L'] = df[['High', 'Low']].mean(axis=1)
df = df[['Date','Close','Average-H-L']]
df2018 = df[(df['Date'].str.contains("2018-"))]
df2018.columns = ['Date','Close2018','Average-H-L2018']
df2019 = df[(df['Date'].str.contains("2019-"))]
df2019.columns = ['Date','Close2019','Average-H-L2019']
df2 = pd.concat([df2018,df2019], axis=0, ignore_index=True)
df2['Close'] = df2[['Close2018', 'Close2019']].mean(axis=1)
df2['Average-H-L'] = df2[['Average-H-L2018', 'Average-H-L2019']].mean(axis=1)
df2 = df2[['Date','Close','Average-H-L']]
df2.columns = ['Date','Close2018-19','Average-H-L2018-19']
df2['Year'] = df2['Date'].str.split('-').str[0]
df2['January2018-2019']= df2['Date'].str.contains("-07-")
#result = pd.concat([df, df2], axis=1).corr()
#result
df2
Here's how the table currently looks like: enter image description here
CodePudding user response:
You can use pivot_table
passing the Close2018-19
as values
and the month names as argument to columns
parameter. To obtain the month names you can create a column using the datetime accessor dt.month_name
. For fixing the column names use a list comprehension over the df2
columns appending the desired suffix.
df used as input
Date Close2018-19
0 2018-01-01 0.496714
1 2018-01-16 -0.138264
2 2018-01-31 0.647689
3 2018-02-15 1.523030
4 2018-03-02 -0.234153
5 2018-03-17 -0.234137
6 2018-04-01 1.579213
7 2018-04-16 0.767435
8 2018-05-01 -0.469474
9 2018-05-16 0.542560
df2 = df[['Date','Close2018-19']]
df2['Date'] = pd.to_datetime(df2['Date'])
df2['month'] = df['Date'].dt.month_name()
df2 = pd.pivot_table(df2, index='Date', columns='month', values='Close2018-19', fill_value=0)
df2.columns = [f'{m}2018-2019' for m in df2.columns]
print(df2)
Output from df2
April2018-2019 February2018-2019 January2018-2019 March2018-2019 May2018-2019
Date
2018-01-01 0.000000 0.00000 0.496714 0.000000 0.000000
2018-01-16 0.000000 0.00000 -0.138264 0.000000 0.000000
2018-01-31 0.000000 0.00000 0.647689 0.000000 0.000000
2018-02-15 0.000000 1.52303 0.000000 0.000000 0.000000
2018-03-02 0.000000 0.00000 0.000000 -0.234153 0.000000
2018-03-17 0.000000 0.00000 0.000000 -0.234137 0.000000
2018-04-01 1.579213 0.00000 0.000000 0.000000 0.000000
2018-04-16 0.767435 0.00000 0.000000 0.000000 0.000000
2018-05-01 0.000000 0.00000 0.000000 0.000000 -0.469474
2018-05-16 0.000000 0.00000 0.000000 0.000000 0.542560