Copy a value from a column to a new one if a certain column contains a specific element pandas-CodePudding

I'm doing a program that has two dataframes from the following dataset: kaggle.com/varpit94/ethereum-data

One of them has a format for dates (YYYY-MM-DD), what I want to do is include the elements of the column Close2018-19 in the dataframe df2 to a new column with its respective months. For instance, if in the Date column I have 2018-01-16 then I want it's respective value from the column Close2018-19 to be saved in a new column called January2018-2019. If I'm not explaining myself too well I could redact again the problem, thanks in advance, here's the code:

import pandas as pd
import numpy as np
import copy
import matplotlib.pyplot as plt


df=pd.read_csv('ETH-USD.csv')


df['Average-H-L'] = df[['High', 'Low']].mean(axis=1)
df = df[['Date','Close','Average-H-L']]

df2018 = df[(df['Date'].str.contains("2018-"))]
df2018.columns = ['Date','Close2018','Average-H-L2018']
df2019 = df[(df['Date'].str.contains("2019-"))]
df2019.columns = ['Date','Close2019','Average-H-L2019']

df2 = pd.concat([df2018,df2019], axis=0, ignore_index=True)
df2['Close'] = df2[['Close2018', 'Close2019']].mean(axis=1)
df2['Average-H-L'] = df2[['Average-H-L2018', 'Average-H-L2019']].mean(axis=1)
df2 = df2[['Date','Close','Average-H-L']]
df2.columns = ['Date','Close2018-19','Average-H-L2018-19']

df2['Year'] = df2['Date'].str.split('-').str[0]


df2['January2018-2019']= df2['Date'].str.contains("-07-")

#result = pd.concat([df, df2], axis=1).corr()
#result
df2

Here's how the table currently looks like: enter image description here

CodePudding user response：

You can use pivot_table passing the Close2018-19 as values and the month names as argument to columns parameter. To obtain the month names you can create a column using the datetime accessor dt.month_name. For fixing the column names use a list comprehension over the df2 columns appending the desired suffix.

df used as input

        Date  Close2018-19
0 2018-01-01      0.496714
1 2018-01-16     -0.138264
2 2018-01-31      0.647689
3 2018-02-15      1.523030
4 2018-03-02     -0.234153
5 2018-03-17     -0.234137
6 2018-04-01      1.579213
7 2018-04-16      0.767435
8 2018-05-01     -0.469474
9 2018-05-16      0.542560

df2 = df[['Date','Close2018-19']]
df2['Date'] = pd.to_datetime(df2['Date'])
df2['month'] = df['Date'].dt.month_name()

df2 = pd.pivot_table(df2, index='Date', columns='month', values='Close2018-19', fill_value=0)

df2.columns = [f'{m}2018-2019' for m in df2.columns]
print(df2)

Output from df2

            April2018-2019  February2018-2019  January2018-2019  March2018-2019  May2018-2019
Date
2018-01-01        0.000000            0.00000          0.496714        0.000000      0.000000
2018-01-16        0.000000            0.00000         -0.138264        0.000000      0.000000
2018-01-31        0.000000            0.00000          0.647689        0.000000      0.000000
2018-02-15        0.000000            1.52303          0.000000        0.000000      0.000000
2018-03-02        0.000000            0.00000          0.000000       -0.234153      0.000000
2018-03-17        0.000000            0.00000          0.000000       -0.234137      0.000000
2018-04-01        1.579213            0.00000          0.000000        0.000000      0.000000
2018-04-16        0.767435            0.00000          0.000000        0.000000      0.000000
2018-05-01        0.000000            0.00000          0.000000        0.000000     -0.469474
2018-05-16        0.000000            0.00000          0.000000        0.000000      0.542560