I would like to understand if I can convert a string column to a PeriodIndex (for instance year), preserving the string (suffix).
I have the following DataFrame:
company date ... revenue taxes
Facebook 2017-01-01 00:00:00 Total ... 1796.00 0.00
Facebook 2018-07-01 00:00:00 Total ... 7423.20 -11.54
Facebook Total - ... 1704.00 0.00
Google 2017-12-01 00:00:00 Total ... 1938.60 -1938.60
Google 2018-12-01 00:00:00 Total ... 1403.47 -102.01
Google 2018-01-01 00:00:00 Total ... 2028.00 -76.38
Google Total - ... 800.00 -256.98
I'm trying to apply the PeriodIndex to date:
df['date'] = pd.PeriodIndex(df['date'].values, freq='Y')
However, nothing happens because Pandas can't convert it to a string. I can't remove the word Total from my DataFrame.
This is what I expect to achieve:
company date ... revenue taxes
Facebook 2017 Total ... 1796.00 0.00
Facebook 2018 Total ... 7423.20 -11.54
Facebook Total - ... 1704.00 0.00
Google 2017 Total ... 1938.60 -1938.60
Google 2018 Total ... 1403.47 -102.01
Google 2018 Total ... 2028.00 -76.38
Google Total - ... 800.00 -256.98
Any way I can get around with this?
Thanks!
CodePudding user response:
Let's say there is a dummy dataframe, similiar with yours:
dictionary = {'company' : ['Facebook', 'Facebook', 'Facebook_Total','Google','Google_Total'],
'date' : ['2019-09-14 09:00:08.279000 09:00 Total',
'2020-09-14 09:00:08.279000 09:00 Total',
'-',
'2021-09-14 09:00:08.279000 09:00 Total',
'-'],
'revenue' : [10,20,30,40,50]}
df = pd.DataFrame(dictionary)
I used regex
module to delete Total behind the year column as following:
substring = ' Total'
for i in range(len(df)):
if re.search(substring, df['date'][i] , flags=re.IGNORECASE):
df['date'][i] = df['date'][i].replace(' Total','')
else: pass
Then, I used pd.PeriodIndex
as following:
for i in range(len(df)) :
if df['date'][i] == '-':
pass
else:
df['date'][i] = pd.PeriodIndex(pd.Series(df['date'][i]), freq='Y')[0]
for i in range(len(df)):
if df['date'][i] == '-':
pass
else:
df['date'][i] = str(df['date'][i]) ' Total'
The above code returns :
Out[1]:
company date revenue
0 Facebook 2019 Total 10
1 Facebook 2020 Total 20
2 Facebook_Total - 30
3 Google 2021 Total 40
4 Google_Total - 50