Year over Year percent change using quarterly data


I would like to calculate year-over-year changes based on the quarterly data presented below. This is my current code, which logically makes sense to me but produces ZeroDivisionError: float division by zero.

df_all_revenue['revtq_yoy_chg'] = df_all_revenue.groupby('cusip')['revtq'].pct_change(periods=4)

Explanation of my code:

There are other stocks in this same DataFrame (df_all_revenue), so the percentage calculation should only be performed within each cusip number (i.e. the stock identifying number). Within each cusip, do a percentage calculation offset by 4 rows (4 quarters) and store the result in a new column called revtq_yoy_chg. Should I be adding datadate as an additional criterion alongside the cusip number? I feel like this is already taken care of by the 4-row offset.
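For reference, a minimal sketch of that intent with an explicit sort beforehand, assuming each cusip has exactly one row per quarter, so the 4-row offset really compares the same quarter a year earlier:

# sort so that, within each cusip, rows run in chronological order;
# only then does periods=4 mean "same quarter last year"
df_all_revenue = df_all_revenue.sort_values(['cusip', 'datadate'])
df_all_revenue['revtq_yoy_chg'] = df_all_revenue.groupby('cusip')['revtq'].pct_change(periods=4)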


df_all_revenue's dict for reference:

{'tic': {0: 'AAPL', 1: 'AAPL', 2: 'AAPL', 3: 'AAPL', 4: 'AAPL'},
 'cusip': {0: '037833100',
  1: '037833100',
  2: '037833100',
  3: '037833100',
  4: '037833100'},
 'datadate': {0: datetime.date(1979, 12, 31),
  1: datetime.date(1980, 3, 31),
  2: datetime.date(1980, 6, 30),
  3: datetime.date(1980, 9, 30),
  4: datetime.date(1980, 12, 31)},
 'fyearq': {0: 1980.0, 1: 1980.0, 2: 1980.0, 3: 1980.0, 4: 1981.0},
 'fqtr': {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0, 4: 1.0},
 'fdateq': {0: None, 1: None, 2: None, 3: None, 4: None},
 'pdateq': {0: None, 1: None, 2: None, 3: None, 4: None},
 'revtq': {0: 19.54, 1: 23.55, 2: 32.569, 3: 41.467, 4: 67.621},
 'oiadpq': {0: 5.178, 1: 5.509, 2: 5.488, 3: 7.411, 4: 14.914},
 'niq': {0: 2.647, 1: 2.788, 2: 2.735, 3: 3.528, 4: 7.421}}
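For anyone reproducing this, the frame can be rebuilt from the dict above (here assumed to be bound to a variable d):

import pandas as pd

df_all_revenue = pd.DataFrame(d)  # d is the dict shown above
# note: 'datadate' holds datetime.date objects, so the column has object dtype
print(df_all_revenue.dtypes)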

CodePudding user response:

As already said, you need to check whether there are empty values in the 'revtq' column:

print(df_all_revenue['revtq'].isnull().any()) 

The printout returns True if there are any. It is also possible that some values are stored as strings or some other type; you can check a column's values in a loop, catching exceptions:

for i in range(len(df_all_revenue)):
    try:
        value = df_all_revenue['revtq'].values[i]
        if not isinstance(value, float):
            # report any row whose revenue is not stored as a float
            print(df_all_revenue.index[i], df_all_revenue['tic'].values[i],
                  value, type(value))
    except Exception:
        continue

If everything is in order, convert the 'datadate' column to datetime, set it as the index, sort, and try again:

import pandas as pd

# make 'datadate' a real datetime, use it as the index, and sort chronologically
df_all_revenue['datadate'] = pd.to_datetime(df_all_revenue['datadate'])
df_all_revenue.set_index(['datadate'], inplace=True)
df_all_revenue = df_all_revenue.sort_values(['tic', 'datadate'], ascending=[True, True])

# compare each quarter to the same quarter a year (4 rows) earlier, per ticker
df_all_revenue['revtq_yoy_chg'] = df_all_revenue.groupby(['tic'])['revtq'].pct_change(periods=4)
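If you would rather keep the question's grouping key, the same call works with 'cusip' instead of 'tic':

# identical computation keyed on cusip rather than ticker
df_all_revenue['revtq_yoy_chg'] = df_all_revenue.groupby(['cusip'])['revtq'].pct_change(periods=4)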

If this still does not work, look for the rows where the value turns out to be zero, since a zero base value is what makes the division fail:

print(df_all_revenue[df_all_revenue['revtq'] == 0])
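If zero revenues are legitimate in your data, a minimal sketch of one way to sidestep the error is to treat them as missing before the division (this assumes a NaN result is acceptable for those rows):

import numpy as np

# replace zero revenues with NaN so pct_change yields NaN instead of failing
base = df_all_revenue['revtq'].replace(0, np.nan)
df_all_revenue['revtq_yoy_chg'] = base.groupby(df_all_revenue['cusip']).pct_change(periods=4)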

CodePudding user response:

I'm sorry my 1st answer wasn't what you were looking for.

Here's another try:

I've created some dummy data:

  • 2 different stocks with 2 different cusips
  • The data starts with Q4 1979, ends with Q4 1982
  • Note: there are no missing values; all 4 quarters of each year are reported

import pandas as pd
from pandas import Timestamp

data = [{'tic': 'AAPL',
  'cusip': 'A00000',
  'datadate': Timestamp('1979-12-31 00:00:00'),
  'fyearq': 1979,
  'fqtr': 4,
  'revtq': 50.0},
 {'tic': 'MSFT',
  'cusip': 'A11111',
  'datadate': Timestamp('1979-12-31 00:00:00'),
  'fyearq': 1979,
  'fqtr': 4,
  'revtq': 75.0},
 {'tic': 'AAPL',
  'cusip': 'A00000',
  'datadate': Timestamp('1980-03-31 00:00:00'),
  'fyearq': 1980,
  'fqtr': 1,
  'revtq': 53.5},
 {'tic': 'MSFT',
  'cusip': 'A11111',
  'datadate': Timestamp('1980-03-31 00:00:00'),
  'fyearq': 1980,
  'fqtr': 1,
  'revtq': 78.0},
 {'tic': 'AAPL',
  'cusip': 'A00000',
  'datadate': Timestamp('1980-06-30 00:00:00'),
  'fyearq': 1980,
  'fqtr': 2,
  'revtq': 56.71},
 {'tic': 'MSFT',
  'cusip': 'A11111',
  'datadate': Timestamp('1980-06-30 00:00:00'),
  'fyearq': 1980,
  'fqtr': 2,
  'revtq': 85.80000000000001},
 {'tic': 'AAPL',
  'cusip': 'A00000',
  'datadate': Timestamp('1980-09-30 00:00:00'),
  'fyearq': 1980,
  'fqtr': 3,
  'revtq': 60.679700000000004},
 {'tic': 'MSFT',
  'cusip': 'A11111',
  'datadate': Timestamp('1980-09-30 00:00:00'),
  'fyearq': 1980,
  'fqtr': 3,
  'revtq': 94.38000000000002},
 {'tic': 'AAPL',
  'cusip': 'A00000',
  'datadate': Timestamp('1980-12-31 00:00:00'),
  'fyearq': 1980,
  'fqtr': 4,
  'revtq': 66.14087300000001},
 {'tic': 'MSFT',
  'cusip': 'A11111',
  'datadate': Timestamp('1980-12-31 00:00:00'),
  'fyearq': 1980,
  'fqtr': 4,
  'revtq': 95.32380000000002},
 {'tic': 'AAPL',
  'cusip': 'A00000',
  'datadate': Timestamp('1981-03-31 00:00:00'),
  'fyearq': 1981,
  'fqtr': 1,
  'revtq': 70.77073411000002},
 {'tic': 'MSFT',
  'cusip': 'A11111',
  'datadate': Timestamp('1981-03-31 00:00:00'),
  'fyearq': 1981,
  'fqtr': 1,
  'revtq': 104.85618000000002},
 {'tic': 'AAPL',
  'cusip': 'A00000',
  'datadate': Timestamp('1981-06-30 00:00:00'),
  'fyearq': 1981,
  'fqtr': 2,
  'revtq': 72.89385613330002},
 {'tic': 'MSFT',
  'cusip': 'A11111',
  'datadate': Timestamp('1981-06-30 00:00:00'),
  'fyearq': 1981,
  'fqtr': 2,
  'revtq': 110.09898900000003},
 {'tic': 'AAPL',
  'cusip': 'A00000',
  'datadate': Timestamp('1981-09-30 00:00:00'),
  'fyearq': 1981,
  'fqtr': 3,
  'revtq': 75.80961037863203},
 {'tic': 'MSFT',
  'cusip': 'A11111',
  'datadate': Timestamp('1981-09-30 00:00:00'),
  'fyearq': 1981,
  'fqtr': 3,
  'revtq': 121.10888790000004},
 {'tic': 'AAPL',
  'cusip': 'A00000',
  'datadate': Timestamp('1981-12-31 00:00:00'),
  'fyearq': 1981,
  'fqtr': 4,
  'revtq': 77.32580258620467},
 {'tic': 'MSFT',
  'cusip': 'A11111',
  'datadate': Timestamp('1981-12-31 00:00:00'),
  'fyearq': 1981,
  'fqtr': 4,
  'revtq': 133.21977669000006},
 {'tic': 'AAPL',
  'cusip': 'A00000',
  'datadate': Timestamp('1982-03-31 00:00:00'),
  'fyearq': 1982,
  'fqtr': 1,
  'revtq': 77.32580258620467},
 {'tic': 'MSFT',
  'cusip': 'A11111',
  'datadate': Timestamp('1982-03-31 00:00:00'),
  'fyearq': 1982,
  'fqtr': 1,
  'revtq': 133.21977669000006},
 {'tic': 'AAPL',
  'cusip': 'A00000',
  'datadate': Timestamp('1982-06-30 00:00:00'),
  'fyearq': 1982,
  'fqtr': 2,
  'revtq': 77.32580258620467},
 {'tic': 'MSFT',
  'cusip': 'A11111',
  'datadate': Timestamp('1982-06-30 00:00:00'),
  'fyearq': 1982,
  'fqtr': 2,
  'revtq': 135.88417222380005},
 {'tic': 'AAPL',
  'cusip': 'A00000',
  'datadate': Timestamp('1982-09-30 00:00:00'),
  'fyearq': 1982,
  'fqtr': 3,
  'revtq': 85.05838284482515},
 {'tic': 'MSFT',
  'cusip': 'A11111',
  'datadate': Timestamp('1982-09-30 00:00:00'),
  'fyearq': 1982,
  'fqtr': 3,
  'revtq': 144.03722255722806},
 {'tic': 'AAPL',
  'cusip': 'A00000',
  'datadate': Timestamp('1982-12-31 00:00:00'),
  'fyearq': 1982,
  'fqtr': 4,
  'revtq': 97.4033828448251},
 {'tic': 'MSFT',
  'cusip': 'A11111',
  'datadate': Timestamp('1982-12-31 00:00:00'),
  'fyearq': 1982,
  'fqtr': 4,
  'revtq': 178.982222557228}]

Then I did this:

# creating our df
df = pd.DataFrame(data)

# list of all our cusips
my_cusips = df['cusip'].unique()

results = {}

for x in my_cusips:
    # filter for each cusip, then use .pct_change()
    s = df.loc[df['cusip'] == x, 'revtq'].pct_change(periods=4)
    # s now maps our df's original row indices to the associated .pct_change values
    # turn it into a dict
    s = s.to_dict()
    # store it in the results dict
    results.update(s)

# restore the original row order
sorted_results = dict(sorted(results.items(), key=lambda item: item[0]))

# turn sorted_results into a dataframe
results_df = pd.DataFrame.from_dict(sorted_results, orient='index', columns=['yoy_change'])

# merge results_df onto our original df
pd.merge(df, results_df, left_index=True, right_index=True)

End result:

(screenshot of the merged DataFrame, now including the yoy_change column, omitted)
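For comparison, a sketch of the same result in one step, assuming the rows are already in chronological order within each cusip (as they are in the dummy data above):

# equivalent one-liner: group by cusip, then compare each quarter to the
# row four positions earlier within the same group
df['yoy_change'] = df.groupby('cusip')['revtq'].pct_change(periods=4)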

I hope it helps!

CodePudding user response:

I've added some more (dummy) data to your df for testing purposes:

(screenshot of the extended df omitted)

# yoy is the extended DataFrame shown above; you can use pct_change directly
yoy['yoy_change_pct'] = yoy['revtq'].pct_change(periods=4)
yoy

End result:

(screenshot of yoy with the new yoy_change_pct column omitted)
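Note that this applies pct_change across the whole column; if yoy held several cusips in one frame, you would group first so the 4-row shift never crosses stock boundaries (assuming a 'cusip' column exists):

# grouped variant so the comparison stays within each stock
yoy['yoy_change_pct'] = yoy.groupby('cusip')['revtq'].pct_change(periods=4)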
