I am going to financial data of peers of a company. I have 10 peers for a particular company and the financial data is captured at regular intervals (monthly, quarterly, etc). However since the data capturing does not happen for all together I end up having different most recent update date.
What I want to do is to select most recent row for each peer company ID in a way that I end up having only 11 rows in my table i.e. (1 for my company and 10 peers)
Below is the code that I am running as of now
select * from Financials_table
where PRD_END_DT = (select max(PRD_END_DT) from Financials_table ) -- Selecting the latest period end date
''')
peers_df.createOrReplaceTempView('peers_df')
print(shape('peers_df'))
head('peers_df', 50)
Note that I have a list of peers stored in peers_list and I'd like to get the most recent PRD_END_DT for each of the peers. Now what I am running returns the most recent PRD_END_DT value but not all peers have data as on that date.
CodePudding user response:
There are several ways to get the most recent row per company ID. You haven't tagged your request with your DBMS, so some methods may work for you while others may not yet be supported by your DBMS. Here are some options:
Get the maximum prd_end_dt per company_id. Then select the matching rows:
select *
from table
where (company_id, prd_end_dt) in
(
select company_id, max(prd_end_dt)
from financials_table
group by company_id
)
order by company_id;
Select the rows for which no newer prd_end_dt exists for the company_id:
select *
from financials_table ft
where not exists
(
select null
from financials_table newer
where newer.company_id = ft.company_id
and newer.prd_end_dt > ft.prd_end_dt
)
order by company_id;
Get the maximum prd_end_dt on-the-fly. Then compare the dates:
select *
from
(
select ft.*, max(prd_end_dt) over (partition by company_id) as max_prd_end_dt
from financials_table ft
group by company_id
) with_max_prd_end_dt
where prd_end_dt = max_prd_end_dt
order by company_id;
Rank each company's rows per date and only keep the newest:
select *
from financials_table
order by rank() over (partition by company_id order by prd_end_dt desc)
fetch first row with ties;