i have a home work assignment : We want to find all questions about the python pandas library, as well as their answers. Write a query that retrieves all the questions for which the title contains the word "pandas" from the posts_questions table, as well as all the appropriate answers for each such question from the post_answers , where each row in the returned table will represent a pair of (question answer). If the question has a number Answers, the same question will appear in multiple rows in the returned table. returned and the of the question as well as the following fields: the id, title, tag, answer_count score, creation time (creation_date) and the body of the text (the body) of both the question and the answer. For the body, all slash characters must be removed the line '\n'.
for this i wrote the following SQL code:
SELECT tb1.id as q_id,tb1.title as q_title,tb1.tags as q_tags
,tb1.creation_date as q_creation_date,tb1.score as q_score,tb1.answer_count as q_answer_count
,REPLACE(tb1.body,'\n',' ') as body_qustion,REPLACE(tb2.body,'\n',' ') as body_answer
from `bigquery-public-data.stackoverflow.posts_questions` as tb1
left join `bigquery-public-data.stackoverflow.posts_answers` as tb2
on tb1.id=tb2.id
where( tb1.title like "%pandas%" or tb1.title like "%Pandas%" or tb1.title like "%PANDAS%")
group by tb1.id ,tb1.title ,tb1.tags,tb1.creation_date,tb1.score
,tb1.answer_count,body_qustion,body_answer
but the problem is that when for example for a question i have 3 answers i expect it to return 3 rows for the question instead it returns only one and i dont know what is the problem .
the data is : bigquery-public-data.stackoverflow.posts_questions
and bigquery-public-data.stackoverflow.posts_answers :
CodePudding user response:
You have joined with the wrong ID of the answer table. In the answer table ID column represents the ID of the answer itself whereas parent_id represents the question id. You can play with the below query to have more understanding.
Query:
SELECT
q.id AS q_id #id of the question in question table
,
a.id AS a_id #id of the answer in answer table
,
q.title AS q_title,
q.tags AS q_tags,
q.creation_date AS q_creation_date,
q.score AS q_score,
q.answer_count AS q_answer_count,
REPLACE(q.body,'\n',' ') AS body_qustion,
REPLACE(a.body,'\n',' ') AS body_answer
FROM
`bigquery-public-data.stackoverflow.posts_questions` q
LEFT JOIN
`bigquery-public-data.stackoverflow.posts_answers` a
ON
q.id = a.parent_id #Joining with quesiton Ids
WHERE
LOWER(q.title) LIKE '%pandas%'
AND q.creation_date BETWEEN '2021-01-01'
AND '2021-01-31'
AND q.answer_count >1
Output: