I have a data frame with the columns date, agenda, speaker, party, and text (what the speaker said). I want to reshape it so that there is one row per date, agenda, and party, with all the text for that group combined. I have tried mutate, summarise, aggregate, and pmap, to no avail.
Any ideas? I would really prefer to avoid for loops, since they would need three levels of nesting and my data is quite large.
What I have so far is along the lines of:
combining <- df %>% group_by(date, agenda, party) %>% mutate(combined_text = paste(text, sep = "_"))
The sample df (sorry that it's in this clunky format):
date agenda party
4908 1988-12-13 Pensioners (Health) [Oral Answers To Questions > Health] Lab
4909 1988-12-13 Pensioners (Health) [Oral Answers To Questions > Health] Lab
4910 1988-12-13 Pensioners (Health) [Oral Answers To Questions > Health] Con
4911 1988-12-13 Pensioners (Health) [Oral Answers To Questions > Health] Lab
4912 1988-12-13 Pensioners (Health) [Oral Answers To Questions > Health] Con
4913 1988-12-13 Pensioners (Health) [Oral Answers To Questions > Health] Con
speaker
4908 Alistair Darling
4909 Peter Hardy
4910 Edwina Currie
4911 Alistair Darling
4912 Edwina Currie
4913 Michael McNair-Wilson
text
4908 To ask the Secretary of State for Health if he plans to add to the advice given by his Under-Secretary of State, the hon. Member for Derbyshire, South (Mrs. Currie) at Reading, to pensioners on keeping healthy during the winter months.
4909 To ask the Secretary of State for Health if he has received any representations from pensioners or pensioners' organisations about the advice given by the Under-Secretary of State, the hon. Member for Derbyshire, South (Mrs. Currie) at Reading on the subject of keeping healthy during the winter.
4910 On pensioners' health day in Reading the Government drew attention to simple advice on self-help which can make a difference to winter mortality. For the second year running we have a " Keep Warm, Keep Well" campaign, involving the five Government Departments and voluntary organisations, which is proving to be very effective. We have received around 400 letters on the topic. The telephone helpline, run by Help the Aged, is now receiving over 700 calls per week. We are very pleased with the success of the campaign.
4911 On reflection, does the hon. Lady consider that her remarks were ill-judged and stupid? For how much longer will she be allowed to act as court jester, deflecting attention from the fact that for many pensioners this Christmas the choice will be between heating their houses and eating? Is she aware that the best Christmas present that she could give most pensioners, and, indeed, many Conservative Members, would be a month's silence.
4912 The hon. Gentleman seems to have forgotten that the worst winter in recent years for excess winter mortality and hypothermia was 1979. If the Opposition had their way, there would be no such campaign. There was no campaign in the 1970s when winter mortality was much higher than now. The advice is plain common sense and the Opposition would do better to back it.
4913 I congratulate my hon. Friend on her advice for the elderly. Will she confirm that about 20 per cent. of body heat can be lost through the top of the head and that if one wore a hat, the heat loss would be reduced.
CodePudding user response:
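The following should do what you want. Note the use of summarize() instead of mutate(), and of collapse instead of sep inside paste(). mutate() keeps every input row, and sep only separates distinct arguments to paste(), so the original call leaves text unchanged: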
combining <- df %>% group_by(date, agenda, party) %>% summarize(combined_text = paste(text, collapse = "_"))
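To make the sep versus collapse difference concrete, here is a minimal base R sketch. The vector words is made up for illustration and is not part of the question's data.

```r
# Hypothetical toy vector, not from the question's data
words <- c("a", "b", "c")

# sep is placed between *different arguments* to paste(); with a single
# vector there is nothing to separate, so each element comes back as is
with_sep <- paste(words, sep = "_")

# collapse joins the elements of the resulting vector into one string
with_collapse <- paste(words, collapse = "_")

print(with_sep)       # still a character vector of length 3
print(with_collapse)  # a single string
```

Inside summarize(), the single string produced by collapse is what allows each group to shrink to one row.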
CodePudding user response:
You are on the right track. Use collapse = in paste0(), and use summarise() whenever the number of rows is expected to shrink, since mutate() adds a column but does not reduce the number of rows.
Sample data:
df <- data.frame(
Date = c('1988-12-13','1988-12-13','1988-12-13','1988-12-13','1988-12-14','1988-12-14')
,agenda = c('Pensions','Pensions','Pensions','Pensions','1988-12-14','1988-12-14')
,party = c('ABC','ABC','ABC','ABC','DEF','DEF')
,text = c('Text1','text2','text3','text4','text5','text6')
)
Now:
library(dplyr)
df %>%
  group_by(Date, agenda, party) %>%
  summarise(CombinedText = paste0(text, collapse = "_"))
Output:
  Date       agenda     party CombinedText
  <chr>      <chr>      <chr> <chr>
1 1988-12-13 Pensions   ABC   Text1_text2_text3_text4
2 1988-12-14 1988-12-14 DEF   text5_text6
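Since the question also mentions aggregate, a rough base R equivalent might look like the sketch below. It reuses this answer's sample columns; the name combined_base is made up here. Extra arguments given to aggregate() are passed on to FUN, which is how collapse reaches paste().

```r
# Same sample data as in this answer
df <- data.frame(
  Date   = c('1988-12-13', '1988-12-13', '1988-12-13', '1988-12-13', '1988-12-14', '1988-12-14'),
  agenda = c('Pensions', 'Pensions', 'Pensions', 'Pensions', '1988-12-14', '1988-12-14'),
  party  = c('ABC', 'ABC', 'ABC', 'ABC', 'DEF', 'DEF'),
  text   = c('Text1', 'text2', 'text3', 'text4', 'text5', 'text6')
)

# One row per Date/agenda/party combination;
# collapse = "_" is forwarded by aggregate() to paste()
combined_base <- aggregate(text ~ Date + agenda + party, data = df,
                           FUN = paste, collapse = "_")
print(combined_base)
```

This avoids the dplyr dependency, at the cost of a less readable call. Which version is faster on large data is not something this sketch tries to settle.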
CodePudding user response:
With tidyverse, we can use str_c():
library(dplyr)
library(stringr)
combined <- df %>%
group_by(date, agenda, party) %>%
summarise(CombinedText = str_c(text, collapse = "_"))
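One difference worth knowing when choosing between paste() and str_c() is how they treat missing values. The sketch below uses a made-up vector txt and assumes the stringr package is installed. Whether the question's text column contains any NA is not known.

```r
library(stringr)

# Hypothetical vector with one missing entry
txt <- c("first", NA, "third")

# paste() converts NA to the literal string "NA" before joining
pasted <- paste(txt, collapse = "_")

# str_c() propagates the missing value, so the whole result is NA
joined <- str_c(txt, collapse = "_")

# Dropping the missing entries first gives a clean result
cleaned <- str_c(txt[!is.na(txt)], collapse = "_")

print(pasted)
print(joined)
print(cleaned)
```

If a group's text might contain NA, filtering it out before collapsing (as in cleaned) keeps one missing entry from blanking the whole combined string.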