I am trying to get specific text/line that is under the heading or subheading using the google apps script. Here is the screenshot of the Google doc:
I am looking to export the Question text/line from every heading to google sheets like this:
So far I have been able to get all the headings from the google doc by this code snippet:
function getHeadings() {
var pars = DocumentApp.getActiveDocument().getBody().getParagraphs();
var hdg = [];
for(var i=0; i<pars.length; i ) {
var par = pars[i];
hdg.push(par.getHeading());
}
Logger.log(hdg)
}
But I am unable to get the Question text under all these headings, kindly can you guide me in the right direction? (I apologize for my question formatting, I am a newbie on this website). Thank you
CodePudding user response:
Use getChild() and and push() functions
Setup
I used the following as setup:
Script
function getHeadings() {
var doc = DocumentApp.getActiveDocument();
var childNum = doc.getBody().getNumChildren();
var ss = SpreadsheetApp.openById("<spreadsheetID>").getSheetByName("Sheet1"); //Change ID of Spreadsheet and Sheet name
var out = [];
for (var i = 0; i < childNum; i = 2) {
out.push([doc.getBody().getChild(i).asText().getText()]);
}
for (var i = 1, j = 0; i < childNum; i = 2, j ) {
out[j].push(doc.getBody().getChild(i).asText().getText().replace(/Question(.*)- /gm, ""));
}
ss.getRange(ss.getLastRow() 1, 1, childNum / 2, 2).setValues(out); //edit range to proper coordinates
}
The script uses the getChild()
function to get the texts in the document. Afterwards, the getNumChildren()
will get the number of children to be used in the two separate for loops which will segregate the heading and the question texts. After the texts are identified, they are separated, paired, and then pushed (using the push()
function) into arrays. Once the arrays are formed, the data will be appended to the spreadsheet using the setValues()
function.
Output
Update
You may use the following script as basis for your code:
function getHeadings() {
var doc = DocumentApp.getActiveDocument();
var childNum = doc.getBody().getNumChildren();
var ss = SpreadsheetApp.openById("<SpreadsheetID>").getSheetByName("Sheet1"); //Change ID of Spreadsheet and Sheet name
var header = [];
var question = [];
var out = [];
for (var i = 0; i < childNum; i ) {
if (doc.getBody().getChild(i).asText().getText().match(/Header(.*)/gm)){
header.push(doc.getBody().getChild(i).asText().getText());
}
else if (doc.getBody().getChild(i).asText().getText().match(/Question(.*)/gm)){
question.push(doc.getBody().getChild(i).asText().getText());
}
}
for (j=0; j<header.length; j ) {
out.push([header[j]]);
out[j].push(question[j].replace(/Question(.*)- /gm, ""));
}
ss.getRange(ss.getLastRow() 1, 1, header.length, 2).setValues(out); //edit range to proper coordinates
}
References:
CodePudding user response:
There are a couple of assumptions for this script:
- There is a finite number of header styles you're using (e.g. 'Header 1' and 'Header 2' in my example below)
- Your questions always contain 'QUESTION:'
- There is no other text apart from the headers and the question lines (but if there is, in principle it will be skipped)
In that case, the below code will work:
function getHeadings() {
var pars = DocumentApp.getActiveDocument().getBody().getParagraphs();
var currentHdg = "";
var questions = [];
for(var i=0; i<pars.length; i ) {
var par = pars[i];
var text = par.getText();
var hdg = par.getHeading();
if (hdg.toString().indexOf("HEADING") > -1){
currentHdg = text;
}
else if(text.indexOf("QUESTION:") > -1){
questions.push([currentHdg,text.replace("QUESTION:","").trim()]);
}
}
Logger.log(questions);
}
You can then format questions
into the table output format you need.
Edit: I have updated my answer to cover all heading types.