I have the following code, which is based on Google API calls. It currently returns the title and URL. What I am trying to get is the description of a LinkedIn page, with the information in it split by ';'. Is there a command for that?
I see there is one for getting the URL and title, but I couldn't find a way in the documentation to scrape the description:
function getSize(companyName) {
var key="AIzaSyDoZfj4VJPgyYvPbUg6fbObcueAkGxyR-U"
let search = "site:linkedin.com/company " + companyName + " company size"
let searchEngineId = "a480fef134d0c4c6f"
// Call Google Custom Search API
var options = {
'method' : 'get',
'contentType': 'application/json',
};
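// encodeURIComponent keeps the spaces and ':' in the query from breaking the URL
var response = UrlFetchApp.fetch("https://www.googleapis.com/customsearch/v1?key=" + key + "&q=" + encodeURIComponent(search) + "&cx=" + searchEngineId, options);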
// Parse linkedin URL and Name
//let url = JSON.parse(response).items[0].formattedUrl
let title = JSON.parse(response).items[0].title.split("-")[1]
// display the results in 2 columns
var results = new Array(1);
let info = new Array(2);
//info[0]=url
info[1]=description
results=info[1]
return results
}
CodePudding user response:
Looking at the output of the custom search with your parameters, the company size is included under items[0].snippet:
"items":[
{
"kind":"customsearch#result",
"title":"Twitter | LinkedIn",
"link":"https://www.linkedin.com/company/twitter",
"snippet":"Website: https://careers.twitter.com. Industries: Software Development. Company size: 5,001-10,000 employees. Headquarters: San Francisco, CA."
//...etc
As you can see, it's just plain text and it isn't under any metatags, so your best bet is to use a regex to extract the number, as suggested in the comments.
var response = UrlFetchApp.fetch("https://www.googleapis.com/customsearch/v1?key=" + key + "&q=" + encodeURIComponent(search) + "&cx=" + searchEngineId, options);
var snippet = JSON.parse(response).items[0].snippet
var regex = new RegExp(/(?<=Company size: ).*?(?= employees)/);
var size = regex.exec(snippet)[0]
The value of size is 5,001-10,000, extracted from the snippet above. The regex /(?<=Company size: ).*?(?= employees)/ just looks for the value between "Company size: " and " employees".
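Since the question asks for the whole description split by ';', here is a minimal sketch that breaks the snippet into its "Label: value" pairs instead of pulling out a single number. The helper name parseSnippetFields is hypothetical, and the sketch assumes every snippet follows the "Label: value. Label: value." shape shown above, which Google does not guarantee. It also strips a trailing period from each value, so a value that legitimately ends in "." (for example "Inc.") would lose it.

```javascript
// Hypothetical helper (not part of any Google API): turns a snippet such as
// "Website: https://x.com. Industries: Software. Company size: 11-50 employees."
// into an object of { label: value } pairs.
function parseSnippetFields(snippet) {
  var fields = {};
  // Split at every ". " that is directly followed by something like "Label: ".
  var parts = snippet.split(/\.\s+(?=[A-Z][A-Za-z ]*: )/);
  parts.forEach(function (part) {
    var idx = part.indexOf(": ");
    if (idx === -1) return; // skip fragments that are not "Label: value"
    var label = part.slice(0, idx).trim();
    // Drop the final period that ends the snippet, if there is one.
    var value = part.slice(idx + 2).trim().replace(/\.$/, "");
    fields[label] = value;
  });
  return fields;
}

// Joins the pairs back into one ';'-separated string.
function fieldsToSemicolonString(fields) {
  return Object.keys(fields).map(function (label) {
    return label + ": " + fields[label];
  }).join(";");
}
```

In the function above you could then write var fields = parseSnippetFields(snippet) and return fields["Company size"], or return fieldsToSemicolonString(fields) to get everything in one cell.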
You'll probably have to do something like this whenever you want to extract anything that's not in its own tag. Be careful to handle cases where the company size is not in the snippet, because regex.exec() returns null when nothing matches and the [0] lookup will then throw. You can also paste the full response into something like JSON Formatter to get a better idea of how it is structured.
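To illustrate the missing-size case, here is a rough sketch of a guarded version. The helper names firstSnippet and extractCompanySize are made up for this example, and the sketch assumes the response may come back without an items array when the search finds nothing, so it checks for that rather than relying on it being there.

```javascript
// Hypothetical helper: returns the first result's snippet, or null when the
// parsed response has no usable results.
function firstSnippet(parsed) {
  if (!parsed || !Array.isArray(parsed.items) || parsed.items.length === 0) {
    return null;
  }
  var snippet = parsed.items[0].snippet;
  return typeof snippet === "string" ? snippet : null;
}

// Hypothetical helper: returns the text between "Company size: " and
// " employees", or null when the snippet does not contain it.
function extractCompanySize(snippet) {
  if (typeof snippet !== "string") return null;
  var match = /(?<=Company size: ).*?(?= employees)/.exec(snippet);
  return match ? match[0] : null; // exec() returns null on no match
}
```

With these, var size = extractCompanySize(firstSnippet(JSON.parse(response.getContentText()))) gives you either the size or null, and you can decide what to put in the cell for the null case (an empty string, "n/a", and so on).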