Extract thousands of Google Results into your Google Sheet

Install this add-on for Free from the G Suite Marketplace

Also available in the Chrome Store

Scraping, i.e. the extraction of data from websites, is one of the most widely used techniques on the web. The uses are innumerable: Analysis of competitors, its SEO positioning, content generation, influencer tracking, etc., etc., etc..

Once the data is extracted, you have to process it, and there, nothing like our good old spreadsheet… In short, Scraper + Google Sheets is a marketers’ paradise

One of the most coveted sites for scraping is of course our dear Google. Scrapping Google allows to:

  • Know the position of your site with regard to several keywords
  • Understand your environment by analyzing the sites that rank best and follow the positions of competitors
  • Analyze the most effective content
  • Understand the keywords and sentence formulations that work best

So here is the ultimate guide to generating a table of more than 500 Google results in Google Sheets. You are then free to use the content according to your needs….

What you will learn …

  • the basic use of IMPORTFROMWEB
  • how to return multiple pieces of information from a Google query
  • How to return several pages of results with several keywords in a single table

Scrape the titles of a search

The basics of scraping include a URL and a path to the element of the page to be extracted

THE URL

To our great happiness (and sometimes misfortune) Google often does things well! Hence, the URL of a search is easily modifiable to adapt it to our needs.

No need to analyze all the optional parameters here, which are also available here as we will focus on 3 parameters:

  • q=keyword+key, indicating the request
  • hl=fr, indicating the language of the results
  • num=100, the number of results to be returned. By default the value is 10

Come on, let’s try them on a keyword that will do our planet some good…

https://www.google.com/search?q=bamboo+toothbrush&hl=en&num=100

The path of the element with XPath

The second step is the description of our element! No need for a Master in engineering here, but you do need to have a basic knowledge of HTML.

XPath is a language for describing an element of an XML script. And the good news is that, HTML being derived from XML, XPaths also apply to our beloved web pages.

A simple example: //h1 indicates all the titles of the page, the double slash indicating that the path is relative so not necessarily a direct child of the document root.

In our case, the XPath will be a little more complex:

//div[@class="g"]//h3

To help you write your XPaths, install XPath Helper for Chrome. My article will help you get started with XPaths

//div[@class="g"] indicates all containers for each result. That is, all “div” of class “g”

With XPath Helper:

And by adding //h3, we grab the title of each container

Are you still following me? Now, let’s compile this in our Google Sheets function

Let’s first install the IMPORTFROMWEB addon from the G Suite Marketplace

Open a spreadsheet …

Pss … other secret by the way, in the Google address bar, just type sheets.new

In the menu, search for Add-ons> IMPORTFROMWEB> Activate Add-on.

Then type

AB
1https://www.google.com/search?q=bamboo+toothbrush&hl=en&num=100
2//div[@class=”g”]//h3
3
4=IMPORTFROMWEB(A1, A2)

Wait a moment …

Note: Google and many sites do not always make life easy for scraping tools.
The engine behind IMPORTFROMWEB will usually find a way to load the page in less than 20 seconds, but in some cases more time is needed.
If you see #PENDING_REQUESTS, open the add-on sidebar through the menu: Add-ons > IMPORTFROMWEB > Open sidebar, then in the Monitor tab click on “Refresh pending requests”

You should now see something like this:

Import the descriptions and result links

IMPORTFROMWEB has the particularity to accept ranges of cells in place of the selector, thus allowing to return more information on our search results.

Using the previous formula:


ABC
1https://www.google.com/search?q=bamboo+toothbrush&hl=en&num=100

2// div[@class=”g”]/ /h3//div[@class=”g”]//h3/../@href@class//div[@=”g”]//div[@class=”slpf”]
3


4=IMPORTFROMWEB(A1, A2:C2)

The baseSelector option

IMPORTFROMWEB accepts a 3rd parameter. This is a range of cells containing options, which makes the function more powerful and flexible.

The list of options is available here.

In our example we will need baseSelector.

To complete our table, let’s assume that we want to return the note given to some results.

We could add the XPath to our range of selectors, like this:

Woe! Since not all the results have a note, the cells are no longer synchronized. For instance, the note in D4 does not correspond to the result of line 4.

The option baseSelector is our remedy!


ABCD
1https://www.google.com/search?q=brosse+%C3%A0+dent+bambou&hl=en&num=100


2//h3//h3/../@href//span [@class=”st”]//div[child::g-review-stars]
3baseSelector@class//div[@class=”g”]






4=IMPORTFROMWEB(A2, A3: D3, A3:B3)


By modifying each XPath and passing //div[@class=”g”] to the baseSelector, we tell the function to search inside of the elements described by the baseSelector.

Let’s see it in action…

As seen before, //div[@class="g"] actually describes the containers for each result.

Hence, after executing the function, we have all the elements in place again:

Return the results of several keywords

As for the selectors, the URL parameter also accepts a range of cells.

Here’s how to get in a few clicks the first 100 results of multiple keywords!

Let’s slightly modify our spreadsheet to generate a list of URLs from Google Search


AB
1bamboo toothbrush=ARRAYFORMULA(“https://www.google.com/search?q=”&SUBSTITUTE(A1:A5,” “,”+”)&”&hl=en&num=100″)
2bamboo toothbrush subscription
3cheap bamboo toothbrush

Bamboo toothbrush review
4Buy bamboo toothbrush

Which should result as follows:

Let’s now reference the range of URLs in our function

=IMPORTFROMWEB(B1:B5, A9:D9, A10:B10)

And here is the result:

All results for each XPath are now concentrated in one cell! This is because IMPORTFROMWEB now processes a 3-dimensional table.

We will have to add another option: stackPages.

By switching stackPages to TRUE, we tell the function that google pages are one single sequence of results and therefore, they must be stacked in the same table.

As our sheet becomes slightly more complex, I have reorganized it for greater clarity. Here is the result:

From now on, we do not know what key word each line refers to.

Another option will help us here: showUrls.

With showUrls set to TRUE, a new column will appear reminding the URL of the page for each line.

Using a good old matrix search method (VLOOKUP, FILTER, INDEX & MATCH), we can return the keywords for each result.

Here is the final result:

Leave a Reply

Your email address will not be published. Required fields are marked *