Install this add-on for Free from the G Suite Marketplace

Also available in the Chrome Store

Simple Usage

Select an empty cell and try:

=IMPORTFROMWEB( "https://www.amazon.com/s/?keywords=iphone", "//h2//text()" )

You can also run the example by opening the builder from the menu: Add-ons > IMPORTFROMWEB > Build function

Syntax

IMPORTFROMWEB(urls, xpaths, regexs, replaceBy, options)

urls (cell | range of cells)

One or a list of URLs of the pages to retrieve

xpaths (cell | range of cells)

One or a list of XPaths describing all the elements to retrieve in the pages

regexs (cell | range of cells) – optional

(optional) One or a list of regular expressions to extract or replace specific parts from the content returned by each XPath.

  • If the next argument replaceBy is not filled or NULL, it will extract some text from the content
  • Otherwise, if replaceBy has a list of values, it will replace the extracted content by the values from replaceBy.

Each regEx from the list is associated to one XPath from the list of XPaths in such way that the first regEx will extract content from the results of the first XPath, etc…

It works the same way as the Javascript String match() function

Use regexes in order to clean up your data.

Check this article for the most useful regex rules to convert numbers, dates or currencies

Useful regex rules to clean up your imported data

replaceBy (cell | range of cells) – optional

(optional) One or a list of texts to replace the content extracted by the regexs.

Each string of the list is associated to one regex from the list of regexs in such way that the first string will replace the content extracted by the first regex, etc…

Options (table) – optional

A reference to a two-columns range. The left column containing the options names and the second column the options values.
Only list the options that you want to override.

countryCodeus, ca, de, fr, es, br, mx, in, jp, cn, au
splitResultsTRUE
stackPagesFALSE
jsRenderingTRUE

countryCode (us, ca, de, fr, es, br, mx, in, jp, cn, au)

Use a code form the following list of countries to load the page from a specific location

  • us: United States
  • ca: Canada
  • de: Germany
  • fr: France
  • es: Spain
  • br: Brazil
  • mx: Mexico
  • in: India
  • jp: Japan
  • cn: China
  • au: Australia

splitResults (TRUE | FALSE | Character)

When an XPath returns various entries, the results are typically split so they expand in the cells around.

  • TRUE: Results expand naturally in other cells
  • FALSE: All results of an XPath are compressed in one cell.
  • A character: Results are concatenated in one cell and separated by this character.
  • Default is TRUE.

stackPages (TRUE | FALSE)

Whether to stack the results from all the urls.
Useful in case of retrieving various pages of a search as it will create a list of results
Default to FALSE.

showUrls (TRUE | FALSE)

Whether to show the URLs of the page for each result displayed.
It is mainly used with stackPages in order to indicate from which page comes each result.

baseSelector (An XPath or CSS selector)

The base selector will be the root from which the other selectors will be evaluated.
It is useful for search results as it insures that each result will be aligned on its base.

jsRendering (TRUE | FALSE | force)

  • TRUE: forces the engine to render the page with Javascript
  • FALSE: Never runs Javascript

For speed and resource consumption purpose, the engine does not run javascript on a page by default.