Install this add-on for Free from the G Suite Marketplace

Simple Usage

Select an empty cell and try:

=IMPORTFROMWEB( "https://www.amazon.com/s/?keywords=iphone", "//h2//text()" )

You can also run the example by opening the builder from the menu: Add-ons > IMPORTFROMWEB > Build function

Syntax

IMPORTFROMWEB(urls, xpaths, regexs, replaceBy, options)

urls (String|Range)

One or a list of URLs of the pages to retrieve

xpaths (String|Range)

One or a list of XPaths describing all the elements to retrieve in the pages

regexs (String|Range)

(optional) One or a list of regular expressions to extract or replace specific parts from the content returned by each XPath.

  • If the next argument replaceBy is not filled or NULL, it will extract some text from the content
  • Otherwise, if replaceBy has a list of values, it will replace the extracted content by the values from replaceBy.

Each regEx from the list is associated to one XPath from the list of XPaths in such way that the first regEx will extract content from the results of the first XPath, etc…

It works the same way as the Javascript String match() function

Use regexes in order to clean up your data.

Check this article for the most useful regex rules to convert numbers, dates or currencies

Useful regex rules to clean up your imported data

replaceBy (String|Range)

(optional) One or a list of texts to replace the content extracted by the regexs.

Each string of the list is associated to one regex from the list of regexs in such way that the first string will replace the content extracted by the first regex, etc…

Options (Range)

A reference to a two-columns range. The left column containing the options names and the second column the options values.
Only list the options that you want to override.

splitResultsTRUE
stackPagesFALSE
jsRenderingTRUE
forceRefreshFALSE

splitResults (Boolean | Char)

When an XPath returns various entries, the results are typically split so they expand in the cells around.

  • TRUE: Results expand naturally in other cells
  • FALSE: All results of an XPath are compressed in one cell.
  • A character: Results are concatenated in one cell and separated by this character.
  • Default is TRUE.

StackPages (Boolean)

Whether to stack the results from all the urls.
Useful in case of retrieving various pages of a search as it will create a list of results
Default to FALSE.

jsRendering (Boolean | “force”)

For speed and resource consumption purpose, the engine does not run javascript on a page if it is not necessary. It is usually smart enough to detect if a page needs to be loaded with Javascript or not.
However this behaviour can be overridden.

  • TRUE: Let the engine decides when a page needs to be
  • force: force the engine to render the page with Javascript
  • FALSE: Never runs Javascript
  • force: Always run Javascript.

In order to speed up the function. Keep in mind that, if the target page loads Javascript, the time for retrieving the data is much slower. Hence it can be useful to keep the control whether we want to render Javascript or not.
Default is TRUE.

forceRefresh (Boolean)

IMPORTFROMWEB caches the results in a smart way to avoid redundant recalculations. The downside is that the data can be up to 6 hours old.
If TRUE, the function will recalculate when any change occurs to the sheet.
Default to FALSE

You should also read…