Simple Usage

Select an empty cell and enter the following:

=IMPORTFROMWEB( "https://www.amazon.com/s/?keywords=iphone", "//h2//text()" )

You can also load this example by opening the builder from the menu: Add-ons > IMPORTFROMWEB > Build function.

Syntax

IMPORTFROMWEB(urls, xpaths/CSS, regexs, replaceBy, options)

urls (cell | range of cells)

(required) Specify the URL(s) that you want to get results from. A single URL can be placed directly in the formula, or referenced one or more cells.

xpaths or CSS selectors (cell | range of cells)

(required) One or a list of XPaths or CSS selectors describing all the elements to retrieve in the pages.

regexs (cell | range of cells) – optional

(optional) One or a list of regular expressions to extract or replace specific parts from the content returned by each XPath.

  • If the next argument replaceBy is not filled or NULL, it will extract some text from the content.
  • If replaceBy has a list of values, they will be used to replace the extracted content.

Each regEx from the list is associated to one XPath from the list of XPaths in such way that the first regEx will extract content from the results of the first XPath, etc.

It works the same way as the JavaScript String match() function

Use regexes in order to clean up your data as its extracted.

Check this article for the most useful regex rules to convert numbers, dates or currencies: Useful regex rules to clean up your imported data

replaceBy (cell | range of cells) – optional

(optional) One or a list of texts to replace the content extracted by the regexs.

Each string of the list is associated to one regex from the list of regexs in such way that the first string will replace the content extracted by the first regex, etc.

Options (table) – optional

A reference to a two-column range. The left column defines option names and the second column contains corresponding values.

Only list the options that you want to override.

countryCodeus, ca, de, fr, es, br, mx, in, jp, cn, au
splitResultsTRUE
stackPagesFALSE
showUrlsFALSE
baseSelectorempty
jsRenderingTRUE
shakeTRUE or FALSE

jsRendering (TRUE | FALSE)

  • TRUE: forces the engine to render the page with Javascript
  • FALSE: Never runs JavaScript.
  • Default is FALSE.

For speed and resource consumption purposes, the engine does not run JavaScript on a page by default. We recommend enabling it only if needed on a web page.

countryCode (us, ca, de, fr, es, br, mx, in, jp, cn, au)

To retrieve localized content, use the code from the following list of countries to load the page from a specific location:

  • us: United States
  • ca: Canada
  • de: Germany
  • fr: France
  • es: Spain
  • br: Brazil
  • mx: Mexico
  • in: India
  • jp: Japan
  • cn: China
  • au: Australia

splitResults (TRUE | FALSE | Character)

When an XPath returns various entries, the results are typically split so they expand in the cells around.

  • TRUE: Results expand naturally in other cells
  • FALSE: All results of an XPath are compressed in one cell.
  • A character: Results are concatenated in one cell and separated by this character.
  • Default is TRUE.

stackPages (TRUE | FALSE)

Whether to stack the results from all the URLs. This is commonly used when fetching multiple pages of a similar search, which will create a continuous list of results.

Default to FALSE.

showUrls (TRUE | FALSE)

Whether to show the URL of the page for each result displayed. It is mainly used with stackPages in order to indicate from which page comes each result.

baseSelector (XPath or CSS selector)

The base selector will be the root from which the other selectors will be evaluated.

It is useful for search results as it ensures that each result will be aligned on its base.

Here in a example:

image.png

In that case, we’ve selected the baseSelector so the function understands that it should retrieve elements relatively to the containers of each search result, hence forcing elements of a same search result to align on the same row (check the ratings on column C)

shake (TRUE or FALSE)

Functions in Google Sheets recalculate only on some conditions. One condition is that it detects a change in the function definition. That is what shake is for!

The shake option is just a way to tell Google Sheets that the function definition has changed and that it has to recalculate. It doesn’t matter if the value is TRUE or FALSE. What matters instead is that it goes from one state to another.

Note that shake does not necessarily fetch the latest data from the target page. It tries beforehand to retrieve the cached data. If the cache is too old (more than 24 hours), then it fetches the target page.

Use it to:

  • “shakes” the function in case the function returned an error (starting with #)
  • retrieve pending requests (as indicated by the #PENDING_REQUEST message)

A convenient use is to couple it with a tick box (Menu > Insert > Tick box)

You should also read…