ImportFromWeb is the easiest way to start scraping data from any website. Find out how you can start filling your new blank sheet
Type =IMPORTFROMWEB(). If #NAME! is displayed, have a look at our FAQ:
When I write =IMPORTFROMWEB() in a cell, it shows the error #NAME?
Importing data is as easy as typing
=IMPORTFROMWEB([the url you want to search], [an XPath to describe the content you want to import])
In the Google Sheets menu, go to
Add-ons ▶︎ ImportFromWeb | Easy Web Scraping ▶︎ 🛠 Open sidebar.
It will load a sidebar on the right side giving you a lot of support to create your function and discover all the capabilities.
IMPORTFROMWEB(urls, xpaths/CSS, options)
For advanced users, ImportFromWeb lets you manipulate the content with regular expressions. it’s especially useful to clean up text, replace commas in numbers or remove unwanted characters
IMPORTFROMWEB(urls, xpaths/CSS, regexps, replaceBy, options)
(required) Specify the URL(s) that you want to get results from. A single URL can be placed directly in the formula or use a reference to one cell or a range of cells.
(required) One or a list of XPaths or CSS selectors describing all the elements to retrieve in the pages.
(optional) One or a list of regular expressions to extract or replace specific parts from the content returned by each XPath.
- If the next argument replaceBy is not filled or NULL, it will extract some text from the content.
- If replaceBy has a list of values, they will be used to replace the extracted content.
Each regEx from the list is associated to one XPath from the list of XPaths in such way that the first regExp will extract content from the results of the first XPath, etc.
Use regexps in order to clean up your data.
Check this article for the most useful regex rules to convert numbers, dates or currencies: Useful regex rules to clean up your imported data
(optional) One or a list of texts to replace the content extracted by the regexps.
Each string of the list is associated to one regex from the list of regexs in such way that the first string will replace the content extracted by the first regex, etc.
A reference to a two-column range. The left column defines option names and the second column contains corresponding values.
Only list the options that you want to override.
Some options might be limited or unavailable according to your subscription. Please check out our plans for more details
- Default is FALSE.
To retrieve localized content, use the code from the following list of countries to load the page from a specific location:
- us: United States
- ca: Canada
- de: Germany
- fr: France
- es: Spain
- br: Brazil
- mx: Mexico
- in: India
- jp: Japan
- cn: China
- au: Australia
Google Sheets controls how the functions recalculate according to different behaviours (modifying cells, opening the spreadsheet, …)
To avoid that IMPORTFROMWEB fetches the target page constantly, the content is cached 24 hours by default. Therefore, whenever IMPORTFROMWEB recalculates, the content is retrieved from the cache.
You can customize the cache lifespan by setting the cacheLifespan option to the number of hours you will want the cache to last.
Bear in mind that the function will not necessarily update directly after the cache expires but only when Google Sheets lets the function recalculate.
A new request will be counted whenever the function has to fetch the content from the target page.
When an xPath returns various entries, the results are typically split so they expand into the surrounding cells.
- TRUE: Results expand naturally in other cells
- FALSE: All results of an XPath are compressed in one cell.
- A character: Results are concatenated in one cell and separated by this character.
- Default is TRUE.
Users may be tempted to use it to display a table. As of now, it will not work. In order to display a table, It is necessary to indicate the path for each column (like //table/tr, //table/tr, //table/tr[3)])
Whether to stack the results from all the URLs. This is commonly used when fetching multiple pages of a similar search, which will create a continuous list of results.
Default is FALSE.
Here is an example that extracts a thousand results from a Google search:
Whether to show the URL of the page for each result displayed. It is mainly used with stackPages in order to indicate from which page comes each result.
Default is FALSE
The base selector will be the root from which the other selectors will be evaluated.
It is useful for search results as it ensures that each result will be aligned on its base.
Here in a example:
In that case, we’ve selected the baseSelector so the function understands that it should retrieve elements relatively to the containers of each search result, hence forcing elements of a same search result to align on the same row (check the ratings on column C)
Functions in Google Sheets recalculate only on some conditions. One condition is that it detects a change in the function definition. That is what shake is for!
The shake option is just a way to tell Google Sheets that the function definition has changed and that it has to recalculate. It doesn’t matter if the value is TRUE or FALSE. What matters instead is that it goes from one state to another.
Note that shake does not necessarily fetch the latest data from the target page. It tries beforehand to retrieve the cached data. If the cache is too old (more than 24 hours) or the function displayed an error, then it fetches the target page.
Use it to:
- “shakes” the function in case the function returned an error (starting with #)
- retrieve pending requests (as indicated by the #PENDING_REQUEST message)
A convenient use is to couple it with a tick box (Menu > Insert > Tick box)
Set to TRUE, it will output #ERROR! when an error occurs.
By default the function returns custom messages when an error occurs (such as #SELECTOR_NOT_VALID). Those messages are not recognized by Google Sheets as proper errors. Therefore you cannot use the ISERROR or IFERROR functions.
Turn outputErrors to TRUE in order to tell the function to return a Google Sheets standard error that can be handled with the ISERROR or IFERROR functions.
The downside of it is that the #ERROR! message displayed by Google Sheets doesn’t tell much about the type of error. Pass the mouse over the cell to see more information about the error.
Default is FALSE
Use this option if the source code is plain XML instead of HTML.
Default is FALSE
IMPORTFROMWEB_getMeta() is an additional function to IMPORTFROMWEB() available within the same add-on.
It takes as reference a cell that contains an IMPORTFROMWEB() function and returns additional information about the results.
IMPORTFROMWEB_getMeta() allows to monitor the cells containing an IMPORTFROMWEB() function.
It gives similar information to what can be seen in the Monitor tab from the sidebar.
A1 being a cell that contains IMPORTFROMWEB()
The output is a range of two columns with the metadatas’ keys in the 1st column and their respective values in the 2nd column, like
A reference to a cell containing the IMPORTFROMWEB() function
If filled, returns only the value corresponding to the metadata key
If FALSE and metaName is not filled, returns all the results compressed in one cell
Default to TRUE
A JSON object representing the results of the function
The date of the last update
The time of the last update
updateDate and updateTime are particularly useful to control if the results returned are fresh
Returns the url of an HTML version of the page as seen by the engine.
This information is useful for debugging.
ImportFromWeb functions can be executed automatically even though the spreadsheet is closed. Setting up an automatic trigger can be done through the sidebar.
Only available in Premium and High Volume plans.