The IMPORTXML function is native to Google Sheets and can be used to scrape data from some websites and get them directly into your spreadsheet. However, there are some known limitations and issues that can make it a frustrating experience – particularly if you’re not comfortable navigating websites’ source code.
This article will help you understand the limitations of IMPORTXML and introduce a powerful alternative: IMPORTFROMWEB.
Google Sheet’s IMPORTXML formula allows you to import data directly from various structured data types, including XML, HTML, CSV/TSV, and RSS/Atom XML feeds. The formula has three parameters, but only the first two are required:
=IMPORTXML(url, xpath_query, locale)
If you want to retrieve all the links from a particular webpage, you need to know the xPath query that will retrieve them. Using the parameters below, you obtain a list with all the links in Wikipedia’s Moon Landing page:
While the formula itself is deceptively simple, finding the required parameters can be both difficult and time-consuming. You will need to access the website’s source code, find the right HTML element, and then obtain the XPath query.
There are many tools available to help you find the XPath, including Google’s Inspect, but it can be daunting if you’re not familiar with the tool.
However, even if you have a technical background, IMPORTXML has a few limitations. In the next section, you will learn about some of these limitations and how they could affect you.
The biggest limitations to IMPORTXML are as follows:
- Spreadsheets become slow and buggy: If you call the function more than a dozen times in a Sheet, its behaviour becomes very unpredictable. The more formulas you use, the more it slows down the process, as data is refreshed with every change to the Sheet.
- Requires technical skills: the formula only accepts xPath queries for the second parameter, so you need to know how to write these queries. This not only discourages non-technical users, but also limits the flexibility of IMPORTXML.
- Only 1 URL and 1 XPath per formula: with IMPORTXML, you need to use a separate formula for each URL you want to extract data from. Additionally, if you want to extract multiple items from a URL, you need to use a separate formula for each XPath.
Fortunately, the =IMPORTFROMWEB() function is an ImportXML alternative that not only overcomes these limitations, but also adds powerful features and provides ready-made solutions.
Unlike the IMPORTXML formula, IMPORTFROMWEB allows you to:
- Use one formula for everything: extract as much data as you need from a single website with just one IMPORTFROMWEB formula.
- Work seamlessly with hundreds of formulas per sheet, without having to compromise on speed.
- Cache data on demand so that formulas will not be recalculated with every change to the Google Sheet.
- Use both CSS selectors and XPath queries, but if you’re not comfortable delving into the sites’ source code, you can use generic or built-in selectors instead.
- Using 1 formula, extract data from up to 50 URLs and as many selectors as you want.
For example, below you can see how easily headings and subheadings can be retrieved. If you’re mostly interested in retrieving Google data, our IMPORTFROMGOOGLE formula makes this task even easier.
As you have seen, the =IMPORTFROMWEB() function does not have the limitations of IMPORTXML, allowing you to quickly scrape data from any website. More importantly, it offers greatly improved functionality and resources to help automate your scraping tasks.
- More powerful scraping: use the formula hundreds of times per sheet without performance issues and customize your web-scraping with location-based content scraping, regular expressions, and more.
- Ready-made solutions: easily extract data from ready-made templates based on built-in selectors.
- Automation: schedule updates and monitor your data to save you time and manual effort.
Perhaps the greatest advantage of using ImportFromWeb is the fact that you can use it hundreds of times per Sheet without sacrificing performance. Below, you can see how to easily extract Yahoo Finance data for different companies.
With IMPORTXML, data retrieval will fail if the page isn’t accessible in a few seconds. This is a problem, since a small issue can stop you from retrieving data or cause your spreadsheet to freeze when the formula gets stuck.
With ImportFromWeb, by simply keeping the sidebar open, you ensure that this doesn’t happen. If the page isn’t directly accessible, it will try different strategies to retrieve the data. If the data simply isn’t there, it will automatically refresh the formula until it is, without freezing your spreadsheet.
If you’re interested in location-based content scraping, ImportFromWeb allows you to specify the country code to limit your results.
Additionally, you can use regular expressions to fine-tune your scraping, so you can be more specific in your search and clean up the data before import.
The ImportFromWeb add-on provides you with ready-made solutions for websites like Amazon, Google, and Yahoo Finance. With these templates, you can instantly scrape data without having to think about finding the XPaths or even CSS selectors.
For example, imagine you want to retrieve information on a variety of products on Amazon. Access the template from the sidebar in Google Sheets (Extensions > ImportFromWeb > Open Sidebar).
- Choose the store by clicking on the dropdown menu.
- Add the ASINs (Amazon Standard Identification Number) for your chosen products – as many as you like – which can be obtained from the Amazon product URL.
- Use the dropdown to choose a built-in selector. You can add as many columns/selectors as you wish.
- Your results are ready for you:
Not only are you able to import data quickly and easily, but you can also configure some great features in the Monitor tab in the sidebar. You can easily configure and schedule data updates, as well as monitor specific cells.
Choose the frequency with which you want to update – hourly, daily, weekly – and your data will be refreshed even if you are offline.
|Buggy when called more than dozen times in a spreadsheet||Works seamlessly with hundreds of formulas per sheet|
|The more formulas you use, the slower it becomes||Lightning quick despite many formulas|
|Refreshes formulas when the spreadsheet changes||Caches data on demand, so the formulas don’t recalculate with changes|
|Doesn’t scrape many websites||Ability to scrape even complex websites|
|Very basic scraping functionalities||Many options for powerful web scraping, such as country codes, regular expressions, base selectors, and more.|
|Requires knowledge of XPaths||Works with CSS Selectors or XPaths. CSS Selectors are accessible if you know basic HTML. And if you don’t we offer many tips, templates and resources to guide you through the scraping process.|
IMPORTFROMWEB, on the other hand, is a great alternative to IMPORTXML, overcoming these limitations and adding great functionalities. Additionally, it provides ready-made solutions for popular websites, which don’t require technical knowledge, but will allow you to scrape all the data you need.
Interested in using ImportFromWeb to extract data from popular websites? Check out our ready-to-use solutions to scrape from Google, Amazon or Yahoo Finance.