Google Sheets Web Scraping - Simple Guide for (2024)

Google Sheets has some kick-ass functions that can do a lot of heavy lifting.

As a part of my online work, I often need to go to some websites and extract specific data points or sections from them.

For example, I sometimes have to copy and paste a table from a web page into Google Sheets or fetch the article title or product name from a web page into Google Sheets.

Since I spend a lot of my time in Google Sheets anyway, I thought of figuring out if I could scrape the data from the websites and extract the data into the cells in Google Sheets.

After a few hours of scanning content on various forums and YouTube, I was able to create a basic scrapper in Google Sheets (for my needs).

In this article, I want to show you how you can use simple Google Sheets formulas for Google Sheets web scraping

Table of Contents

Note that Google Sheets is not a scraper tool. So while we can do some basic website scraping with it, if you need something more robust, I recommend you use better options such as Python or dedicated scrapping tools.

Before I show you some examples of how to scrape data in Google Sheets, let’s first have a look at some formulas that make it possible

Important Formulas for Google Sheets Web Scraping

Below are web scrapping formulas that we would be covering in this tutorial:

IMPORT XML formula

With the IMPORTXML formula, you can fetch the data from many structured data types such as XML, HTML, CSV/TSV, and RSS/ATOM XML feeds.

Don’t worry if you feel a bit lost here.

You’ll get a much better idea of how this works when we go to some examples and I show you how to use IMPORTXML for scrapping webpage titles or specific sections such as date or author name.

IMPORTXML formula has the below syntax:

IMPORTXML(url, xpath_query)

where:

  • ‘url’ is the URL of the web page from which you want to scrape the data
  • ‘xpath_query’ is the identifier that tells the formula what to scrape. For example, if you want to scrape the title of an article, you will use the query that tells the formula what part of the webpage code refers to the title.

IMPORTHTML Formula

With IMPORTHTML, you can fetch either a table or a list from a webpage.

In most cases, I use this when there is a table on a webpage that I want to fetch. In case there are multiple tables, you can specify which table to scrape.

IMPORTHTML formula has the below syntax:

IMPORTHTML(url, query, index)

where:

  • ‘url’ is the URL of the web page from which you want to scrape the data
  • ‘query’can be a “list” or a “table”, based on what you want to extract
  • index is the number that will tell Google Sheets which table or list to fetch

IMPORTFEED Formula

While IMPORTXML and IMPORTHTML are meant to be used with regular URLs, the IMPROTFEED function is specifically designed to deal with RSS or Atom feeds.

A use-case of this could be when you want to fetch the top 10 article titles using the RSS feed of your favorite blog or news website.

IMPORTFEED formula has the below syntax:

IMPORTFEED(url, [query], [headers], [num_items])
  • url – this is the URL of the ATOM or RSS feed of the blog/website
  • [query] – this is an optional argument where you tell the formula the data point you want to fetch (such as the title of the post or the date of the post). If you don’t specify anything, it will fetch everything
  • [headers] – this is an optional argument where you can make this TRUE if you want an additional row that shows the header
  • [num_items] – this is also an optional argument where you specify how many items you want to fetch. For example, if you’re fetching the latest posts from a website using the RSS feed, then you can specify 5 here and the five latest posts would be displayed.

Now that we have covered all the Google Sheets formulas you need to know to scrape the websites, let’s dive in and have a look at some of the web scraping examples.

Similar Formulas

Alongside scraping websites, you may want to import data from other sources.

You can use the IMPORTDATA to bring data over from other sources including separate spreadsheets

You could also use IMPORTRANGE to port over a section from another spreadsheet.

There are also ways to import JSON files into Sheets.

We won’t dive too deeply into these, but you can follow the links to learn more about them.

How to Scrape Data From a Website Into Google Sheets

Web scraping with Google Sheets can be accomplished with a number of different methods depending on what info you want to scrape. Google Sheets can pull data from a website using any of the above formulas. Below are a few examples of how you can use the web scraper Google Sheets functions.

Extract Data From a Website to Google Sheets: Tables

Let’s say you want to scrape the table from Wikipedia about the top-grossing movies of all time.

Google Sheets Web Scraping - Simple Guide for (1)

Here is the URL of the page that has this table –https://en.wikipedia.org/wiki/List_of_highest-grossing_films

Here’s how you would import this table into your spreadsheet:

  1. Click an appropriate empty cell
  2. Enter =IMPORTHTML(
  3. Enter the URL inside quotation marks
  4. After a comma, enter the query inside quotes. In this case, it’s “table”
  5. Enter another comma then the index to fetch, in this example, it’s1
  6. Press enter to execute the fucntion

The full formula should look something like this:

=IMPORTHTML("https://en.wikipedia.org/wiki/List_of_highest-grossing_films","table",1)

Google Sheets Web Scraping - Simple Guide for (2)

The above formula goes to the specified URL, and from the HTML code of the web-page, it identifies the first table and fetches the entire table in Google Sheets.

Quite impressive to be honest.

Note that Google sheet returns an array of values, and for this formula to work you need to have the entire range of cells free and available for the IMPORTHTML formula to fill that area.

If you already have something in one of the cells (a cell that would otherwise have been used by the result of the ImportHTML formula) you would see an error.

Now, this is great.

But what if I do not want the entire table.

What if I only want the names of the top-grossing movies.

You can do that as well.

Since the formula returns an array of values, you can use this as an input within the INDEX function and only fetch the column that you need.

Below is the formula that would only give you the names of the top-grossing movies from the table.

=INDEX(IMPORTHTML("https://en.wikipedia.org/wiki/List_of_highest-grossing_films","table",1),,3)

Google Sheets Web Scraping - Simple Guide for (3)

The above INDEX formula only fetches the third column, which has the movie names.

Just the way I have used the formula to scrap the table, you can also use it to fetch a list as well. Just change the second argument from “table” to “list”.

Note: This formula would refresh when you open the Google Sheets document again later. This way, in case there is any change in the list, it would automatically be updated. If you just need the table and don’t want it to update, copy the data, and paste it as values.

This is just one of the ways Google Sheets can get data from a website. Keep reading to learn more.

How to Import Data From Website to Google Sheets: Titles

Here is a random article I picked from Forbes.

Google Sheets Web Scraping - Simple Guide for (4)

Now I want to use the formulas to scrape the headline of this article.

This could be useful when you have a list of 50 or 100 URLs and you want to quickly fetch the titles.

I have entered this URL in cell A2 (although you can also use the URL directly in the formula, I will use the cell reference that has the URL instead).

Now before I use the formula, I need two things for it to work – the exact URL (or the cell reference that contains the URL), and an identifier within the HTML of that page that tells me that this is the title.

Let’s see how you can find this HTML identifier.

  1. Open the webpage and when it’s fully loaded, hover the cursor over the headline, right-click and then click on Inspect.Google Sheets Web Scraping - Simple Guide for (5)This opens the inspect element window and you will see that the following HTML element gets highlighted: <h1 class=”fs-headline speakable-headline font-base font-size”>Google’s Machine Learning Is Making You More Effective In 2020</h1>
    Google Sheets Web Scraping - Simple Guide for (6)
  2. From this, we need an identifier that will tell us that it’s the title and that identifier is class=”fs-headline speakable-headline font-base font-sizeNow, we can use this in our formula and get the title of the URL.
  3. Add the identifier as a @class= argument to scrape the title of the given URL:

In our example, the final formula should go as follows:

=IMPORTXML(A2,"//*[@class='fs-headline speakable-headline font-base font-size']")

Google Sheets Web Scraping - Simple Guide for (7)

Similarly, if you want to get the name of the author of this article, you can use the same steps above to find out the HTML element that is a unique identifier for the author name and use the below formula:

=IMPORTXML(A2,"//*[@class='contrib-link--name remove-underline']")

Google Sheets Web Scraping - Simple Guide for (8)

Note: I wish this was a standardized process where you would always have the same ‘title’ identifier or the ‘author name’ identifier for all the websites across the Internet. Unfortunately, every website is made in a different way, and therefore the HTML coding is different for all of these. So you need to spend some time to get that identifier and then use it in your IMPORTXML formula. This is again one of the reasons why I say that while Google Sheets can scrape a website, it’s not meant for it.

Scrape the Recent Articles from a Feed

If you have the feed URL of a website, you can also use the IMPORTFEED formula to fetch all the recently published articles from the feed.

For example, below is an example of the feed from Tech Crunch

https://techcrunch.com/feed

I can now use this feed to get the latest articles from TechCrunch.

Below is the formula that will fetch all the data from the feed:

=IMPORTFEED("https://techcrunch.com/feed",,TRUE)

Google Sheets Web Scraping - Simple Guide for (9)

Since TechCrunch has the top 20 articles in the feed, you will get information such as the title of the article, author, URL, date, summary, etc.

If you only want the titles of the recent articles, you can use the below formula:

=IMPORTFEED("https://techcrunch.com/feed","items title",TRUE)

Google Sheets Web Scraping - Simple Guide for (10)

and if you want the URLs, you can use the below formula:

=IMPORTFEED("https://techcrunch.com/feed","items url",TRUE)

Google Sheets Web Scraping - Simple Guide for (11)

You can use this to create your own feed list of the top blogs/sites you want to keep a track of. You can have the feeds of these sites in separate columns and use the formula to always have a list of the latest articles from the website.

This can also be a great technique to keep a track of what your competitors are posting.

Scraping Data From Amazon

Now I want to show you something amazing.

Using Google Sheets formulas, you can scrape data from Amazon such as the product title, the rating, the total number of reviews, etc.

Here is the URL of a product that I will be using – https://www.amazon.com/Microsoft-Ergonomic-Keyboard-Business-5KV-00001/dp/B00CYX26BC/

Google Sheets Web Scraping - Simple Guide for (12)

It’s a URL to an ergonomic keyboard.

Now let’s see the formulas you can use to scrape the data from Amazon using Google Sheets.

=IMPORTXML(A2,"//*[@id='productTitle']")

Google Sheets Web Scraping - Simple Guide for (13)

Since the identifier used on Amazon is ‘productTitle’, you can use the URL of any product page and this will scrape the product title.

Below is the formula that will give you the price of the product:

=IMPORTXML(A2,"//*[@id='priceblock_ourprice']")

Google Sheets Web Scraping - Simple Guide for (14)

And the below formula will give you the number of customer ratings:

=INDEX(importXml(A2,"//*[@id='acrCustomerReviewLink']"),1,1)

Google Sheets Web Scraping - Simple Guide for (15)

The concept used in these formulas is the same that I have already covered before.

You can try and scrape more data from Amazon or any website you like. Since there is no one size that fits all, you may have to do a little bit of trial and error before you can get the formula that works.

These formulas work at the time of writing this article. In the future, if Amazon changes the layout of their product pages, you may have to adjust the formulas.

How to Scrape: H1, Title, Description, and Page Links

You can easily scrape different sections of a webpage by using XPath queries with the IMPORTXML function. For example, if you wanted to pull the description metadata from a page, you could use a formula such as this:

=IMPORTXML(“https://www.spreadsheetpoint.com/”,”//meta[@name=’description’]/@content”)

Here are some of the most common queries you would use when importing data from a website:

  • H1: //h1
  • Title: //title
  • Page links: //@href
  • Meta description: //meta[@name=’description’]/@content

Web Scrape Multi-Author Articles

Let’s pretend you need to scrape a page for authors and there happens to be more than one. You can do so by combining INDEX and IMPORTXML. INDEX is an array function that searches for each index position in a list of items. Here’s how to use Google Sheets to scrape a website for multiple authors:

  1. Type =INDEX(IMPORTXML(
  2. Select the cell with the URL of the article from the spreadsheet or type the URL manually. Let’s pretend we have the URL in C2 for our example
  3. Enter the Auther search parameter of “//span[@class=’byline-author’]”
  4. Use a closing bracket
  5. Choose the index position of the the author you wish to search for (1 for the first, 2 for the second, etc)
  6. Use another closing bracket and press enter to execute the scrape
  7. Repeat in another empty cell for each author, ie if you used 1 as the index the first time, use 2 to search for the second author and so on.

The first formula should look something like this:

=INDEX(IMPORTXML(C2,"//span[@class='byline-author']"),1)

Then the second formula should be

=INDEX(IMPORTXML(C2,"//span[@class='byline-author']"),2)

And so on…

Web Scraping Google Sheets FAQ

Can You Web Scrape in Google Sheets? / How Do I Scrape Website Data in Google Sheets?

Yes, you can! you can use one of the many IMPORT functions to do so. You would usually use the IMPORTHTML function but it can differ based on what you want to scrape. The above guide goes through several ways you use a Google Sheets scraper.

Is Web Scraping a Crime?

Web scraping with Google Sheets or any other software is perfectly legal. However, you should never scrape confidential data or personal information to sell.

Can Google Sheets Pull Data From an API?

Yes, but you’ll have to familiarize yourself with Google Script or use an add-on.

Can I Use Google Sheets as a Database for Website?

Yes, you can pull data from a Google spreadsheet to a website by using Google Apps Script.

Limitations of Google Sheets as Scraper

As I mentioned earlier, Google Sheets is not built for scraping websites. Sure it has some awesome formulas that you can use for scraping, but there are a lot of limitations you need to know about.

For example, if you’re trying to scrape a couple of tables or a couple of HTML elements, it should be fine.

But if you ask Google Sheets to do the scraping for hundreds of data points at once, you may have to wait for a long time (minutes or even hours), or worse – see an error message because Google Sheets is not able to do the scraping.

In my experience, as soon as I try and scrape more than 50 data points in one go, I face issues. Sometimes, I enter the formula and nothing happens – it shows a blank.

I’m not sure exactly what’s going on in the backend, but I feel there is a daily limit to these formulas. Google Sheets allows you to scrape 50 or 100 data points, but once you go above that limit, it stops you from doing it for some time (maybe a few hours or a day)

So this is how you can use Google Sheets as a basic web scraper and fetch the data from websites.

I hope you found this Google Sheets web scraping tutorial useful!

Other Google Sheets tutorials you may also like:

  • Google Sheets Array Formulas
  • How to Use IMPORTRANGE Function in Google Sheets (Examples)
  • REGEXMATCH Function in Google Sheets (Easy Examples)
  • Google Sheets QUERY Function
  • How to Insert an Image in a Cell in Google Sheets
Google Sheets Web Scraping - Simple Guide for (2024)

FAQs

Is web scraping Google legal? ›

Scraping publicly available data is thus permitted by the law. Scraping is legal. The legality of web scraping is clear. On Google Maps as well as on any set of data publicly available online.

How do I pull data from Google Sheets? ›

Right click on the cell or cells. Data extraction . Click Data > Data extraction .

What are 3 things you can use Google Sheets to do? ›

Work with your data
  • Copy formatting from any text and apply it to another selection of text.
  • Format data as currency, a percentage, change decimal places, and more.
  • Add or edit cell borders.
  • Merge cells.
  • Change text alignment.
  • Change how text wraps or rotate text.

Can I be sued for web scraping? ›

If the scraping activity is too intense which can interrupt the services of the scraped website or the scraped data is used in a way to duplicate the activity or the service of that website, then even though regulations don't exist, the website would have grounds to file a lawsuit against the scraper.

Can you get banned for scraping? ›

Making too many requests to a website in a short amount of time can lead to a ban. Implement a delay between your requests to mimic human browsing behavior and reduce the chances of detection. This is a simple yet effective way to avoid getting blocked by the website you are scraping.

Can Google Sheets scrape data? ›

Fortunately, Google Sheets offers a user-friendly solution for scraping data from websites without needing to write complex code. By leveraging the power of Google Sheets, you can easily extract data from webpages and analyze it in a variety of ways.

Does Google Sheets have an API? ›

The Google Sheets API is a RESTful interface that lets you read and modify a spreadsheet's data. The most common uses of this API include the following tasks: Create spreadsheets. Read and write spreadsheet cell values.

Is Google Sheets API free? ›

All use of the Google Sheets API is available at no additional cost.

What can Google Sheets not do that Excel can? ›

What can Excel do that Google Sheets can't? Excel offers some advanced features that set it apart from Google Sheets, like built-in advanced statistical tests, better data analysis tools, more keyboard shortcuts, and the ability to import data from more external sources.

Is Google Sheets better than Excel? ›

If you're trying to figure out which spreadsheet software is best for your needs, consider how you plan to use it. Google Sheets is a good choice if you're looking for basic spreadsheet functions. If you need to upload, track, and analyze large sets of data, Microsoft Excel is ideal.

Why would someone use Google Sheets? ›

With Google Sheets, you can create and edit spreadsheets directly in your web browser—no special software is required. Multiple people can work simultaneously, you can see people's changes as they make them, and every change is saved automatically.

Is web scraping illegal? ›

While web scraping is not inherently illegal, how it is conducted and the data's subsequent use can raise legal and ethical concerns. Actions such as scraping copyrighted content and personal information without consent or engaging in activities that disrupt the normal functioning of a website may be deemed illegal.

What is the easiest language for web scraping? ›

Python is widely considered to be the best programming language for web scraping. That's because it has a vast collection of libraries and tools for the job, including BeautifulSoup and Scrapy. Also, Python's simple syntax makes it a great choice for beginners.

What is the difference between web scraper and data scraper? ›

Web scraping is when you take any publicly available online data and import the found information into any local file on your computer. The main difference here to data scraping is that web scraping definition requires the internet to be conducted.

How can I pull data from one Google sheet to another automatically? ›

Copy data from one sheet to another automatically

Click on the cell where you wish your data to be located, and enter the IMPORTRANGE formula, inserting the URL you copied where needed. You can also choose to specify the data range within the formula.

How to pull data from one Google sheet to another based on criteria? ›

Google Sheets link to another sheet based on criteria

You can do this using the FILTER function that was featured in the example above. Here is the syntax: =FILTER(data_set,criterium1, criterium2,...)

How do I pull data from Google Sheets to Excel? ›

How to convert a Google Sheets file to Excel
  1. On your PC or Mac, open the Google Sheets spreadsheet that you want to convert.
  2. In the top-left corner, click "File." Open the "File" menu. ...
  3. In the drop-down menu, hover over the option that says "Download."
  4. In the new menu that appears, click "Microsoft Excel (.
Nov 3, 2022

How do I export specific data from Google Sheets? ›

Choose tab and export data

Under the Export data section, click Select data. As you can see below, our open file is automatically selected. Now, choose the tab you wish to export. In this example, I want to export Garry's performance data, so I'll select his tab.

Top Articles
Is Chronic Sinusitis Hereditary? - American Sinus Institute
Docking Station Alternative Solutions: Creating a Productive Workstation is as Easy as 1-2-3! | StarTech.Blog
Durr Burger Inflatable
Hotels Near 625 Smith Avenue Nashville Tn 37203
Safety Jackpot Login
O'reilly's Auto Parts Closest To My Location
What Are the Best Cal State Schools? | BestColleges
Erika Kullberg Wikipedia
Hocus Pocus Showtimes Near Harkins Theatres Yuma Palms 14
Math Playground Protractor
Health Benefits of Guava
Women's Beauty Parlour Near Me
Planets Visible Tonight Virginia
William Spencer Funeral Home Portland Indiana
Saw X | Rotten Tomatoes
Rainfall Map Oklahoma
Craigslist Farm And Garden Tallahassee Florida
Becu Turbotax Discount Code
I Touch and Day Spa II
Booknet.com Contract Marriage 2
Puss In Boots: The Last Wish Showtimes Near Cinépolis Vista
Magic Seaweed Daytona
Toothio Login
Boise Craigslist Cars And Trucks - By Owner
Craigslist Wilkes Barre Pa Pets
Receptionist Position Near Me
Ullu Coupon Code
Shiny Flower Belinda
Xxn Abbreviation List 2023
Shoe Station Store Locator
Our Leadership
Mega Millions Lottery - Winning Numbers & Results
Joplin Pets Craigslist
Breckie Hill Fapello
Goodwill Houston Select Stores Photos
Oreillys Federal And Evans
School Tool / School Tool Parent Portal
Plead Irksomely Crossword
Thelemagick Library - The New Comment to Liber AL vel Legis
Sas Majors
Craigslist en Santa Cruz, California: Tu Guía Definitiva para Comprar, Vender e Intercambiar - First Republic Craigslist
Craigslist Odessa Midland Texas
Busted Newspaper Mcpherson Kansas
Craigslist Central Il
Random Animal Hybrid Generator Wheel
Thothd Download
Oefenpakket & Hoorcolleges Diagnostiek | WorldSupporter
2121 Gateway Point
Ocean County Mugshots
Latest Posts
Article information

Author: Greg Kuvalis

Last Updated:

Views: 6644

Rating: 4.4 / 5 (75 voted)

Reviews: 82% of readers found this page helpful

Author information

Name: Greg Kuvalis

Birthday: 1996-12-20

Address: 53157 Trantow Inlet, Townemouth, FL 92564-0267

Phone: +68218650356656

Job: IT Representative

Hobby: Knitting, Amateur radio, Skiing, Running, Mountain biking, Slacklining, Electronics

Introduction: My name is Greg Kuvalis, I am a witty, spotless, beautiful, charming, delightful, thankful, beautiful person who loves writing and wants to share my knowledge and understanding with you.