In this article, I will be showing the process of scraping some listed companies market capitalization in Malaysia data using Google Sheet. We will perform web scraping on the i3 Investor site.
Images from Unsplash
Generate Webpage for Scraping
First of all, open a new google sheet and create a table like this:
Create a table like this
Inside the table, we have a few companies name and their listed code. Notice that in cell C3, we put a link — this link will serve as a “prefix”. If the stock code is put at the back of the link, it will direct to the webpage of the particular page.
prefix link: https://klse.i3investor.com/web/stock/overview/
link to webpage: https://klse.i3investor.com/web/stock/overview/1023
Now, we use the concatenate function to append the code to the prefix links for all the companies in the table:
Use CONCAT() function to create the links
Web Scraping in Action
Before we start web scraping, we need to learn about this function — IMPORTXML().
=importxml("url", "query")
Notice that we already have all the URLs needed as we have created the links on the table. Now, the missing piece is called “query” — simply means what do we want to know and where can it be found?
The query is called the XPath where it is used in web browser, now let’s hope into Maybank stock page and get the XPath query.
Highlight the market capitalization amount
After clicking inspect, a window will pop out highlighting a code segment. Here, we need to right click and select Copy full XPath.
Click copy full XPath
Let’s hope back to Google Sheet, you will be pasting the query as follow:
=importxml("url","/html/body/div[3]/div/div[2]/div[8]/div[1]/div[2]/div[1]/div[2]/p/strong")
Note: The URL will be the URLs in the Reference column
After applying the formula to the Market Cap column, you will manage to scrape all the Market Capitalization data on the sheet — the data will be updated live.
Data will be loaded once your apply the formula