Web Scraping

Web Scraping with Go

Web scraping (Wikipedia entry) is a handy tool to have in your arsenal. It can be useful in a variety of situations, like when a website does not provide an API, or you need to parse and extract web content programmatically. This tutorial walks through using the standard library to perform a variety of tasks like making requests, changing headers, setting cookies, using regular expressions, and parsing URLs. It also covers the basics of the goquery package (a jQuery like tool) to scrape information from an HTML web page on the internet.

If you need to reverse engineering a web application based on the network traffic, it may also be helpful to learn how to do packet capture, injection, and analysis with Gopacket.

If you are downloading and storing content from a site you scrape, you may be interested in working with files in Go.

Web Scraping Tutorial in JavaScript (Node.js)

Node.js is, according to their website, "a platform built on Chrome's JavaScript runtime for easily building fast, scalable network applications. Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient, perfect for data-intensive real-time applications that run across distributed devices." It is essentially a javascript interpreter for the command line. With Node.js, you can write scripts in JavaScript just like you would with PHP and Python.