How to Parse HTML DOM with PHP



PHP Simple HTML DOM is a single-file library that lets you traverse an HTML document and search for specific elements. The examples below show how to use it. To learn how to crawl (or spider) websites to collect many pages for processing, see this post on How to Crawl Web Pages with PHP.

// Download simple_html_dom.php first from
require_once('simple_html_dom.php');

// Get the contents of the HTML document using cURL, a crawling
// framework, or the provided file_get_html() function.
$html = file_get_html('');

// Find all images
foreach ($html->find('img') as $element) {
  echo $element->src . '<br>';
}

// Find all links
foreach ($html->find('a') as $element) {
  echo $element->href . '<br>';
}

// Translate whole document to plain text
echo file_get_html('')->plaintext;
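
The snippet above assumes that the document loads and that every matched element carries the attribute being printed. The following is a minimal sketch of a more defensive variant, not part of the original example. It assumes simple_html_dom.php sits next to the script, parses an inline sample string with the library's str_get_html() function so that no network access is needed, and uses a hypothetical helper named collectAttribute() that exists only for illustration.

```php
<?php
// Assumption: simple_html_dom.php has been downloaded into this directory.
require_once 'simple_html_dom.php';

// Hypothetical helper (not part of the library): collects one attribute
// from every element that matches a selector, skipping elements without it.
function collectAttribute($html, string $selector, string $attribute): array
{
    $values = [];
    foreach ($html->find($selector) as $element) {
        $value = $element->$attribute;
        // A missing attribute does not come back as a string, so filter on type.
        if (is_string($value) && $value !== '') {
            $values[] = $value;
        }
    }
    return $values;
}

// Inline sample markup used only for this sketch.
$markup = '<html><body>'
    . '<img src="a.png">'
    . '<a href="/one">One</a>'
    . '<a name="anchor">No href</a>'
    . '<a href="/two">Two</a>'
    . '</body></html>';

// str_get_html() parses a string; check the result before using it.
$html = str_get_html($markup);
if ($html === false) {
    throw new RuntimeException('Could not parse the document');
}

$images = collectAttribute($html, 'img', 'src');
$links = collectAttribute($html, 'a', 'href');

print_r($images);
print_r($links);

// Release the parser's memory once the values have been copied out.
$html->clear();
```

The same false check is worth applying to the result of file_get_html() before calling find() on it, since a failed download would otherwise end in a fatal error.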


The following example scrapes the archive page and prints all of the post titles. If the page's markup changes in the future, this script may break. YMMV.

// This snippet will print out all of the post titles in the archive.
require_once('simple_html_dom.php'); // Get simple_html_dom.php from

$html = file_get_html('');

foreach ($html->find(".view-blog-archive a") as $archiveLink) {
  echo $archiveLink->plaintext . "\n";
}