Friday, August 6, 2010

This post combines the PHP Simple HTML DOM Parser and my thumbnail generation class by downloading all the images from a specified web page and creating thumbnails for them.

The code

Note that the code downloads the images directly using file_get_contents and therefore needs URL-aware fopen wrappers (the allow_url_fopen setting in php.ini) enabled to work.
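A quick way to check this at runtime is shown below; the die() message is just illustrative.

if (!ini_get('allow_url_fopen')) {
    die('allow_url_fopen must be enabled in php.ini for this script to work.');
}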

require_once('/path/to/simple_html_dom.php');
require_once('/path/to/thumbnail_class.php'); // filename assumed - adjust to wherever the thumbnail class lives

$tg = new ThumbnailGenerator(); // class name assumed here for illustration

$html = file_get_html('http://www.cnn.com/');

// Collect the image URLs as array keys so duplicates are eliminated
$images = array();
foreach($html->find('img') as $element) {
    $images[$element->src] = true; // note: relative src values would need resolving against the page URL
}

// Generate a thumbnail (maximum 100x100 pixels) for each unique image
foreach($images as $url => $void) {
    $tg->generate($url, 100, 100, '/path/to/thumbnails/' . md5($url) . '.jpg');
}

The example downloads the www.cnn.com homepage and extracts all the images using the Simple HTML DOM Parser, whose selector syntax works the same way as jQuery's.
The images that are found are put into an array indexed by the full URL; this effectively eliminates duplicates (which, in the case of the CNN homepage at the time of writing, includes a 1-pixel spacer image used many times).
This array is then looped through and the thumbnail images generated. I've named them using an md5 hash of the full URL with a .jpg extension/format in the example; this avoids issues with slashes and other path characters that would appear if the full URL were used as a filename.
The above example will create thumbnails that are a maximum of 100x100 pixels.
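
For anyone who doesn't have the thumbnail class to hand, here is a minimal sketch of what a generate() style function could do using PHP's GD extension; the function name, return values and JPEG quality setting are my assumptions rather than how the actual class works.

function generate_thumbnail($url, $maxWidth, $maxHeight, $dest) {
    $data = file_get_contents($url); // needs allow_url_fopen, as above
    if ($data === false) {
        return false; // download failed
    }
    $src = imagecreatefromstring($data);
    if ($src === false) {
        return false; // not a recognisable image format
    }
    $width  = imagesx($src);
    $height = imagesy($src);
    // Scale to fit within the bounding box, keeping the aspect ratio
    // and never enlarging the original
    $ratio = min($maxWidth / $width, $maxHeight / $height, 1);
    $newWidth  = max(1, (int) round($width * $ratio));
    $newHeight = max(1, (int) round($height * $ratio));
    $thumb = imagecreatetruecolor($newWidth, $newHeight);
    imagecopyresampled($thumb, $src, 0, 0, 0, 0,
                       $newWidth, $newHeight, $width, $height);
    $ok = imagejpeg($thumb, $dest, 85); // save as JPEG at quality 85
    imagedestroy($src);
    imagedestroy($thumb);
    return $ok;
}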

CSS images won't be included

Note that the above example only gets images from the page which are defined with an <img> tag; any images defined inline using CSS backgrounds or in a style sheet will not be downloaded.
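
As a rough sketch only, inline background images declared in a style attribute could be picked up with something like the following (the selector and regex are my own illustration; images set in external style sheets would still be missed):

foreach($html->find('*[style]') as $element) {
    if (preg_match('/background(?:-image)?[^;]*url\(([^)]+)\)/i', $element->style, $m)) {
        $images[trim($m[1], "'\" ")] = true; // strip any quotes around the url
    }
}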

Refinements to the script

The script could be refined to exclude images that are below a certain size (e.g. images which are less than 100 pixels wide or 100 pixels high could be ignored) or of a particular format, as sketched below. You could do the latter with my thumbnail generation class by setting the allowable types. I'll have a look at these (and any suggestions made in the comments below) and post an update in a few days.
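
As a sketch of the size-based refinement, getimagesize() can read an image's dimensions and type over HTTP (at the cost of fetching each image before the thumbnail stage downloads it again), assuming the same $images array and $tg object as above:

$allowed = array(IMAGETYPE_JPEG, IMAGETYPE_PNG, IMAGETYPE_GIF); // formats to keep - adjust to taste
foreach($images as $url => $void) {
    $info = @getimagesize($url); // false if the image can't be read
    if ($info === false) {
        continue;
    }
    list($width, $height, $type) = $info;
    if ($width < 100 || $height < 100 || !in_array($type, $allowed)) {
        continue; // too small or an unwanted format
    }
    $tg->generate($url, 100, 100, '/path/to/thumbnails/' . md5($url) . '.jpg');
}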
