How to set up an HTTP Proxy in PHP

The proliferation of XML standards such as RSS has made it easy to grab data from external sources and repost it on your own site.

Generally, websites that output their data as well-formed XML are happy to have you redistribute their content for “personal, non-commercial use.”  The code is fairly simple:

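// Load the remote feed into a DOM document.
$dom = new DOMDocument();
$dom->load('http://www.google.com/trends/hottrends/atom/hourly');
// Each trend is an <entry> element in the feed.
$trends = $dom->getElementsByTagName('entry');
foreach ($trends as $trend) {
    $title = $trend->getElementsByTagName('title')->item(0)->nodeValue;
    $url = $trend->getElementsByTagName('source_url')->item(0)->nodeValue;
    // Only output entries that have a link; escape both values for HTML.
    if ($url) {
        echo '<a href="' . htmlspecialchars($url) . '">' . htmlspecialchars($title) . '</a>';
    }
}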

In the above example, we access Google’s Hot Trends feed.  Once we’ve loaded it into a DOMDocument object (which just makes the XML easier to traverse), we grab all the entry tags (in many feeds, such as RSS, you’ll want to grab the item tags instead), and then read the titles and links.  To keep us from outputting a title without a link, I’ve wrapped the output in an if statement.
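To try the same traversal without depending on a live feed, here is a minimal sketch that runs it against an inline sample. The function name extractTrendLinks and the sample feed are hypothetical, made up for illustration; only the DOMDocument calls come from the code above. loadXML() parses a string in the same way that load() parses a URL.

```php
<?php
// Hypothetical helper: returns title/url pairs for entries that have a link.
function extractTrendLinks(string $xml): array
{
    $dom = new DOMDocument();
    // loadXML() returns false when the XML cannot be parsed.
    if (!$dom->loadXML($xml)) {
        return [];
    }

    $links = [];
    foreach ($dom->getElementsByTagName('entry') as $entry) {
        $titleNode = $entry->getElementsByTagName('title')->item(0);
        $urlNode = $entry->getElementsByTagName('source_url')->item(0);
        // item(0) is null when the tag is missing, so check before reading.
        if ($titleNode === null || $urlNode === null || $urlNode->nodeValue === '') {
            continue;
        }
        $links[] = ['title' => $titleNode->nodeValue, 'url' => $urlNode->nodeValue];
    }
    return $links;
}

// Made-up sample data standing in for the remote feed.
$sample = <<<XML
<feed>
  <entry><title>First trend</title><source_url>http://example.com/a</source_url></entry>
  <entry><title>No link here</title></entry>
</feed>
XML;

foreach (extractTrendLinks($sample) as $link) {
    echo '<a href="' . htmlspecialchars($link['url']) . '">'
        . htmlspecialchars($link['title']) . "</a>\n";
}
```

Checking the nodes before reading nodeValue also avoids a warning when an entry has no source_url tag at all, which the shorter version above would trigger.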

The final result can be read on that page, or output on another page using an include or require statement, like this:

require('somefile.php');

Keep in mind, though, that the require or include method won’t work in many CMS frameworks (such as WordPress posts).  That’s because the included PHP code could interfere with the rest of the code on the page.

But never fear, Ajax is here!  In another post, I tell you how to take the above code and deploy it in your posts or WordPress pages.

PHP 5 also has a method for scraping data from an HTML document that isn’t well structured.  Just replace the load call above with:

$dom->loadHTMLFile("http://somefile.htm");

Though, it should be said: if you’re using this function, you’re asking for trouble.
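Part of the trouble is that real-world HTML is rarely valid, so the parser emits a warning for every problem it finds. One way to keep that under control, sketched here with a made-up HTML string rather than a remote file, is to have libxml collect its errors internally instead of printing them. The sample markup and variable names are assumptions for illustration; loadHTML() is the string counterpart of loadHTMLFile().

```php
<?php
// Made-up, deliberately sloppy HTML standing in for a remote page.
$html = '<html><body><p>Unclosed paragraph<a href="http://example.com/x">A link</body></html>';

// Collect parser errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Keep a copy of whatever the parser complained about, then reset the buffer
// and restore the previous error-handling setting.
$problems = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

// The parser repairs the markup as best it can, so the link is still reachable.
$hrefs = [];
foreach ($dom->getElementsByTagName('a') as $anchor) {
    $hrefs[] = $anchor->getAttribute('href');
}

echo count($problems) . " parser problem(s)\n";
echo implode("\n", $hrefs) . "\n";
```

Even with the warnings silenced, a scraper like this breaks whenever the remote page changes its markup, so treat it as a last resort when no feed is available.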
