The proliferation of XML standards such as RSS has made it easy to grab data from external sources and repost it on your own site.
Generally, websites that publish their data as well-formed XML are happy to have you redistribute their content for “personal, non-commercial use.” The code is fairly simple:
    // Load the Atom feed into a DOMDocument
    $dom = new DOMDocument();
    $dom->load('http://www.google.com/trends/hottrends/atom/hourly');

    // Each <entry> element is one trend
    $trends = $dom->getElementsByTagName('entry');
    foreach ($trends as $trend) {
        $title = $trend->getElementsByTagName('title')->item(0)->nodeValue;
        $url = $trend->getElementsByTagName('source_url')->item(0)->nodeValue;
        // Only print entries that actually have a link
        if ($url) { echo "<a href=\"$url\">$title</a>"; }
    }
In the above example, we access Google’s Hot Trends feed. Once we’ve loaded it into a DOMDocument object (which just makes the XML easier to traverse), we grab all the entry tags (in many feeds, such as RSS, you’ll want to grab the item tags instead), and then read the titles and links. To keep us from outputting a title without a link, I’ve wrapped the output in an if statement.
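To make the entry-versus-item point concrete, here is a minimal sketch of the same loop wrapped in a helper that takes the tag names as parameters. The function name `extractLinks`, the sample RSS string, and the example.com URL are all made up for illustration; the sketch also escapes the output with `htmlspecialchars`, which the code above does not do.

```php
<?php
// Hypothetical helper (not from the original post): collect title/link pairs
// from a feed string. Atom feeds use <entry>; RSS feeds use <item> and <link>.
function extractLinks($xml, $itemTag, $linkTag)
{
    $dom = new DOMDocument();
    if (!$dom->loadXML($xml)) {
        return array(); // the string was not well-formed XML
    }

    $links = array();
    foreach ($dom->getElementsByTagName($itemTag) as $node) {
        $titleNode = $node->getElementsByTagName('title')->item(0);
        $linkNode  = $node->getElementsByTagName($linkTag)->item(0);

        // Skip anything that lacks a title or a link
        if ($titleNode === null || $linkNode === null) {
            continue;
        }
        $url = trim($linkNode->nodeValue);
        if ($url === '') {
            continue;
        }

        // Escape both values before dropping them into HTML
        $links[] = sprintf(
            '<a href="%s">%s</a>',
            htmlspecialchars($url, ENT_QUOTES),
            htmlspecialchars(trim($titleNode->nodeValue), ENT_QUOTES)
        );
    }
    return $links;
}

// Made-up sample feed: the second item has no link and should be skipped
$rss = <<<'XML'
<rss version="2.0"><channel><title>Sample feed</title>
<item><title>First story</title><link>http://example.com/1</link></item>
<item><title>No link here</title></item>
</channel></rss>
XML;

foreach (extractLinks($rss, 'item', 'link') as $html) {
    echo $html, "\n";
}
```

For an Atom feed like the one above, you would call it with `'entry'` and `'source_url'` instead.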
The final result can be read on that page, or output on another page using an include or require statement, like this:

    require('somefile.php');
Keep in mind, though, that require and include won’t work inside many CMS content areas (such as WordPress posts). These systems don’t run PHP embedded in post content, because stray code there could screw up the rest of the page.
But never fear, Ajax is here! In another post, I tell you how to take the above code and deploy it in your posts or WordPress pages.
PHP 5 also has a method for scraping data from an HTML document that isn’t well-formed. Just replace the load call above with:

    $dom->loadHTMLFile("http://somefile.htm");
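Though, it should be said: if you’re using this function, you’re asking for trouble. Sloppy markup makes the parser spit out warnings, and your scraper breaks whenever the site changes its layout.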
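If you do go down this road, one way to keep the noise down is to have libxml collect its parse errors instead of printing them. The sketch below is only an illustration: the markup is made up and deliberately sloppy, and it uses `loadHTML` on a string so it runs without a network connection. The same error handling should apply to `loadHTMLFile`.

```php
<?php
// Made-up, deliberately sloppy markup: the <p> and <li> tags are never closed
$html = '<html><body><p>Links<ul>'
      . '<li><a href="http://example.com/a">A</a>'
      . '<li><a href="http://example.com/b">B</a>'
      . '</ul></body></html>';

// Collect libxml parse errors internally instead of emitting PHP warnings
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Grab whatever the parser complained about, then restore the old setting
$problems = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Pull the href out of every anchor the parser managed to recover
$hrefs = array();
foreach ($dom->getElementsByTagName('a') as $anchor) {
    $hrefs[] = $anchor->getAttribute('href');
}

echo count($problems), " parser complaints\n";
print_r($hrefs);
```

This only hides the warnings. It does nothing about the bigger problem, which is that the page you’re scraping can change its markup at any time.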