
Q: PHP Simple HTML DOM not parsing certain links

I am learning how HTML DOM parsers work. I've hit a roadblock: I can't parse the following link, although I can parse the root domain and other websites. Could someone help me understand why this particular link fails?

<?php

include('simple_html_dom.php');

$base = 'http://www.stupidstudios.com/samsung-galaxy-s6/p/bbuynow';

$curl = curl_init();
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($curl, CURLOPT_HEADER, false);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_URL, $base);
curl_setopt($curl, CURLOPT_REFERER, $base);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
$str = curl_exec($curl);
curl_close($curl);

$html_base = new simple_html_dom();
$html_base->load($str);

foreach($html_base->find('h1') as $element) {
    echo "<pre>";
    print_r( $element );
    echo "</pre>";
}

$html_base->clear(); 
unset($html_base);

?>
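When find('h1') returns nothing for one URL but works for others, a reasonable first check is whether curl fetched a real page at all. curl_exec() returns false on a transport error, and a server that rejects the request may answer with an error status or an empty body, which leaves Simple HTML DOM nothing to parse. The sketch below reports those three facts for a URL. The function name inspect_fetch, the optional $extraOptions parameter and the 10-second connect timeout are illustrative assumptions, not part of the original code.

```php
<?php
// Hypothetical diagnostic helper: fetch a URL and report what came back,
// so an empty parse result can be traced to the HTTP response, not the parser.
function inspect_fetch(string $url, array $extraOptions = []): array
{
    $curl = curl_init($url);
    // Base options mirror the question; keys in $extraOptions (for example
    // CURLOPT_USERAGENT) take precedence because array union keeps left keys.
    curl_setopt_array($curl, $extraOptions + [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_CONNECTTIMEOUT => 10, // assumed timeout, adjust as needed
    ]);
    $body = curl_exec($curl);

    // The handle is released when $curl goes out of scope.
    return [
        'curl_error' => curl_error($curl),                       // '' if the transfer succeeded
        'status'     => curl_getinfo($curl, CURLINFO_HTTP_CODE), // 0 if no HTTP response arrived
        'bytes'      => is_string($body) ? strlen($body) : 0,    // size of the body handed to the parser
    ];
}
```

Running print_r(inspect_fetch($base)) with and without a CURLOPT_USERAGENT entry in $extraOptions shows whether the server treats the two requests differently.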

Answer 1:

It seems to work when you add (spoof) a browser user agent:

$base = 'http://www.flipkart.com/samsung-galaxy-s6/p/itme5z4aypvtrxmy';

$curl = curl_init($base);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
// Send a browser-like User-Agent string; without one, the site appears to refuse the request
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
$str = curl_exec($curl);
curl_close($curl);

echo $str;
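If the user-agent fix works, it may help to wrap it in a small function that fails loudly instead of silently handing an empty string to the parser. This is a minimal sketch under assumptions: the name fetch_with_user_agent, the 10-second connect timeout and the choice to treat any status of 400 or above as a failure are illustrative, and whether a given site accepts a particular User-Agent string can change over time.

```php
<?php
// Hypothetical helper: fetch a page while presenting a browser-like
// User-Agent, throwing on transport errors or HTTP error statuses.
function fetch_with_user_agent(string $url, string $userAgent): string
{
    $curl = curl_init($url);
    curl_setopt_array($curl, [
        CURLOPT_RETURNTRANSFER => true,       // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,       // follow redirects, as in the answer
        CURLOPT_USERAGENT      => $userAgent, // the header the answer identifies as the fix
        CURLOPT_CONNECTTIMEOUT => 10,         // assumed timeout
    ]);

    $body = curl_exec($curl);
    if ($body === false) {
        // Transport-level failure: DNS, connection refused, timeout, etc.
        throw new RuntimeException('cURL error: ' . curl_error($curl));
    }

    $status = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    if ($status >= 400) {
        // The server answered, but with an error page that is not worth parsing.
        throw new RuntimeException("HTTP status $status for $url");
    }

    return $body;
}
```

The returned string can then be passed to $html_base->load() exactly as in the question's code.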

php  html  curl  simple-html-dom