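DedeCMS's collector rarely has to handle source URLs that include a port, so the few sites that do use one cannot be collected at all. Here is a summary of how to fix DedeCMS failing to collect URLs whose port is not 80.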
Problem: the URL being collected includes a port. (The real domain has been replaced with xxx so this does not read as an advertisement.)
Test collection URL: http://www.xxx.com:89/index.php/main/news/index.html?channel_id=104&page=1
The article links returned by the list test do not carry the port. The result is an array of port-less URLs:
List URL tested: http://www.xxx.com:89/index.php/main/news/index.html?channel_id=104&page=1
Array (
  [0] => Array ( [title] => 講座回放|施奠東—西湖,世界風景園林的 [link] => http://www.xxx.com/index.php/main/news/15529.html [image] => http://www.xxx.com/uploadfiles/articles/20190528/15529.png )
  [1] => Array ( [title] => 喜報|恭賀我院2019年度西湖杯榮獲佳績! [link] => http://www.xxx.com/index.php/main/news/15528.html [image] => http://www.xxx.com/uploadfiles/articles/20190522/15528.jpg )
  [2] => Array ( [title] => 講座預告|西湖——世界風景園林的杰出范 [link] => http://www.xxx.com/index.php/main/news/15526.html [image] => http://www.xxx.com/uploadfiles/articles/20190516/15526.jpg )
  [3] => Array ( [title] => 講座回放|胡理琛—西湖七十年流變憶勝 [link] => http://www.xxx.com/index.php/main/news/15524.html [image] => http://www.xxx.com/uploadfiles/articles/20190513/15524.png )
  [4] => Array ( [title] => 講座回放|彭嘉恒—“南師、禪及其在西方 [link] => http://www.xxx.com/index.php/main/news/15518.html [image] => http://www.xxx.com/uploadfiles/articles/20190507/15518.png )
  [5] => Array ( [title] => 講座預告|胡理琛—西湖七十年流變憶勝 [link] => http://www.xxx.com/index.php/main/news/15516.html [image] => http://www.xxx.com/uploadfiles/articles/20190430/15516.jpg )
)
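These URLs are clearly wrong: without the :89 port they cannot be opened, so nothing can be collected.
After some digging, I found the cause: DedeCMS's function that sets the HTML content and source URL (SetSource) never adds the port when it rebuilds the site's base address.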
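To see why the port disappears, here is a small standalone sketch using PHP's built-in parse_url() on the test URL above. parse_url() returns the port under its own 'port' key, separate from 'host', so any code that rebuilds the site root from the host alone loses ":89". The last line is only an illustration of the symptom in the array above, not DedeCMS's actual link-completion code.

```php
<?php
// Split the test list URL from this post into its components.
$url = 'http://www.xxx.com:89/index.php/main/news/index.html?channel_id=104&page=1';
$urls = parse_url($url);

echo $urls['host'], "\n"; // www.xxx.com  (no port here)
echo $urls['port'], "\n"; // 89           (returned separately, as an int)
echo $urls['path'], "\n"; // /index.php/main/news/index.html

// Rebuilding the root from the host only drops the port...
$homeWithoutPort = $urls['host'];

// ...so a root-relative article link completed against it points at port 80.
echo 'http://' . $homeWithoutPort . '/index.php/main/news/15529.html', "\n";
// http://www.xxx.com/index.php/main/news/15529.html
```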
In include/dedehtml2.class.php, inside function SetSource, add the port check at roughly line 79 (the $port line in the code below):
Test again: it works. The URLs now open normally and the articles can be collected.
Here is the code:
function SetSource(&$html, $url = '', $linktype='')
{
    $this->__construct();
    $this->CAtt = new DedeAttribute2();
    $url = trim($url);
    $this->SourceHtml = $html;
    $this->BaseUrl = $url;
    // Work out the document's path relative to the current page
    $urls = @parse_url($url);
    // lyy: omit the port when it is 80, otherwise append ":port".
    // isset() avoids an undefined-index notice and a stray ":" when the URL has no port.
    $port = (isset($urls['port']) && $urls['port'] != 80) ? ':'.$urls['port'] : '';
    $this->HomeUrl = $urls['host'].$port;
    $this->BaseUrlPath = $this->HomeUrl.$urls['path'];
    // Cut from the first path segment that contains a dot (normally the file name)
    $this->BaseUrlPath = preg_replace("/\/([^\/]*)\.(.*)$/","/",$this->BaseUrlPath);
    // Drop the trailing slash
    $this->BaseUrlPath = preg_replace("/\/$/",'',$this->BaseUrlPath);
    if($linktype!='')
    {
        $this->GetLinkType = $linktype;
    }
    if($html != '')
    {
        $this->Analyser();
    }
}
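If you want to check the port logic outside DedeCMS before patching the core file, the sketch below pulls it into two standalone functions. Both names, buildHomeUrl() and absolutizeRootLink(), are hypothetical helpers written for this example; they are not part of DedeCMS, and DedeCMS's real link completion may differ. They only show that keeping the port in the home URL turns the list page's root-relative links into reachable addresses.

```php
<?php
// Hypothetical helper: host plus ":port", omitting the port when it is
// missing or is 80 (same rule as the patched SetSource() above).
function buildHomeUrl(string $url): string
{
    $urls = parse_url(trim($url));
    if ($urls === false || !isset($urls['host'])) {
        return ''; // not an absolute URL
    }
    // parse_url() returns the port as an int, or leaves the key out entirely.
    $port = (isset($urls['port']) && (int)$urls['port'] !== 80) ? ':' . $urls['port'] : '';
    return $urls['host'] . $port;
}

// Hypothetical helper: complete a root-relative link ("/path...") found on
// $pageUrl into an absolute URL that keeps the page's scheme and port.
function absolutizeRootLink(string $pageUrl, string $link): string
{
    $scheme = parse_url($pageUrl, PHP_URL_SCHEME) ?: 'http';
    return $scheme . '://' . buildHomeUrl($pageUrl) . '/' . ltrim($link, '/');
}

$list = 'http://www.xxx.com:89/index.php/main/news/index.html?channel_id=104&page=1';
echo buildHomeUrl($list), "\n";                      // www.xxx.com:89
echo buildHomeUrl('http://www.xxx.com/news/'), "\n"; // www.xxx.com
echo absolutizeRootLink($list, '/index.php/main/news/15529.html'), "\n";
// http://www.xxx.com:89/index.php/main/news/15529.html
```

Like the patch, this only treats 80 as the default port; if you collect from https sources you may also want to drop 443.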
When reposting, please credit: Fixing DedeCMS's built-in collector failing on URLs whose port is not 80