视频1 视频21 视频41 视频61 视频文章1 视频文章21 视频文章41 视频文章61 推荐1 推荐3 推荐5 推荐7 推荐9 推荐11 推荐13 推荐15 推荐17 推荐19 推荐21 推荐23 推荐25 推荐27 推荐29 推荐31 推荐33 推荐35 推荐37 推荐39 推荐41 推荐43 推荐45 推荐47 推荐49 关键词1 关键词101 关键词201 关键词301 关键词401 关键词501 关键词601 关键词701 关键词801 关键词901 关键词1001 关键词1101 关键词1201 关键词1301 关键词1401 关键词1501 关键词1601 关键词1701 关键词1801 关键词1901 视频扩展1 视频扩展6 视频扩展11 视频扩展16 文章1 文章201 文章401 文章601 文章801 文章1001 资讯1 资讯501 资讯1001 资讯1501 标签1 标签501 标签1001 关键词1 关键词501 关键词1001 关键词1501 专题2001
python用BeautifulSoup抓取div标签的实例教程
2020-11-27 14:24:10 责编:小采
文档


这篇文章主要介绍了python 3利用BeautifulSoup抓取p标签的方法,文中给出了详细的示例代码供大家参考学习,对大家具有一定的参考学习价值,需要的朋友们下面来一起看看吧。

前言

本文主要介绍的是关于python 3用BeautifulSoup抓取p标签的方法示例,分享出来供大家参考学习,下面来看看详细的介绍:

示例代码:

# -*- coding:utf-8 -*-
#python 2.7
#XiaoDeng
#http://tieba.baidu.com/p/2460150866
#标签操作


from bs4 import BeautifulSoup
import urllib.request
import re


#如果是网址,可以用这个办法来读取网页
#html_doc = "http://tieba.baidu.com/p/2460150866"
#req = urllib.request.Request(html_doc) 
#webpage = urllib.request.urlopen(req) 
#html = webpage.read()


html="""
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" rel="external nofollow" class="sister" id="xiaodeng"><!-- Elsie --></a>,
<a href="http://example.com/lacie" rel="external nofollow" rel="external nofollow" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" rel="external nofollow" class="sister" id="link3">Tillie</a>;
<a href="http://example.com/lacie" rel="external nofollow" rel="external nofollow" class="sister" id="xiaodeng">Lacie</a>
and they lived at the bottom of a well.</p>
<p class="ntopbar_loading"><img src="http://simg.sinajs.cn/blog7style/images/common/loading.gif">加载中…</p>

<p class="SG_connHead">
 <span class="title" comp_title="个人资料">个人资料</span>
 <span class="edit">
 </span>
<p class="info_list"> 
 <ul class="info_list1">
 <li><span class="SG_txtc">博客等级:</span><span id="comp_901_grade"><img src="http://simg.sinajs.cn/blog7style/images/common/sg_trans.gif" real_src="http://simg.sinajs.cn/blog7style/images/common/number/9.gif" /></span></li>
 <li><span class="SG_txtc">博客积分:</span><span id="comp_901_score"><strong>0</strong></span></li>
 </ul>
 <ul class="info_list2">
 <li><span class="SG_txtc">博客访问:</span><span id="comp_901_pv"><strong>3,971</strong></span></li>
 <li><span class="SG_txtc">关注人气:</span><span id="comp_901_attention"><strong>0</strong></span></li>
 <li><span class="SG_txtc">获赠金笔:</span><strong id="comp_901_d_goldpen">0支</strong></li>
 <li><span class="SG_txtc">赠出金笔:</span><strong id="comp_901_r_goldpen">0支</strong></li>
 <li class="lisp" id="comp_901_badge"><span class="SG_txtc">荣誉徽章:</span></li>
 </ul>
 </p>
<p class="atcTit_more"><span class="SG_more"><a href="http://blog.sina.com.cn/" rel="external nofollow" rel="external nofollow" target="_blank">更多>></a></span></p> 
<p class="story">...</p>
"""
soup = BeautifulSoup(html, 'html.parser') #文档对象



# 类名为xxx而且文本内容为hahaha的p
for k in soup.find_all('p',class_='atcTit_more'):#,string='更多'
 print(k)
 #<p class="atcTit_more"><span class="SG_more"><a href="http://blog.sina.com.cn/" rel="external nofollow" rel="external nofollow" target="_blank">更多>></a></span></p>

【相关推荐】

1. python利用beautifulSoup实现爬虫

2. 利用Python实现异步代理爬虫及代理池方法

3. 详解Python爬虫使用代理proxy抓取网页方法

下载本文
显示全文
专题