1 BeautifulSoup overview

BeautifulSoup is a flexible and convenient library for parsing web pages. It is efficient and supports several parsers. With it you can extract information from a page without writing regular expressions.

Installation

```
pip3 install beautifulsoup4
```

Parsers

| Parser | Usage | Advantages | Disadvantages |
|---|---|---|---|
| Python standard library | BeautifulSoup(markup, "html.parser") | Built into Python; moderate speed; tolerant of malformed documents | Poor tolerance of malformed documents in Python versions before 2.7.3 or 3.2.2 |
| lxml HTML parser | BeautifulSoup(markup, "lxml") | Fast; tolerant of malformed documents | Requires the lxml C library |
| lxml XML parser | BeautifulSoup(markup, "xml") | Fast; the only parser that supports XML | Requires the lxml C library |
| html5lib | BeautifulSoup(markup, "html5lib") | Best fault tolerance; parses the document the way a browser does; produces HTML5 | Slow; must be installed separately (pure Python, no C extension) |

2 Basic usage

All examples in sections 2 and 3 up to "Nested selection" use the following sample document. Its closing body and html tags are missing on purpose.

```python
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')  # pass the parser name: lxml
print(soup.prettify())              # pretty-print; missing closing tags are filled in
print(soup.title.string)            # text inside the <title> tag
```

3 Tag selectors

Selecting elements

```python
print(soup.title)        # select the <title> tag
print(type(soup.title))  # check its type (a bs4.element.Tag)
print(soup.head)         # select the <head> tag
```
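The tag-style access described above can be sketched in a self-contained form. This is a minimal illustration, not the author's code: the HTML is a shortened version of the sample, and `html.parser` is swapped in for `lxml` only so the snippet needs nothing beyond bs4. It shows that `soup.p` returns just the first matching tag.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Shortened version of the sample document (an assumption made for brevity).
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">...</p>
</body></html>
"""

# html.parser is used here only to avoid the lxml dependency.
soup = BeautifulSoup(html, "html.parser")

title_tag = soup.title          # a Tag object
title_text = soup.title.string  # the text inside <title>
first_p = soup.p                # only the FIRST <p> in the document

print(isinstance(title_tag, Tag))
print(title_text)
print(first_p["class"])         # class is multi-valued, so it comes back as a list
```

When more than one tag of the same name is needed, `find_all()` from section 4 is the usual tool.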
Getting the name

The name attribute returns the name of a tag. The snippets below keep using the sample document and the soup object from section 2.

```python
print(soup.title.name)  # name of the tag
```

Getting attributes

```python
print(soup.p.attrs['name'])  # value of the name attribute on the first <p> tag
print(soup.p['name'])        # shorter, more direct form
```

Getting the content

```python
print(soup.p.string)  # text inside the first <p> tag
```
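The attribute and content lookups above have a few edge cases worth seeing side by side. The following is a small sketch on a hypothetical two-paragraph fragment (not the full sample), again using `html.parser` so it runs with bs4 alone: `tag[...]` raises `KeyError` for a missing attribute while `tag.get(...)` returns `None`, and `.string` is `None` when a tag has more than one child.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment modelled on the sample document.
html = (
    '<p class="title" name="dromouse"><b>The Dormouse\'s story</b></p>'
    '<p class="story">Once <a href="http://example.com/lacie" id="link2">Lacie</a> and more</p>'
)
soup = BeautifulSoup(html, "html.parser")

title_p = soup.p
name_value = title_p["name"]   # attribute lookup with []
missing = title_p.get("id")    # .get() returns None instead of raising

try:
    title_p["id"]              # [] raises KeyError when the attribute is absent
    raised = False
except KeyError:
    raised = True

# <p> has a single child <b>, which has a single string, so .string works.
single_child_text = title_p.string

# The second <p> mixes text and a tag, so .string is None; get_text() joins all text.
story_p = soup.find_all("p")[1]
mixed_string = story_p.string
mixed_text = story_p.get_text()

print(name_value, missing, raised)
print(single_child_text)
print(mixed_string, mixed_text)
```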
Nested selection

Tag selectors can be chained, still using the soup object from section 2:

```python
print(soup.head.title.string)  # the <title> inside <head>
```

Child and descendant nodes

The examples on child, descendant, parent, ancestor and sibling nodes use a second sample document:

```python
html = """
<html>
    <head>
        <title>The Dormouse's story</title>
    </head>
    <body>
        <p class="story">
            Once upon a time there were three little sisters; and their names were
            <a href="http://example.com/elsie" class="sister" id="link1">
                <span>Elsie</span>
            </a>
            <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
            and
            <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
            and they lived at the bottom of a well.
        </p>
        <p class="story">...</p>
"""

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')  # pass the parser name: lxml
```

contents

```python
print(soup.p.contents)  # direct children of the tag, returned as a list
```

Output (runs of indentation whitespace shortened):

```
['\n Once upon a time there were three little sisters; and their names were\n ', <a class="sister" href="http://example.com/elsie" id="link1"> <span>Elsie</span> </a>, '\n', <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, '\n and\n ', <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>, '\n and they lived at the bottom of a well.\n ']
```

children

```python
print(soup.p.children)  # an iterator over the direct children
for i, child in enumerate(soup.p.children):  # i is the index, child is the node
    print(i, child)
```

Item 2 prints as empty because the only thing between the first and second a tags is a line break.
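Because `contents` and `children` include the whitespace strings between tags, it is often useful to keep only the `Tag` objects. The sketch below assumes a trimmed-down version of the second sample and uses `html.parser` so it runs with bs4 alone; the variable names are illustrative.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Trimmed-down version of the second sample (an assumption made for brevity).
html = """
<p class="story">
    Once upon a time
    <a href="http://example.com/elsie" id="link1"><span>Elsie</span></a>
    <a href="http://example.com/lacie" id="link2">Lacie</a>
</p>
"""
soup = BeautifulSoup(html, "html.parser")
p = soup.p

# .contents is a list, .children is an iterator over the same direct children.
contents_is_list = isinstance(p.contents, list)

# Keep only real tags, dropping the text and whitespace nodes.
child_tags = [child for child in p.children if isinstance(child, Tag)]
child_ids = [tag["id"] for tag in child_tags]

# .descendants also walks into nested tags such as the <span>.
descendant_tag_names = [d.name for d in p.descendants if isinstance(d, Tag)]

print(contents_is_list)
print(child_ids)
print(descendant_tag_names)
```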
Descendant nodes

Using the same second sample document and soup object as in the contents example:

```python
print(soup.p.descendants)  # an iterator over all descendants of the tag
for i, child in enumerate(soup.p.descendants):  # i is the index, child is the node
    print(i, child)
```

Parent and ancestor nodes

parent

```python
print(soup.a.parent)  # parent of the first <a> tag: the <p class="story"> tag
```

parents

```python
print(list(enumerate(soup.a.parents)))  # all ancestors of the first <a> tag
```

The output has four entries, each printed with its full markup: (0) the p class="story" tag, (1) the body tag, (2) the html tag, and (3) the BeautifulSoup document object itself, which prints the same markup as the html tag.
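Printing every ancestor in full is hard to read, so a compact way to inspect the chain is to print only each ancestor's name. This is a minimal sketch on a hypothetical one-link document, with `html.parser` standing in for `lxml`; it also shows that the last ancestor is the `BeautifulSoup` object, whose name is `[document]`.

```python
from bs4 import BeautifulSoup

# Hypothetical minimal document with a single link.
html = "<html><body><p class='story'><a id='link1'>Elsie</a></p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# Walk from the <a> tag up to the top of the tree.
ancestor_names = [parent.name for parent in soup.a.parents]

# The direct parent is simply the first ancestor.
direct_parent_name = soup.a.parent.name

print(ancestor_names)
print(direct_parent_name)
```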
Sibling nodes

Using the same second sample document and soup object as in the contents example:

```python
print(list(enumerate(soup.a.next_siblings)))      # siblings after the first <a> tag
print(list(enumerate(soup.a.previous_siblings)))  # siblings before the first <a> tag
```

Output (runs of indentation whitespace shortened):

```
[(0, '\n'), (1, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>), (2, '\n and\n '), (3, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>), (4, '\n and they lived at the bottom of a well.\n ')]
[(0, '\n Once upon a time there were three little sisters; and their names were\n ')]
```

4 Standard selectors

find_all(name, attrs, recursive, text, **kwargs)

find_all() searches the document by tag name, attributes, or text content.

name

The examples in sections 4 and 5 use a third sample document:

```python
html = '''
<div class="panel">
    <div class="panel-heading">
        <h4>Hello</h4>
    </div>
    <div class="panel-body">
        <ul class="list" id="list-1">
            <li class="element">Foo</li>
            <li class="element">Bar</li>
            <li class="element">Jay</li>
        </ul>
        <ul class="list list-small" id="list-2">
            <li class="element">Foo</li>
            <li class="element">Bar</li>
        </ul>
    </div>
</div>
'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')
print(soup.find_all('ul'))           # every <ul> tag, with its contents
print(type(soup.find_all('ul')[0]))  # each result is a bs4.element.Tag
```
Because each result is a Tag, find_all() can be called again on it to look up child tags:

```python
for ul in soup.find_all('ul'):
    print(ul.find_all('li'))
```

attrs

Elements can be looked up by their attributes. In this example the first ul tag of the sample carries an extra attribute, name="elements":

```python
print(soup.find_all(attrs={'id': 'list-1'}))      # pass a dict of the attributes to match
print(soup.find_all(attrs={'name': 'elements'}))
```

Special keyword arguments

```python
print(soup.find_all(id='list-1'))       # id can be passed directly as a keyword argument
print(soup.find_all(class_='element'))  # class is a Python keyword, so use class_
```

text

Select by text content:

```python
print(soup.find_all(text='Foo'))  # finds the text Foo, but returns strings, not tags
```

text is handy for matching content, but less convenient for locating elements, because the result contains strings rather than tags. Newer releases of Beautiful Soup call this argument string and keep text as the older name.

Other methods

find()
Used exactly like find_all(), but returns only the first match.

find_parents(), find_parent()
find_parents() returns all ancestors; find_parent() returns the direct parent.

find_next_siblings(), find_next_sibling()
The first returns all following siblings; the second returns the first following sibling.

find_previous_siblings(), find_previous_sibling()
The first returns all preceding siblings; …

find_all_next(), find_next()
The first returns all matching nodes after the current node; the second returns the first matching node after it.

find_all_previous(), find_previous()
The same, in the opposite direction.
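The lookup methods listed above can be tried together on the panel sample. The sketch below is illustrative only: it uses `html.parser` so it runs with bs4 alone, and it uses `string=`, the newer name for `text`, which assumes Beautiful Soup 4.4.0 or later. Note that an attribute called `name` has to go through `attrs`, because `name` is already the first parameter of `find_all()` (the tag name).

```python
from bs4 import BeautifulSoup

html = """
<div class="panel">
  <div class="panel-body">
    <ul class="list" id="list-1" name="elements">
      <li class="element">Foo</li>
      <li class="element">Bar</li>
      <li class="element">Jay</li>
    </ul>
    <ul class="list list-small" id="list-2">
      <li class="element">Foo</li>
      <li class="element">Bar</li>
    </ul>
  </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

first_li = soup.find("li")  # first match only

# Attribute filters: keyword form for id, dict form for the "name" attribute.
ids_by_keyword = [ul["id"] for ul in soup.find_all(id="list-2")]
ids_by_attrs = [ul["id"] for ul in soup.find_all(attrs={"name": "elements"})]

# string= matches text nodes and returns strings, not tags.
foo_strings = soup.find_all(string="Foo")

# Navigation helpers starting from the first <li>.
parent_id = first_li.find_parent("ul")["id"]
next_text = first_li.find_next_sibling("li").get_text()
later_texts = [li.get_text() for li in first_li.find_next_siblings("li")]

print(first_li.get_text(), ids_by_keyword, ids_by_attrs)
print(len(foo_strings), parent_id, next_text, later_texts)
```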
5 CSS selectors

Pass a CSS selector straight to select() to make a selection. The examples use the panel sample document from section 4.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')
print(soup.select('.panel .panel-heading'))  # . selects by class; a space separates ancestor and descendant
print(soup.select('ul li'))                  # <li> tags inside <ul> tags
print(soup.select('#list-2 .element'))       # # selects by id: elements with class="element" under id="list-2"
print(type(soup.select('ul')[0]))            # type of a selected node
```

Nested selection

```python
for ul in soup.select('ul'):
    print(ul.select('li'))
```

Getting attributes

```python
for ul in soup.select('ul'):
    print(ul['id'])        # [] reads an attribute
    print(ul.attrs['id'])  # equivalent form
```

Getting the content

```python
for li in soup.select('li'):
    print(li.get_text())
```

6 Summary

- Prefer the lxml parser; fall back to html.parser when necessary.
- Tag selectors (soup.title, soup.p and so on) offer weak filtering but are fast.
- Use find() and find_all() to match a single result or multiple results.
- If you are comfortable with CSS selectors, use select().
- Remember the common ways to read attributes and text.
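As a recap of the CSS selector section, here is a small self-contained sketch on a shortened panel sample. It is an illustration under assumptions: `html.parser` replaces `lxml` so only bs4 is needed, and it also uses `select_one()`, which is not covered in the text above and returns the first match or `None`.

```python
from bs4 import BeautifulSoup

html = """
<div class="panel">
  <ul class="list" id="list-1">
    <li class="element">Foo</li>
    <li class="element">Bar</li>
    <li class="element">Jay</li>
  </ul>
  <ul class="list list-small" id="list-2">
    <li class="element">Foo</li>
    <li class="element">Bar</li>
  </ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# select() always returns a list of tags.
small_list_items = [li.get_text() for li in soup.select("#list-2 .element")]
list_ids = [ul["id"] for ul in soup.select("ul")]

# select_one() returns the first match, or None when nothing matches.
first_item_text = soup.select_one("ul li").get_text()
small_list_classes = soup.select_one("ul.list.list-small")["class"]
no_match = soup.select_one("#no-such-id")

print(small_list_items, list_ids)
print(first_item_text, small_list_classes, no_match)
```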