How to Scrape Nested Tags Using BeautifulSoup?

We can scrape a nested tag in BeautifulSoup with the help of the . (dot) operator. After creating a soup object from the page, we can navigate to a nested tag by chaining tag names with dots; each access returns the first matching tag inside the current one. To scrape a nested tag using BeautifulSoup, follow the steps below.
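As a minimal sketch of the dot operator, the snippet below parses a small inline HTML string instead of a live page. The HTML content and the variable names are assumptions made up for illustration; they do not come from the site used later in this article.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration.
html = """
<html><body>
  <ul>
    <li><a href="/first">First</a></li>
    <li><a href="/second">Second</a></li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Each attribute access returns the first matching tag
# found inside the current tag.
link = soup.body.ul.li.a
print(link)             # <a href="/first">First</a>
print(link["href"])     # /first
print(link.get_text())  # First
```

Note that the second link is never reached this way, because every step of the chain stops at the first match.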

Step-by-step Approach

Step 1: To scrape, we first need to import the BeautifulSoup class from the bs4 module, and to fetch the page from the website we need to import the requests module.

from bs4 import BeautifulSoup
import requests

Step 2: The second step is to request the URL by calling the get method.

page = requests.get(sample_website)
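In practice the request can fail or hang, so one possible way to wrap this step is sketched below. The helper name fetch_html and the 10-second timeout are assumptions chosen for illustration, not part of the original steps.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the raw bytes of a page, raising on network or HTTP errors."""
    # timeout stops the call from waiting forever on a slow server.
    response = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx/5xx responses.
    response.raise_for_status()
    return response.content


# Example call (needs network access):
# page_bytes = fetch_html(sample_website)
```

The returned bytes can be passed to BeautifulSoup in the same way as page.content.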

Step 3: The third step is to create the soup by calling BeautifulSoup on the page content, using an HTML parser to build the parse tree.

soup = BeautifulSoup(page.content, 'html.parser')

Step 4: The fourth step is to chain the . operator until we reach the tag we want. For example, to scrape a tag nested inside body and then table, we use the following statement, where tag stands for the name of the target tag.

soup.body.table.tag
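When a tag in the chain does not exist, the dot operator returns None, and the next access in the chain raises AttributeError. The sketch below shows one way to guard against that. The helper dig and the inline HTML are hypothetical and serve only as an illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML with no <table> tag.
html = "<html><body><div><p>Hello</p></div></body></html>"
soup = BeautifulSoup(html, "html.parser")


def dig(node, *names):
    """Follow tag names one step at a time; return None if any is missing."""
    for name in names:
        if node is None:
            return None
        # find() returns the first matching descendant, or None.
        node = node.find(name)
    return node


print(soup.body.table)                   # None, there is no <table>
print(dig(soup, "body", "div", "p"))     # <p>Hello</p>
print(dig(soup, "body", "table", "td"))  # None instead of AttributeError
```

Writing soup.body.table.td directly on this document would raise AttributeError, because soup.body.table is None.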

Implementations

Below are several examples that show how to scrape different nested tags from a particular URL.

Example 1:

Python3

from bs4 import BeautifulSoup
import requests
 
# sample website
sample_website = 'https://www.geeksforgeeks.org/different-ways-to-remove-all-the-digits-from-string-in-java/'
 
# call the get method to request the page
page = requests.get(sample_website)

# create the soup with BeautifulSoup and
# the built-in HTML parser
soup = BeautifulSoup(page.content, 'html.parser')

# With the help of the . operator we will scrape a tag
# under body -> ul -> i:
# inside the body tag we go to the first ul tag,
# and inside that ul tag we go to the first i tag
print(soup.body.ul.i)

Output:

<i class="gfg-icon gfg-icon_arrow-down gfg-icon_header"></i>

Example 2:

Python3

from bs4 import BeautifulSoup
import requests
 
# sample website
sample_website = 'https://www.geeksforgeeks.org/different-ways-to-remove-all-the-digits-from-string-in-java/'
 
# call the get method to request the page
page = requests.get(sample_website)

# create the soup with BeautifulSoup and
# the built-in HTML parser
soup = BeautifulSoup(page.content, 'html.parser')

# With the help of the . operator we will scrape a tag
# under body -> a:
# inside the body tag we go to the first a tag
print(soup.body.a)

Output:

<a class="gfg-stc" href="#main" style="top:0">Skip to content</a>

Example 3:

Python3

from bs4 import BeautifulSoup
import requests
 
# sample website
sample_website = 'https://www.geeksforgeeks.org/different-ways-to-remove-all-the-digits-from-string-in-java/'
 
# call the get method to request the page
page = requests.get(sample_website)

# create the soup with BeautifulSoup and
# the built-in HTML parser
soup = BeautifulSoup(page.content, 'html.parser')

# With the help of the . operator we will scrape a tag
# under body -> a:
# inside the body tag we go to the first a tag
print(soup.body.a)

# With the help of the . operator we will scrape a tag
# under body -> ul -> li -> a:
# inside the body tag we go to the first ul tag,
# inside that ul tag we go to the first li tag,
# and inside that li tag we go to the first a tag
print(soup.body.ul.li.a)

Output:

<a href="https://www.geeksforgeeks.org/analysis-of-algorithms-set-1-asymptotic-analysis/" target="_self">Asymptotic Analysis</a>
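The examples above each return a single tag, because the dot operator stops at the first match. As a sketch of how to collect every matching nested tag instead, the snippet below uses find_all and a CSS selector on a small inline document. The HTML is a made-up assumption, not the page requested in the examples.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration.
html = """
<body>
  <ul>
    <li><a href="/a">A</a></li>
    <li><a href="/b">B</a></li>
  </ul>
</body>
"""
soup = BeautifulSoup(html, "html.parser")

# Dot navigation: only the first <a> under the first <li>.
first = soup.body.ul.li.a

# find_all: every <a> anywhere inside the first <ul>.
all_links = soup.body.ul.find_all("a")

# select: the same nesting written as a CSS selector.
via_css = soup.select("body > ul > li > a")

print(first["href"])                   # /a
print([a["href"] for a in all_links])  # ['/a', '/b']
print([a["href"] for a in via_css])    # ['/a', '/b']
```

The dot operator is convenient for a quick look at one tag, while find_all or select is usually the better fit when the page repeats the same structure.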

Machine-translated post

Article written by vipinyadav15799 and translated by Barcelona Geeks. The original can be accessed here. Licence: CCBY-SA
