How to scrape all the text from the body tag using BeautifulSoup in Python?

The .strings generator is provided by Beautiful Soup, a Python library for parsing HTML and XML that is widely used for web scraping. Web scraping is the process of extracting data from a website with automated tools instead of copying it by hand. A drawback of the .string attribute is that it only works for a tag that contains a single string; for a tag that contains several children it returns None. To get around this, the .strings generator yields every string inside a tag, recursively.

Syntax:

tag.strings
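
To make the difference between .string and .strings described above concrete, here is a minimal sketch. The sample markup and the variable names are assumptions chosen only for illustration; they are not part of the original article.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration
html = "<div><p>First</p><p>Second</p></div>"
soup = BeautifulSoup(html, "html.parser")

# A tag whose only child is a string: .string returns that string
single = soup.p.string

# A tag with several children: .string is None
multiple = soup.div.string

# .strings walks the whole subtree and yields every string it finds
all_strings = list(soup.div.strings)

print(single)
print(multiple)
print(all_strings)
```

Because .strings is a generator, it is wrapped in list() here only so the result can be printed in one go; iterating over it directly in a for loop works just as well.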

The following examples illustrate how .strings works in Beautiful Soup.
Example 1: In this example, we retrieve every string inside the body tag of a small HTML document.

Python3

# Import Beautiful Soup
from bs4 import BeautifulSoup
 
# Create the document
doc = "<body><b> Hello world </b><h1> New heading </h1></body>"
 
# Initialize the object with the document
soup = BeautifulSoup(doc, "html.parser")
 
# Get the whole body tag
tag = soup.body
 
# Print each string recursively
for string in tag.strings:
    print(string)

Output:

 Hello world 
 New heading
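
The strings printed above keep the leading and trailing spaces from the markup. If that whitespace is not wanted, Beautiful Soup also offers the .stripped_strings generator, which strips each string and skips strings that are only whitespace. The following is a small sketch on the same document as Example 1; the variable name cleaned is an assumption for illustration.

```python
from bs4 import BeautifulSoup

# Same document as in Example 1
doc = "<body><b> Hello world </b><h1> New heading </h1></body>"
soup = BeautifulSoup(doc, "html.parser")

# .stripped_strings yields each string with surrounding whitespace removed
cleaned = list(soup.body.stripped_strings)

for string in cleaned:
    print(string)
```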

Example 2: In this example, we download a live web page and print every string inside its body tag.

Python3

import requests
from bs4 import BeautifulSoup
 
# URL of the website
url = "https://www.geeksforgeeks.org"
 
# Get the response object (the timeout avoids waiting indefinitely)
res = requests.get(url, timeout=10)
 
# Initialize the object with the document
soup = BeautifulSoup(res.content, "html.parser")
 
# Get the whole body tag
tag = soup.body
 
# Print each string recursively
for string in tag.strings:
    print(string)

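Output: the text of the page's body, printed one string at a time. The exact output depends on the content of the site at the time of the request.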
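
When the goal is a single block of text rather than a stream of individual strings, the same result can be collected with get_text(). The sketch below is one possible approach, not the article's method: the helper name body_text and the sample markup are hypothetical, and a local string is used instead of a network request so the example runs offline.

```python
from bs4 import BeautifulSoup


def body_text(html: str) -> str:
    """Return all text inside <body>, one stripped string per line."""
    soup = BeautifulSoup(html, "html.parser")
    body = soup.body
    # html.parser does not invent a <body> tag, so it may be missing
    if body is None:
        return ""
    # separator joins the strings; strip=True trims each one and drops empties
    return body.get_text(separator="\n", strip=True)


# Hypothetical sample page, used only for illustration
sample = (
    "<html><head><title>T</title></head>"
    "<body><h1>Title</h1><p>Some <b>bold</b> text.</p></body></html>"
)
text = body_text(sample)
print(text)
```

To apply this to a downloaded page, the response content from Example 2 could be passed in place of sample.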

Article written by gurrrung and translated by Barcelona Geeks. The original can be accessed here. Licence: CC BY-SA
