Extracting an attribute value with BeautifulSoup in Python

Prerequisite: BeautifulSoup installation

Beautiful Soup is a web scraping library for Python that gives access to the attributes of each HTML tag. Web scraping is the process of extracting data from a website with automated tools, which is faster than collecting it by hand. A tag can have any number of attributes. For example, the tag <b class="active"> has an attribute "class" whose value is "active". We can access a tag's attributes by treating the tag like a dictionary.

Syntax:

tag.attrs
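As a minimal sketch of the dictionary-like behaviour described above, the snippet below parses the <b class="active"> tag from the introduction and walks over its attributes. The extra id attribute and the use of Python's built-in "html.parser" (instead of lxml) are assumptions made only to keep the illustration self-contained.

```python
from bs4 import BeautifulSoup

# Assumption: the built-in "html.parser" is used so no extra parser is needed
soup = BeautifulSoup('<b class="active" id="main">Bold text</b>', "html.parser")
tag = soup.b

# tag.attrs is a regular dict mapping attribute names to their values
for name, value in tag.attrs.items():
    print(name, value)

# has_attr() reports whether an attribute is present on the tag
has_class = tag.has_attr("class")
has_href = tag.has_attr("href")
print(has_class, has_href)
```

Because tag.attrs is an ordinary dictionary, the usual dictionary operations such as items() and keys() should work on it.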

Implementation:
Example 1: Program to extract the attributes using the attrs approach.

Python3

# Import Beautiful Soup
from bs4 import BeautifulSoup
  
# Initialize the object with a HTML page
soup = BeautifulSoup('''
    <html>
        <h2 class="hello"> Heading 1 </h2>
        <h1> Heading 2 </h1>
    </html>
    ''', "lxml")
  
# Get the whole h2 tag
tag = soup.h2
  
# Get all attributes of the tag as a dictionary
attribute = tag.attrs
  
# Print the output
print(attribute)

Output:

{'class': ['hello']}

Example 2: Program to extract an attribute using the dictionary approach.

Python3

# Import Beautiful Soup
from bs4 import BeautifulSoup
  
# Initialize the object with a HTML page
soup = BeautifulSoup('''
    <html>
        <h2 class="hello"> Heading 1 </h2>
        <h1> Heading 2 </h1>
    </html>
    ''', "lxml")
  
# Get the whole h2 tag
tag = soup.h2
  
# Get the value of the class attribute
attribute = tag['class']
  
# Print the output
print(attribute)

Output:

['hello']
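One thing to keep in mind with the dictionary approach is that indexing an attribute the tag does not have raises a KeyError, just as with a normal dictionary. The following sketch, which assumes the built-in "html.parser" and a hypothetical fallback string "no-id", shows how get() can be used when an attribute may be missing.

```python
from bs4 import BeautifulSoup

# Assumption: the built-in "html.parser" is used so no extra parser is needed
soup = BeautifulSoup('<h2 class="hello"> Heading 1 </h2>', "html.parser")
tag = soup.h2

# get() returns None when the attribute is absent
missing = tag.get("id")

# A default value can be supplied instead ("no-id" is only illustrative)
fallback = tag.get("id", "no-id")

# Dictionary-style indexing raises KeyError for an absent attribute
try:
    tag["id"]
    raised = False
except KeyError:
    raised = True

print(missing, fallback, raised)
```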

Example 3: Program to extract the values of a multi-valued attribute using the dictionary approach.

Python3

# Import Beautiful Soup
from bs4 import BeautifulSoup
  
# Initialize the object with a HTML page
soup = BeautifulSoup('''
    <html>
        <h2 class="first second third"> Heading 1 </h2>
        <h1> Heading 2 </h1>
    </html>
    ''', "lxml")
  
# Get the whole h2 tag
tag = soup.h2
  
# Get the attribute
attribute = tag['class']
  
# Print the output
print(attribute)

Output:

['first', 'second', 'third']
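The class attribute comes back as a list because Beautiful Soup treats it as multi-valued, while an attribute such as id is returned as a single string even when it contains spaces. The sketch below illustrates the difference and joins the class list back into one string; the id value and the "html.parser" choice are assumptions added for illustration.

```python
from bs4 import BeautifulSoup

# Assumption: the built-in "html.parser" is used so no extra parser is needed
soup = BeautifulSoup(
    '<h2 class="first second third" id="title one"> Heading 1 </h2>',
    "html.parser",
)
tag = soup.h2

# class is multi-valued, so it is split on whitespace into a list
classes = tag["class"]

# Join the list if the original space-separated string is needed
class_string = " ".join(classes)

# id is not multi-valued, so it stays a single string
tag_id = tag["id"]

print(classes, class_string, tag_id)
```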


Article written by gurrrung and translated by Barcelona Geeks. The original can be accessed here. Licence: CCBY-SA
