Emotion-Based Movie Recommendation in Python

Introduction

One of the underlying goals of movies is to evoke emotions in their viewers. IMDb lists movies of every genre, so movie titles can be scraped from its lists and recommended to the user. IMDb does not offer a free public API for accessing information about movies and TV series, so the data has to be scraped. Scraping extracts information directly from a website's pages, a job that would normally be done through an API.

Installation

Install BeautifulSoup, lxml and requests. Open a terminal and run:

pip install beautifulsoup4
pip install lxml
pip install requests

The scraper is written in Python. It uses requests to download the pages, lxml as the parser for their HTML, and BeautifulSoup to extract data from the parsed HTML or XML.
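
To check that the installation works, a minimal sketch such as the following can be run. The HTML snippet and the helper name `parse_html` are made up for illustration. The fallback to Python's built-in `html.parser` is an assumption added here so that the sketch still runs if lxml is missing.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical HTML snippet, made up for this sketch
SAMPLE_HTML = (
    "<html><body><h1>Top movies</h1>"
    "<p class='note'>Sample page</p></body></html>"
)


def parse_html(markup):
    """Parse markup with lxml, falling back to Python's built-in parser."""
    try:
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        # lxml is not installed; html.parser ships with Python
        return BeautifulSoup(markup, "html.parser")


soup = parse_html(SAMPLE_HTML)

# Pull the text out of two elements of the parsed tree
heading = soup.find("h1").get_text()
note = soup.find("p", class_="note").get_text()

print(heading)
print(note)
```

If both lines print without an ImportError, BeautifulSoup is installed and able to parse HTML.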

Emotion associated with the movie genre

There are 8 classes of emotion that are effective for classifying a text: 'Anger', 'Anticipation', 'Disgust', 'Fear', 'Joy', 'Sadness', 'Surprise' and 'Trust'. Here an emotion is taken as input and the movies corresponding to it are displayed. Note that the script below expects the labels 'Sad' and 'Enjoyment' in place of 'Sadness' and 'Joy', and matches its input exactly (for example 'Sad', not 'sad').
The correspondence between each emotion and a movie genre is listed below:

Sad – Drama
Disgust – Musical
Anger – Family
Anticipation – Thriller
Fear – Sport
Enjoyment – Thriller
Trust – Western
Surprise – Film-Noir

Based on the input emotion, the corresponding genre is selected and roughly the top 5 movies of that genre, sorted by IMDb popularity, are recommended to the user.
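
The emotion-to-genre lookup can also be written as a dictionary instead of a chain of conditions. The following is only a sketch: the names `EMOTION_TO_GENRE`, `BASE_URL` and `build_url` are hypothetical, and the case-insensitive matching is an addition that the article's script does not perform.

```python
from urllib.parse import urlencode

# Emotion-to-genre table, using the labels and genre slugs from the article
EMOTION_TO_GENRE = {
    "Sad": "drama",
    "Disgust": "musical",
    "Anger": "family",
    "Anticipation": "thriller",
    "Fear": "sport",
    "Enjoyment": "thriller",
    "Trust": "western",
    "Surprise": "film_noir",
}

# Search endpoint used by the article's script
BASE_URL = "http://www.imdb.com/search/title"


def build_url(emotion):
    """Return the IMDb search URL for an emotion, or None if it is unknown."""
    # Normalise the input so that " sad " and "SAD" both map to "Sad"
    # (an assumption of this sketch, not part of the original script)
    genre = EMOTION_TO_GENRE.get(emotion.strip().capitalize())
    if genre is None:
        return None
    # safe="," keeps the comma in "moviemeter,asc" unescaped
    query = urlencode(
        {"genres": genre, "title_type": "feature", "sort": "moviemeter,asc"},
        safe=",",
    )
    return BASE_URL + "?" + query


print(build_url("Sad"))
print(build_url("Bored"))
```

Returning None for an unknown emotion lets the caller decide how to react, for example by asking the user to pick one of the eight supported labels.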

# Python3 code for movie
# recommendation based on
# emotion
  
# Import libraries for web
# scraping
from bs4 import BeautifulSoup as SOUP
import re
import requests as HTTP
  
# Main Function for scraping
def main(emotion):
  
    # IMDb Url for Drama genre of
    # movie against emotion Sad
    if(emotion == "Sad"):
        urlhere = 'http://www.imdb.com/search/title?genres=drama&title_type=feature&sort=moviemeter,asc'
  
    # IMDb Url for Musical genre of
    # movie against emotion Disgust
    elif(emotion == "Disgust"):
        urlhere = 'http://www.imdb.com/search/title?genres=musical&title_type=feature&sort=moviemeter,asc'
  
    # IMDb Url for Family genre of
    # movie against emotion Anger
    elif(emotion == "Anger"):
        urlhere = 'http://www.imdb.com/search/title?genres=family&title_type=feature&sort=moviemeter,asc'
  
    # IMDb Url for Thriller genre of
    # movie against emotion Anticipation
    elif(emotion == "Anticipation"):
        urlhere = 'http://www.imdb.com/search/title?genres=thriller&title_type=feature&sort=moviemeter,asc'
  
    # IMDb Url for Sport genre of
    # movie against emotion Fear
    elif(emotion == "Fear"):
        urlhere = 'http://www.imdb.com/search/title?genres=sport&title_type=feature&sort=moviemeter,asc'
  
    # IMDb Url for Thriller genre of
    # movie against emotion Enjoyment
    elif(emotion == "Enjoyment"):
        urlhere = 'http://www.imdb.com/search/title?genres=thriller&title_type=feature&sort=moviemeter,asc'
  
    # IMDb Url for Western genre of
    # movie against emotion Trust
    elif(emotion == "Trust"):
        urlhere = 'http://www.imdb.com/search/title?genres=western&title_type=feature&sort=moviemeter,asc'
  
    # IMDb Url for Film_noir genre of
    # movie against emotion Surprise
    elif(emotion == "Surprise"):
        urlhere = 'http://www.imdb.com/search/title?genres=film_noir&title_type=feature&sort=moviemeter,asc'
  
    # Any other input has no genre
    # mapped to it, so stop here
    # instead of using an undefined URL
    else:
        raise ValueError("Unknown emotion: " + emotion)
  
    # HTTP request to get the data of
    # the whole page
    response = HTTP.get(urlhere)
    data = response.text
  
    # Parsing the data using
    # BeautifulSoup
    soup = SOUP(data, "lxml")
  
    # Collect every link whose href
    # points to a title page
    # (/title/tt<digits>/) using regex
    title = soup.find_all("a", attrs = {"href" : re.compile(r'/title/tt\d+/')})
    return title
  
# Driver Function
if __name__ == '__main__':
  
    emotion = input("Enter the emotion: ")
    a = main(emotion)
    count = 0
  
    # These three emotions use a slightly
    # larger link limit than the others
    if emotion in ("Disgust", "Anger", "Surprise"):
        limit = 13
    else:
        limit = 11
  
    for i in a:
  
        # Split the anchor tag on '>': a plain
        # text link such as <a href="...">Title</a>
        # yields exactly three pieces, while links
        # that wrap an image yield more
        tmp = str(i).split('>')
  
        if(len(tmp) == 3):
            # tmp[1] is 'Title</a', so drop
            # the trailing '</a'
            print(tmp[1][:-3])
  
        if(count > limit):
            break
        count += 1

This script scrapes the movie titles of the genre that corresponds to the input emotion and lists them for the user.
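
The title-extraction step can be tried offline on a small piece of markup. The sketch below is an illustration under stated assumptions: `SAMPLE_HTML` is invented and only imitates a results page (real IMDb markup differs and changes over time), `extract_titles` is a hypothetical helper, and Python's built-in `html.parser` is used so that the example does not depend on lxml. Unlike the script above, it reads the link text with `get_text()` and skips repeated titles.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup imitating a search results page
SAMPLE_HTML = """
<div>
  <a href="/title/tt0000001/"><img src="poster1.jpg" alt=""/></a>
  <a href="/title/tt0000001/">First Movie</a>
  <a href="/title/tt0000002/"><img src="poster2.jpg" alt=""/></a>
  <a href="/title/tt0000002/">Second Movie</a>
  <a href="/name/nm0000001/">Some Actor</a>
  <a href="/title/tt0000003/">Third Movie</a>
</div>
"""

# Matches links to title pages such as /title/tt0000001/
TITLE_LINK = re.compile(r"/title/tt\d+/")


def extract_titles(markup, limit=5):
    """Return up to `limit` distinct titles found in title-page links."""
    soup = BeautifulSoup(markup, "html.parser")
    titles = []
    for link in soup.find_all("a", href=TITLE_LINK):
        text = link.get_text(strip=True)
        # Skip poster links (no text) and titles already collected
        if text and text not in titles:
            titles.append(text)
        if len(titles) == limit:
            break
    return titles


print(extract_titles(SAMPLE_HTML))
```

Passing `limit=5` to such a helper is one way to cap the recommendation at five movies regardless of how many links each entry has on the page.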

Web scraping is very useful for extracting data and analysing it. Without web scraping, the Internet as you know it would not really exist: Google and other major search engines rely on sophisticated web crawlers to fetch the content that goes into their indexes. These tools are what make search engines possible.

Applications of crawling

Extracting articles for websites that curate content.
Extracting business listings for companies that build databases of leads.
Many other kinds of data extraction, sometimes called data mining. For example, a popular and sometimes controversial use of a web scraper is pulling prices from airlines to publish on airfare comparison sites.


Article written by Ayush Govil 1 and translated by Barcelona Geeks. The original can be accessed here. Licence: CC BY-SA
