Prerequisite: Implementing Web Scraping in Python with BeautifulSoup
In this article, we will write a Python script to get the status of a flight.
Modules needed:
- bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. This module does not come built in with Python. To install it, type the following command in the terminal.
pip install bs4
- Requests: Requests lets you send HTTP/1.1 requests extremely easily. This module does not come built in with Python either. To install it, type the following command in the terminal.
pip install requests
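To illustrate what Beautiful Soup does once it is installed, here is a minimal sketch that parses a small inline HTML string rather than a live page. The markup and the class names `flight` and `airport` are invented for this example and do not come from any real site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example only.
SAMPLE_HTML = """
<div class="flight">G8 134</div>
<div class="airport">Jay Prakash Narayan International Airport</div>
<div class="airport">Indira Gandhi International Airport</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() returns the first matching tag, or None if nothing matches.
flight = soup.find("div", class_="flight").get_text()

# find_all() returns every matching tag.
airports = [tag.get_text() for tag in soup.find_all("div", class_="airport")]

print(flight)
print(airports)
```

The same two calls, `find()` and `find_all()`, are the ones the flight status script relies on.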
Approach:
- Import the modules
- Create a function, getdata(), that requests a URL and returns the response text
- Combine the airline code, flight number, and date into the URL, pass the URL to getdata(), and parse the returned HTML with BeautifulSoup
- Find the required tags in the parsed HTML and iterate over the results
Implementation:
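The URL-building step above can be sketched with the standard library alone. This is one possible way to do it, not necessarily how the script does it: the helper name `build_url` is hypothetical, and `quote()` and `urlencode()` are used here only so that unusual characters in the inputs are escaped.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://www.flightstats.com/v2/flight-tracker"


def build_url(airline_code, flight_number, date, month, year):
    """Return the flight-tracker URL for one flight on one day (hypothetical helper)."""
    # quote() escapes characters that are not safe in a URL path segment.
    path = f"{BASE_URL}/{quote(airline_code)}/{quote(flight_number)}"
    # urlencode() builds the query string and escapes the values.
    query = urlencode({"year": year, "month": month, "date": date})
    return f"{path}?{query}"


url = build_url("G8", "134", "23", "10", "2020")
print(url)
```

For plain inputs such as these, the result is the same string that simple concatenation would produce.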
Python3
# import modules
import requests
from bs4 import BeautifulSoup


# Build the flight-tracker URL, download the page
# and return it as a parsed BeautifulSoup object
def get_html(Airline_code, Flight_number, Date, Month, Year):

    def getdata(url):
        r = requests.get(url)
        return r.text

    # url
    url = "https://www.flightstats.com/v2/flight-tracker/"+Airline_code + \
        "/"+Flight_number+"?year="+Year+"&month="+Month+"&date="+Date

    # pass the url into the getdata function
    htmldata = getdata(url)
    soup = BeautifulSoup(htmldata, 'html.parser')
    return soup


# Get the flight number from the HTML code
def flight_no(soup):
    Flight_no = ""

    # Find the div tag with a unique class name
    for i in soup.find("div", class_="ticket__FlightNumberContainer-s1rrbl5o-4 hgbvHg"):
        Flight_no = Flight_no + (i.get_text()) + " "
    return Flight_no


# Get the airport names from the HTML code
def airport(soup):
    Airport_name = []

    # Find the div tags with a unique class name
    for i in soup.find_all("div", class_="text-helper__TextHelper-s8bko4a-0 CPamx"):
        Airport_name.append(i.get_text())
    return Airport_name


# Get the status from the HTML code
def status(soup, Airport_list):
    Time_status = []
    Status_str = []
    Gate = []
    Gate_no = []

    # Find the div tags with unique class names
    # to get the terminal and gate labels and values
    for data in soup.find_all("div", class_="ticket__TGBLabel-s1rrbl5o-15 gcbyEH text-helper__TextHelper-s8bko4a-0 dfeqpK"):
        Gate.append(data.get_text())

    for data in soup.find_all("div", class_="ticket__TGBValue-s1rrbl5o-16 icyRae text-helper__TextHelper-s8bko4a-0 cCfBRT"):
        Gate_no.append(data.get_text())

    # Get the status labels and times from the HTML code
    for i in soup.find_all("div", class_="text-helper__TextHelper-s8bko4a-0 bcmzUJ"):
        Status_str.append(i.get_text())

    for i in soup.find_all("div", class_="text-helper__TextHelper-s8bko4a-0 cCfBRT"):
        Time_status.append(i.get_text())

    # Traverse the scraped data
    for item in range(4):
        if item == 0:
            print(Airport_list[0])
        if item == 2:
            print("")
            print(Airport_list[1])
        print(Status_str[item] + " : " + Time_status[item])
        print(Gate[item] + " : " + Gate_no[item])

    for item in range(len(Gate)):
        print(Gate[item] + " : " + Gate_no[item])


# Driver code
if __name__ == '__main__':

    # Input data
    Airline_code = 'G8'
    Flight_number = '134'
    Date = '23'
    Month = '10'
    Year = '2020'

    # Call get_html with the flight details
    soup = get_html(Airline_code, Flight_number, Date, Month, Year)
    print("Flight number : ", flight_no(soup))
    Airport_list = airport(soup)
    status(soup, Airport_list)
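The class names used in the script, such as `ticket__FlightNumberContainer-s1rrbl5o-4 hgbvHg`, appear to be auto-generated, so they may change when the site is updated. When that happens, `soup.find()` returns None and the `for` loop in `flight_no()` raises a TypeError. The sketch below shows one possible way to soften this: match only the readable prefix of the class with a regular expression and check for None. The helper `find_by_class_prefix` and the sample markup are hypothetical, and whether the live page keeps the same prefix is an assumption.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup that mimics the hashed class names used in the script.
SAMPLE_HTML = """
<div class="ticket__FlightNumberContainer-s1rrbl5o-4 hgbvHg">
  <div>G8 134</div><div>GoAir</div>
</div>
"""


def find_by_class_prefix(soup, prefix):
    """Return the first div with a class starting with `prefix`, or None."""
    # A compiled pattern passed as class_ is tested against each class name.
    pattern = re.compile("^" + re.escape(prefix))
    return soup.find("div", class_=pattern)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
container = find_by_class_prefix(soup, "ticket__FlightNumberContainer")

if container is None:
    print("Flight number container not found; the page layout may have changed.")
else:
    # A separator plus strip=True joins the text of the nested tags cleanly.
    flight_text = container.get_text(" ", strip=True)
    print(flight_text)
```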
Output:
Flight number : G8 134 GoAir
Jay Prakash Narayan International Airport
Scheduled : 21:00 IST
Terminal : N/A
Estimated : 21:00 IST
Gate : N/A

Indira Gandhi International Airport
Scheduled : 22:40 IST
Terminal : T2
Estimated : 22:40 IST
Gate : 205
Terminal : N/A
Gate : N/A
Terminal : T2
Gate : 205
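In the output above, the terminal and gate rows appear twice because the script prints them once inside the `range(4)` loop and again in the final loop. The hard-coded indices would also raise an IndexError if the page returned fewer rows. As a rough sketch of an alternative, the function below pairs labels and values with `zip()` and groups them per airport. The name `format_report` is hypothetical, the sample lists are copied from the output above, and the layout of two status rows and two gate rows per airport is an assumption based on that output.

```python
def format_report(airports, statuses, times, gate_labels, gate_values,
                  rows_per_airport=2):
    """Return printable lines, grouping status and gate rows per airport."""
    # zip() stops at the shortest list, so a missing value cannot raise IndexError.
    status_rows = list(zip(statuses, times))
    gate_rows = list(zip(gate_labels, gate_values))

    lines = []
    for index, airport in enumerate(airports):
        start = index * rows_per_airport
        stop = start + rows_per_airport
        if lines:
            lines.append("")  # blank line between airports
        lines.append(airport)
        for label, value in status_rows[start:stop] + gate_rows[start:stop]:
            lines.append(f"{label} : {value}")
    return lines


# Sample data taken from the output shown above.
report = format_report(
    ["Jay Prakash Narayan International Airport",
     "Indira Gandhi International Airport"],
    ["Scheduled", "Estimated", "Scheduled", "Estimated"],
    ["21:00 IST", "21:00 IST", "22:40 IST", "22:40 IST"],
    ["Terminal", "Gate", "Terminal", "Gate"],
    ["N/A", "N/A", "T2", "205"],
)
print("\n".join(report))
```

With this grouping, each airport is followed by its two status rows and then its terminal and gate rows, and nothing is printed twice.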
Automatically translated publication
Article written by kumar_satyam and translated by Barcelona Geeks. The original can be accessed here. Licence: CC BY-SA