Web Scraping in Flutter – Barcelona Geeks

Web scraping, also called web harvesting or web data extraction, is the process of extracting required data from a web page by accessing the page's HTML.

This article walks through the steps involved in web scraping in Flutter using the html and http packages.

Step 1: Set up a new Flutter app

Create a new Flutter app by running the command:

flutter create YOUR_APP_NAME

Open the app in VS Code or Android Studio. I am using VS Code.
Open the lib/main.dart file and delete all the default code.
Add the code for your desired widgets. I will have an AppBar, a Column containing three Text widgets, a CircularProgressIndicator, and a MaterialButton widget.

Dart

import 'package:flutter/material.dart';
  
void main() => runApp(MaterialApp(
    theme: ThemeData(
      accentColor: Colors.green,
      scaffoldBackgroundColor: Colors.green[100],
      primaryColor: Colors.green,
    ),
    home: MyApp()));
  
class MyApp extends StatefulWidget {
  const MyApp({Key key}) : super(key: key);
  
  @override
  _MyAppState createState() => _MyAppState();
}
  
class _MyAppState extends State<MyApp> {
    
  // Strings to store the extracted Article titles
  String result1 = 'Result 1';
  String result2 = 'Result 2';
  String result3 = 'Result 3';
    
  // boolean to show the CircularProgressIndicator
  // while the web scraping call is awaited
  bool isLoading = false;
  
  @override
  Widget build(BuildContext context) {
    return Scaffold(
      appBar: AppBar(title: Text('GeeksForGeeks')),
      body: Padding(
        padding: const EdgeInsets.all(16.0),
        child: Center(
            child: Column(
          mainAxisAlignment: MainAxisAlignment.center,
          children: [
              
            // if isLoading is true show loader
            // else show Column of Texts
            isLoading
                ? CircularProgressIndicator()
                : Column(
                    children: [
                      Text(result1,
                          style: TextStyle(
                              fontSize: 20, fontWeight: FontWeight.bold)),
                      SizedBox(
                        height: MediaQuery.of(context).size.height * 0.05,
                      ),
                      Text(result2,
                          style: TextStyle(
                              fontSize: 20, fontWeight: FontWeight.bold)),
                      SizedBox(
                        height: MediaQuery.of(context).size.height * 0.05,
                      ),
                      Text(result3,
                          style: TextStyle(
                              fontSize: 20, fontWeight: FontWeight.bold)),
                    ],
                  ),
            SizedBox(height: MediaQuery.of(context).size.height * 0.08),
            MaterialButton(
              onPressed: () {},
              child: Text(
                'Scrape Data',
                style: TextStyle(color: Colors.white),
              ),
              color: Colors.green,
            )
          ],
        )),
      ),
    );
  }
}

Output:

Step 2: Add the html and http packages

Open the pubspec.yaml file and, under dependencies:, add the two lines http: ^0.12.0+4 and html: ^0.14.0+3 with the proper indentation, then save the file.
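As a rough illustration of the indentation, the dependencies section might look like the fragment below. The flutter entry is assumed to be the one generated by flutter create; the two package lines are the versions named above.

```yaml
dependencies:
  flutter:
    sdk: flutter
  http: ^0.12.0+4
  html: ^0.14.0+3
```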

Then run this command in your terminal:

 flutter pub get

Open the main.dart file and import the packages by adding these lines at the top:

import 'package:html/parser.dart' as parser;
import 'package:http/http.dart' as http;

Step 3: Add the web scraping functionality

The web page I will use to demonstrate web scraping is https://www.geeksforgeeks.org/. We will extract the titles of the first three articles from the article list, as shown in the image below.

To extract a particular piece of data, we first need to pick a parent element whose class name is unique within the document, and then work out the hierarchy of its children. For this we need to look at the page's HTML. We can do that by opening the website in Chrome, right-clicking the required text, and clicking Inspect.

In the image above, you can see that I have selected a parent element with the class name "articles-list", because its name differs from every other class in the document. Looking at the children we want to extract, we can see that the title of the first article requires this hierarchy:

class "articles-list" >> children[0] >> children[0] >> children[0]

Likewise, for the second and third titles, it would be:

class "articles-list" >> children[1] >> children[0] >> children[0]

class "articles-list" >> children[2] >> children[0] >> children[0]
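To make the hierarchy concrete, here is a minimal sketch that walks the same path over a small inline HTML string. The markup in sampleHtml and the helper name titleAt are hypothetical, invented for illustration; the real page's structure may differ.

```dart
import 'package:html/parser.dart' as parser;

// Hypothetical markup that mimics the structure described above;
// the real page's HTML may differ.
const sampleHtml = '''
<div class="articles-list">
  <div><div><a>First title</a></div></div>
  <div><div><a>Second title</a></div></div>
  <div><div><a>Third title</a></div></div>
</div>
''';

/// Returns the text found at children[index] >> children[0] >> children[0]
/// under the first element with the class "articles-list".
String titleAt(String html, int index) {
  final document = parser.parse(html);
  final list = document.getElementsByClassName('articles-list')[0];
  // `children` holds element children only, so whitespace between tags
  // does not shift the indices.
  return list.children[index].children[0].children[0].text.trim();
}

void main() {
  for (var i = 0; i < 3; i++) {
    print(titleAt(sampleHtml, i));
  }
}
```

Running it against the sample markup prints the three titles in order. Only the first index changes between titles, which is why the three hierarchies above differ in a single position.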

Now that we have the class name and the hierarchy, we can go ahead and write the function that does the web scraping:

Future<List<String>> extractData() async {
  try {
    // Get the response from the target URL
    final response =
        await http.get(Uri.parse('https://www.geeksforgeeks.org/'));

    // Status code 200 means the response was received successfully
    if (response.statusCode != 200) {
      return ['', '', 'ERROR: ${response.statusCode}.'];
    }

    // Parse the HTML document from the response body
    final document = parser.parse(response.body);

    // Parent element whose children hold the article entries
    final articles = document.getElementsByClassName('articles-list')[0];

    // Title of article i: children[i] >> children[0] >> children[0]
    final titles = [
      for (var i = 0; i < 3; i++)
        articles.children[i].children[0].children[0].text.trim()
    ];

    titles.forEach(print);

    // Return the three extracted titles as a list of Strings
    return titles;
  } catch (e) {
    // Network failures and an unexpected page structure both end up here
    return ['', '', 'ERROR!'];
  }
}
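The function relies on a catch block to cope with a page whose structure does not match the expected hierarchy. An alternative, sketched below, is to check each level before indexing into it and return only the titles that are actually present. The helper name parseTitles and the sample markup in main are hypothetical, not part of the original article.

```dart
import 'package:html/parser.dart' as parser;

/// Hypothetical helper: collects up to [count] titles from the first
/// element with the class "articles-list", skipping entries that lack
/// the expected children[0] >> children[0] nesting.
List<String> parseTitles(String html, {int count = 3}) {
  final document = parser.parse(html);
  final lists = document.getElementsByClassName('articles-list');
  if (lists.isEmpty) return [];

  final titles = <String>[];
  for (final item in lists.first.children.take(count)) {
    // Walk down two levels, stopping if either level is missing.
    final level1 = item.children.isEmpty ? null : item.children.first;
    final level2 = (level1 == null || level1.children.isEmpty)
        ? null
        : level1.children.first;
    if (level2 != null) titles.add(level2.text.trim());
  }
  return titles;
}

void main() {
  // No matching class: an empty list instead of an exception.
  print(parseTitles('<p>no list here</p>')); // prints []

  // The second entry has no nested children, so it is skipped.
  print(parseTitles('<ul class="articles-list">'
      '<li><div><a> A </a></div></li><li></li></ul>')); // prints [A]
}
```

Keeping the parsing separate from the network call also makes it possible to exercise the parsing logic with fixed HTML strings, without hitting the live site.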

Now we will call this function in the onPressed: parameter of the MaterialButton and show the CircularProgressIndicator until it returns the result.

onPressed: () async {
  // Set isLoading to true to show the loader
  setState(() {
    isLoading = true;
  });

  // Wait for the web scraping function to return the list of strings
  final response = await extractData();

  // Store the received strings for display and set isLoading
  // to false to hide the loader
  setState(() {
    result1 = response[0];
    result2 = response[1];
    result3 = response[2];
    isLoading = false;
  });
}

After all this, our main.dart looks like this:

Dart

import 'package:flutter/material.dart';
import 'package:html/parser.dart' as parser;
import 'package:http/http.dart' as http;
  
void main() => runApp(MaterialApp(
    theme: ThemeData(
      accentColor: Colors.green,
      scaffoldBackgroundColor: Colors.green[100],
      primaryColor: Colors.green,
    ),
    home: MyApp()));
  
class MyApp extends StatefulWidget {
  const MyApp({Key key}) : super(key: key);
  
  @override
  _MyAppState createState() => _MyAppState();
}
  
class _MyAppState extends State<MyApp> {
    
  // Strings to store the extracted Article titles
  String result1 = 'Result 1';
  String result2 = 'Result 2';
  String result3 = 'Result 3';
    
  // boolean to show the CircularProgressIndicator
  // while the web scraping call is awaited
  bool isLoading = false;
  
  Future<List<String>> extractData() async {
    try {
      // Get the response from the target URL
      final response =
          await http.get(Uri.parse('https://www.geeksforgeeks.org/'));

      // Status code 200 means the response was received successfully
      if (response.statusCode != 200) {
        return ['', '', 'ERROR: ${response.statusCode}.'];
      }

      // Parse the HTML document from the response body
      final document = parser.parse(response.body);

      // Parent element whose children hold the article entries
      final articles = document.getElementsByClassName('articles-list')[0];

      // Title of article i: children[i] >> children[0] >> children[0]
      final titles = [
        for (var i = 0; i < 3; i++)
          articles.children[i].children[0].children[0].text.trim()
      ];

      titles.forEach(print);

      // Return the three extracted titles as a list of Strings
      return titles;
    } catch (e) {
      // Network failures and an unexpected page structure both end up here
      return ['', '', 'ERROR!'];
    }
  }
  
  @override
  Widget build(BuildContext context) {
    return Scaffold(
      appBar: AppBar(title: Text('GeeksForGeeks')),
      body: Padding(
        padding: const EdgeInsets.all(16.0),
        child: Center(
            child: Column(
          mainAxisAlignment: MainAxisAlignment.center,
          children: [
              
            // if isLoading is true show loader
            // else show Column of Texts
            isLoading
                ? CircularProgressIndicator()
                : Column(
                    children: [
                      Text(result1,
                          style: TextStyle(
                              fontSize: 20, fontWeight: FontWeight.bold)),
                      SizedBox(
                        height: MediaQuery.of(context).size.height * 0.05,
                      ),
                      Text(result2,
                          style: TextStyle(
                              fontSize: 20, fontWeight: FontWeight.bold)),
                      SizedBox(
                        height: MediaQuery.of(context).size.height * 0.05,
                      ),
                      Text(result3,
                          style: TextStyle(
                              fontSize: 20, fontWeight: FontWeight.bold)),
                    ],
                  ),
            SizedBox(height: MediaQuery.of(context).size.height * 0.08),
            MaterialButton(
              onPressed: () async {
                // Set isLoading to true to show the loader
                setState(() {
                  isLoading = true;
                });

                // Wait for the web scraping function
                // to return the list of strings
                final response = await extractData();

                // Store the received strings for display and
                // set isLoading to false to hide the loader
                setState(() {
                  result1 = response[0];
                  result2 = response[1];
                  result3 = response[2];
                  isLoading = false;
                });
              },
              child: Text(
                'Scrape Data',
                style: TextStyle(color: Colors.white),
              ),
              color: Colors.green,
            )
          ],
        )),
      ),
    );
  }
}

Output:

Automatically translated post

Article written by curiousyuvi and translated by Barcelona Geeks. The original can be accessed here. Licence: CCBY-SA
