NLP | Chunk Tree to Text and Chaining Chunk Transformations

A tree or subtree can be converted back into a sentence or chunk string. To show how, the code below uses the first tree of the treebank_chunk corpus.

Code #1: Joining the words in a tree with spaces.

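# Load the chunked Penn Treebank sample
from nltk.corpus import treebank_chunk

# First chunked sentence in the corpus
tree = treebank_chunk.chunked_sents()[0]

print("Tree : \n", tree)

# Each leaf is a (word, tag) tuple
print("\nTree leaves : \n", tree.leaves())

# Join only the words, separated by single spaces
print("\nSentence from tree : \n", ' '.join(
        [w for w, t in tree.leaves()]))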

Output:

Tree : 
 (S
  (NP Pierre/NNP Vinken/NNP)
  ,/,
  (NP 61/CD years/NNS)
  old/JJ
  ,/,
  will/MD
  join/VB
  (NP the/DT board/NN)
  as/IN
  (NP a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD)
  ./.)

Tree leaves : 
 [('Pierre', 'NNP'), ('Vinken', 'NNP'), (',', ','), ('61', 'CD'),
 ('years', 'NNS'), ('old', 'JJ'), (',', ','), ('will', 'MD'), ('join', 'VB'),
 ('the', 'DT'), ('board', 'NN'), ('as', 'IN'), ('a', 'DT'), ('nonexecutive', 'JJ'),
 ('director', 'NN'), ('Nov.', 'NNP'), ('29', 'CD'), ('.', '.')]

Sentence from tree : 
 Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .

As the output above shows, the punctuation is wrong: the period and the commas are treated as ordinary words, so they are surrounded by spaces like every other word. The code below fixes this with a regular-expression substitution that removes the space before each punctuation mark.

Code #2: The chunk_tree_to_sent() function, which improves on Code #1

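import re

# Whitespace followed by a punctuation mark (the mark is captured)
punct_re = re.compile(r'\s([,\.;\?])')

def chunk_tree_to_sent(tree, concat=' '):
    # Join the words, then drop the space before each punctuation mark
    s = concat.join([w for w, t in tree.leaves()])
    return re.sub(punct_re, r'\g<1>', s)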
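
To see what the substitution does without loading the corpus, the sketch below applies the same pattern to a hand-written list of (word, tag) pairs shaped like the output of tree.leaves(). The sample list leaves and the helper name leaves_to_sent are illustrative assumptions, not part of the original module.

```python
import re

# Whitespace followed by one of , . ; ? (the punctuation mark is captured)
punct_re = re.compile(r'\s([,\.;\?])')

def leaves_to_sent(leaves, concat=' '):
    # Hypothetical helper: same logic as chunk_tree_to_sent(), but it takes
    # the (word, tag) pairs directly instead of a tree
    s = concat.join(w for w, t in leaves)
    return punct_re.sub(r'\g<1>', s)

# Hand-written sample shaped like tree.leaves() output (an assumption)
leaves = [('Pierre', 'NNP'), ('Vinken', 'NNP'), (',', ','),
          ('61', 'CD'), ('years', 'NNS'), ('old', 'JJ'), ('.', '.')]

# Plain join: every token, punctuation included, gets a leading space
print(' '.join(w for w, t in leaves))

# Join plus substitution: the space before , and . is removed
print(leaves_to_sent(leaves))
```

Marks outside the character class, such as ! or :, keep their leading space; extending the class would be the natural way to cover them.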

Code #3: Evaluating chunk_tree_to_sent()

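# Load the corpus and the function from Code #2
# (transforms is assumed here to be a local module, transforms.py, that holds it)
from nltk.corpus import treebank_chunk
from transforms import chunk_tree_to_sent

# First chunked sentence in the corpus
tree = treebank_chunk.chunked_sents()[0]

print("Tree : \n", tree)

print("\nTree leaves : \n", tree.leaves())

print("Tree to sentence : ", chunk_tree_to_sent(tree))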

Output:

Tree : 
 (S
  (NP Pierre/NNP Vinken/NNP)
  ,/,
  (NP 61/CD years/NNS)
  old/JJ
  ,/,
  will/MD
  join/VB
  (NP the/DT board/NN)
  as/IN
  (NP a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD)
  ./.)

Tree leaves : 
 [('Pierre', 'NNP'), ('Vinken', 'NNP'), (',', ','), ('61', 'CD'),
 ('years', 'NNS'), ('old', 'JJ'), (',', ','), ('will', 'MD'), ('join', 'VB'),
 ('the', 'DT'), ('board', 'NN'), ('as', 'IN'), ('a', 'DT'), ('nonexecutive', 'JJ'),
 ('director', 'NN'), ('Nov.', 'NNP'), ('29', 'CD'), ('.', '.')]

Tree to sentence :  Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.

Chaining chunk transformations

Transformation functions can be chained together to normalize chunks. The resulting chunks are usually shorter and still carry the same meaning.

The function below takes a single chunk and an optional list of transformation functions. It calls each transformation function on the chunk in turn, passing the result of one to the next, and returns the final chunk.

Code #4: The transform_chunk() function

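# filter_insignificant, swap_verb_phrase, swap_infinitive_phrase and
# singularize_plural_noun must already be defined in this module
def transform_chunk(
        chunk, chain=[filter_insignificant,
                      swap_verb_phrase, swap_infinitive_phrase,
                      singularize_plural_noun], trace=0):
    # Apply each transformation in order, feeding its result to the next
    for f in chain:
        chunk = f(chunk)

        # Optionally print the chunk after each step
        if trace:
            print(f.__name__, ':', chunk)

    return chunk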

Code #5: Evaluating transform_chunk()

from transforms import transform_chunk

chunk = [('the', 'DT'), ('book', 'NN'), ('of', 'IN'),
         ('recipes', 'NNS'), ('is', 'VBZ'), ('delicious', 'JJ')]

print("Chunk : \n", chunk)

print("\nTransformed Chunk : \n", transform_chunk(chunk))

Output:

Chunk :  
[('the', 'DT'), ('book', 'NN'), ('of', 'IN'), ('recipes', 'NNS'), 
('is', 'VBZ'), ('delicious', 'JJ')]

Transformed Chunk : 
[('delicious', 'JJ'), ('recipe', 'NN'), ('book', 'NN')]
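
The chaining idea can also be tried in isolation with stand-in transformations. The sketch below is a minimal, self-contained illustration: drop_determiners and naive_singularize are hypothetical, simplified substitutes for filter_insignificant and singularize_plural_noun, and apply_chain mirrors transform_chunk with an explicit chain argument. None of them is the implementation from the transforms module.

```python
def drop_determiners(chunk):
    # Hypothetical stand-in: remove words tagged DT (determiners)
    return [(w, t) for w, t in chunk if t != 'DT']

def naive_singularize(chunk):
    # Hypothetical stand-in: strip a trailing "s" from NNS words, retag as NN
    return [(w[:-1], 'NN') if t == 'NNS' and w.endswith('s') else (w, t)
            for w, t in chunk]

def apply_chain(chunk, chain, trace=False):
    # Feed the output of each transformation into the next one
    for f in chain:
        chunk = f(chunk)
        if trace:
            print(f.__name__, ':', chunk)
    return chunk

chunk = [('the', 'DT'), ('book', 'NN'), ('of', 'IN'), ('recipes', 'NNS')]

# trace=True prints the intermediate chunk after every step
result = apply_chain(chunk, [drop_determiners, naive_singularize], trace=True)
print(result)
```

Because every function takes a chunk and returns a chunk, the order of the list decides the order of the steps, and new transformations can be added without touching the loop.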


Article written by Mohit Gupta_OMG 🙂 and translated by Barcelona Geeks. The original can be accessed here. Licence: CCBY-SA
