Word documents contain formatted text wrapped in three levels of objects: run objects at the lowest level, paragraph objects in the middle, and the document object at the top.
Because of this structure, we cannot work with these documents using ordinary text editors. However, we can manipulate Word documents in Python using the python-docx module.
1. The first step is to install the third-party module python-docx. You can use pip ("pip install python-docx") or download the tarball. The source is hosted on GitHub.
2. After installation, import "docx", NOT "python-docx".
3. Use the "docx.Document" class to start working with a Word document.
Code #1:
# import docx, NOT python-docx
import docx

# create an instance of a Word document
doc = docx.Document()

# add a heading of level 0 (the document title, the largest heading)
doc.add_heading('Heading for the document', 0)

# add a paragraph and store the object in a variable
doc_para = doc.add_paragraph('Your paragraph goes here, ')

# add runs, i.e. spans of text with their own style
# (bold, italic, underline, etc.)
doc_para.add_run('hey there, bold here').bold = True
doc_para.add_run(', and ')
doc_para.add_run('these words are italic').italic = True

# add a page break to start a new page
doc.add_page_break()

# add a heading of level 2
doc.add_heading('Heading level 2', 2)

# pictures can also be added to the Word document;
# the width argument is optional
doc.add_picture('path_to_picture')

# now save the document to a location
doc.save('path_to_document')
Output:
Note the page break that starts the second page.
Code #2: To open an existing Word document, pass the path to the document when creating the instance.
# import the Document class from the docx module
from docx import Document

# create an instance of the Word document we want to open
doc = Document('path_to_the_document')

# print the list of paragraphs in the document
print('List of paragraph objects:->>>')
print(doc.paragraphs)

# print the list of runs in a specified paragraph
print('\nList of runs objects in 1st paragraph:->>>')
print(doc.paragraphs[0].runs)

# print the text in a paragraph
print('\nText in the 1st paragraph:->>>')
print(doc.paragraphs[0].text)

# print the complete document
print('\nThe whole content of the document:->>>\n')
for para in doc.paragraphs:
    print(para.text)
Output:
List of paragraph objects:->>>
[<docx.text.paragraph.Paragraph object at 0x7f45b22dc128>, <docx.text.paragraph.Paragraph object at 0x7f45b22dc5c0>, <docx.text.paragraph.Paragraph object at 0x7f45b22dc0b8>, <docx.text.paragraph.Paragraph object at 0x7f45b22dc198>, <docx.text.paragraph.Paragraph object at 0x7f45b22dc0f0>]

List of runs objects in 1st paragraph:->>>
[<docx.text.run.Run object at 0x7f45b22dc198>]

Text in the 1st paragraph:->>>
Heading for the document

The whole content of the document:->>>

Heading for the document
Your paragraph goes here, hey there, bold here, and these words are italic
Heading level 2
Reference: https://python-docx.readthedocs.io/en/latest/#user-guide
Article written by mohit_negi and translated by Barcelona Geeks. The original can be accessed here. Licence: CC BY-SA