Ignoring alt text when converting from docx to txt #364

caphefalumi · 2024-05-03T01:57:15Z

Currently, when I convert from docx to txt, the alt text of images is output along with the paragraphs, as something like "[ALT TEXT]". How do I exclude the alt text?
Here is my code:
import pypandoc
pypandoc.convert_file(docx_path, 'plain', extra_args=['--wrap=none'], outputfile='output.txt')

JessicaTegner · 2024-05-03T04:03:03Z

From the pandoc user guide:

A link immediately preceded by a ! will be treated as an image. The link text will be used as the image’s alt text:
![la lune](lalune.jpg "Voyage to the moon")

![movie reel]

[movie reel]: movie.gif
Extension: implicit_figures
An image with nonempty alt text, occurring by itself in a paragraph, will be rendered as a figure with a caption. The image’s alt text will be used as the caption.
![This is the caption](/url/of/image.png)
[...]
If you just want a regular inline image, just make sure it is not the only thing in the paragraph. One way to do this is to insert a nonbreaking space after the image:
![This image won't be a figure](/url/of/image.png)\
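
One possible workaround, offered as a rough sketch rather than a confirmed pypandoc feature for this case: convert the docx to Markdown first, strip the inline image syntax quoted above, then convert the cleaned Markdown to plain text. The helper names `strip_inline_images` and `docx_to_plain_without_images` are hypothetical, and the regex is a simplification that assumes inline images of the form `![alt](url "title")`, optionally followed by an attribute block such as `{width="..."}`; reference-style images and alt text containing `]` are not handled.

```python
import re

# Simplified pattern for inline images: ![alt](url "title"), optionally
# followed by a pandoc attribute block like {width="2in"}.
# Assumption: alt text contains no "]" and the target contains no ")".
INLINE_IMAGE_RE = re.compile(r'!\[[^\]]*\]\([^)]*\)(?:\{[^}]*\})?')


def strip_inline_images(markdown_text: str) -> str:
    """Remove inline Markdown images, including their alt text."""
    return INLINE_IMAGE_RE.sub('', markdown_text)


def docx_to_plain_without_images(docx_path: str, output_path: str) -> None:
    """Hypothetical helper: docx -> markdown -> (strip images) -> plain."""
    # Imported here so strip_inline_images can be used without pypandoc.
    import pypandoc

    # Step 1: get an intermediate Markdown rendering of the document.
    markdown_text = pypandoc.convert_file(
        docx_path, 'markdown', extra_args=['--wrap=none']
    )
    # Step 2: drop the image syntax so no alt text reaches the output.
    cleaned = strip_inline_images(markdown_text)
    # Step 3: render the cleaned Markdown as plain text.
    pypandoc.convert_text(
        cleaned,
        'plain',
        format='markdown',
        extra_args=['--wrap=none'],
        outputfile=output_path,
    )
```

Calling `docx_to_plain_without_images(docx_path, 'output.txt')` would replace the single `convert_file` call from the question. Going through Markdown may change some formatting compared with a direct docx-to-plain conversion, so the result is worth checking against the original output.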
