Ignoring Alt Text when convert from docx to txt #364

Open
caphefalumi opened this issue May 3, 2024 · 1 comment
Comments

@caphefalumi

Currently, when I convert from docx to txt, the alt text of images is included in the output alongside the paragraphs, as something like "[ALT TEXT]". How do I exclude the alt text?
Here is my code:
pypandoc.convert_file(docx_path, 'plain', extra_args=['--wrap=none'], outputfile='output.txt')

@JessicaTegner
Owner

From the pandoc user guide:

A link immediately preceded by a ! will be treated as an image. The link text will be used as the image’s alt text:
![la lune](lalune.jpg "Voyage to the moon")

![movie reel]

[movie reel]: movie.gif
Extension: implicit_figures
An image with nonempty alt text, occurring by itself in a paragraph, will be rendered as a figure with a caption. The image’s alt text will be used as the caption.
![This is the caption](/url/of/image.png)
[...]
If you just want a regular inline image, just make sure it is not the only thing in the paragraph. One way to do this is to insert a nonbreaking space after the image:
![This image won't be a figure](/url/of/image.png)\
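
The passage quoted above describes pandoc's Markdown image syntax. For the docx-to-plain case in the question, one possible approach, which is not taken from the thread, is to drop image elements with a pandoc Lua filter before the plain-text writer runs. The sketch below assumes a pandoc version that supports `--lua-filter`; the filter file name `strip_images.lua` and the helper function names are hypothetical, chosen only for illustration. If a given pandoc version wraps captioned images in figure elements, the captions may still appear and the filter would need extending.

```python
import tempfile
from pathlib import Path

# Lua filter (an assumption, not from the thread): returning an empty list
# from the Image handler deletes every image, and with it the alt text.
STRIP_IMAGES_LUA = """\
function Image(el)
  return {}
end
"""


def write_filter(directory):
    """Write the Lua filter into `directory` and return its path."""
    path = Path(directory) / "strip_images.lua"  # hypothetical file name
    path.write_text(STRIP_IMAGES_LUA, encoding="utf-8")
    return path


def build_extra_args(filter_path):
    """Keep the original --wrap=none and add pandoc's --lua-filter option."""
    return ["--wrap=none", f"--lua-filter={filter_path}"]


def docx_to_text_without_images(docx_path, output_path):
    """Convert a docx file to plain text, dropping images during conversion."""
    import pypandoc  # imported here so the helpers above work without pandoc

    with tempfile.TemporaryDirectory() as tmp:
        extra_args = build_extra_args(write_filter(tmp))
        pypandoc.convert_file(
            docx_path, "plain", extra_args=extra_args, outputfile=output_path
        )
```

Calling `docx_to_text_without_images(docx_path, 'output.txt')` would stand in for the `convert_file` call in the question. Filtering the document before it is written avoids pattern-matching on the output text, which could also remove legitimate bracketed text.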
