HTML tags in blockquotes are not stripped #6

tdemin · 2021-08-12T16:37:01Z

Initially discovered in #5.

Although (Renderer).paragraph() uses (mostly) the same logic as (Renderer).blockquote(), raw HTML is stripped from text paragraphs but not from blockquotes. This appears to be a gomarkdown issue.


mntn-xyz · 2021-09-12T22:27:58Z

I think the problem is that blockquotes contain nested nodes. I fixed the issue here, but I feel like there's a cleaner way to do this, so I didn't want to make a PR yet: https://github.com/mntn-xyz/gmnhg/tree/blockquote
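To illustrate the nesting problem, here is a minimal Go sketch on a hypothetical toy AST (Node, Text, HTMLSpan and Container are illustrative stand-ins, not gomarkdown's ast types). A renderer that strips HTML only among a node's direct children handles a paragraph correctly, but a blockquote wraps the paragraph in one more level, so its HTML spans slip through untouched.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical toy AST; gomarkdown's real ast package is much richer.
type Node interface{}

type Text struct{ Literal string }
type HTMLSpan struct{ Literal string } // inline raw HTML such as "<b>"
type Container struct {
	Kind     string // e.g. "paragraph" or "blockquote"
	Children []Node
}

// renderShallow drops HTML spans only among c's direct children;
// anything nested one level deeper is printed verbatim.
func renderShallow(c *Container) string {
	var b strings.Builder
	for _, child := range c.Children {
		switch n := child.(type) {
		case *Text:
			b.WriteString(n.Literal)
		case *HTMLSpan:
			// Direct HTML children are stripped.
		case *Container:
			b.WriteString(rawText(n)) // nested content: no stripping
		}
	}
	return b.String()
}

// rawText concatenates every literal in the subtree, HTML included.
func rawText(c *Container) string {
	var b strings.Builder
	for _, child := range c.Children {
		switch n := child.(type) {
		case *Text:
			b.WriteString(n.Literal)
		case *HTMLSpan:
			b.WriteString(n.Literal)
		case *Container:
			b.WriteString(rawText(n))
		}
	}
	return b.String()
}

func main() {
	para := &Container{Kind: "paragraph", Children: []Node{
		&Text{"some "}, &HTMLSpan{"<b>"}, &Text{"bold"}, &HTMLSpan{"</b>"}, &Text{" text"},
	}}
	quote := &Container{Kind: "blockquote", Children: []Node{para}}
	fmt.Println(renderShallow(para))  // some bold text
	fmt.Println(renderShallow(quote)) // some <b>bold</b> text
}
```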

To fix it, I rendered children recursively; if a child was an HTMLBlock or HTMLSpan, I rendered it as a plain leaf, replacing its text with the Markdown content. There's definitely a better way to do this; I was just messing around to see whether it could be fixed...
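A simplified Go sketch of the recursive approach, again on a hypothetical toy AST rather than gomarkdown's real API. It simply drops HTML nodes wherever they appear in the tree; it does not reproduce the branch's exact leaf-replacement behavior, which the comment does not spell out.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical toy AST standing in for gomarkdown's ast nodes.
type Node interface{}

type Text struct{ Literal string }
type HTMLSpan struct{ Literal string }  // inline raw HTML, e.g. "<b>"
type HTMLBlock struct{ Literal string } // block-level raw HTML
type Container struct {
	Kind     string // e.g. "paragraph" or "blockquote"
	Children []Node
}

// render walks the tree recursively, so raw HTML is stripped at any
// depth, including inside paragraphs nested in blockquotes.
func render(n Node, b *strings.Builder) {
	switch n := n.(type) {
	case *Text:
		b.WriteString(n.Literal)
	case *HTMLSpan, *HTMLBlock:
		// Raw HTML is dropped; neighbouring Text nodes keep the content.
	case *Container:
		for _, child := range n.Children {
			render(child, b)
		}
	}
}

func renderString(n Node) string {
	var b strings.Builder
	render(n, &b)
	return b.String()
}

func main() {
	quote := &Container{Kind: "blockquote", Children: []Node{
		&Container{Kind: "paragraph", Children: []Node{
			&Text{"some "}, &HTMLSpan{"<b>"}, &Text{"bold"}, &HTMLSpan{"</b>"}, &Text{" text"},
		}},
	}}
	fmt.Println(renderString(quote)) // some bold text
}
```

Because the recursion handles every container the same way, paragraphs and blockquotes no longer need separate stripping logic.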

tdemin · 2021-09-13T12:57:55Z

@mntn-xyz this looks like it could possibly be unified with the container branch of textWithNewlineReplacement (which would also define the future behavior of general text containing HTML tags), although the odd behavior still stands: *ast.HTMLSpan nodes and HTML blocks are trimmed from general text, yet still show up among the blockquote's children in the AST.

If anything, this is probably good: it accounts for possible future fixes landing in gomarkdown.

mntn-xyz · 2021-09-13T14:48:33Z

Makes sense to me, I'll put together a patch sometime this week.

This makes the renderer print the content of informational HTML tags while stripping the tags themselves. Tags such as script, iframe, and style, which are unlikely ever to hold presentable content, are exempt: both the tags and their content are skipped during rendering. The hard-break tag <br> is supported as a replacement for a Markdown hard break (two spaces before a newline). This also adds tests for this behavior in general_text.md. Fixes #6, a longstanding issue with inline HTML in blockquotes.
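The stripping rules above can be approximated on a raw HTML string. This Go sketch is a naive, hypothetical scanner, not the code from the PR: its skip list holds only the three tags named in the description, and it maps <br> to a plain newline to stand in for the hard break.

```go
package main

import (
	"fmt"
	"strings"
)

// skipTags lists elements whose content is dropped along with the tags.
// Assumption: only the three tags named in the PR description.
var skipTags = map[string]bool{"script": true, "iframe": true, "style": true}

// stripHTML removes tags from s but keeps their text content. Content
// inside skipTags elements is discarded, and <br> becomes a newline.
// Naive: entities, HTML comments and quoted '>' are not handled.
func stripHTML(s string) string {
	var b strings.Builder
	skipDepth := 0 // > 0 while inside a skipped element
	for len(s) > 0 {
		lt := strings.IndexByte(s, '<')
		if lt < 0 {
			if skipDepth == 0 {
				b.WriteString(s)
			}
			break
		}
		if skipDepth == 0 {
			b.WriteString(s[:lt])
		}
		gt := strings.IndexByte(s[lt:], '>')
		if gt < 0 {
			// Unterminated '<': treat the rest as text.
			if skipDepth == 0 {
				b.WriteString(s[lt:])
			}
			break
		}
		tag := s[lt+1 : lt+gt]
		s = s[lt+gt+1:]

		closing := strings.HasPrefix(tag, "/")
		selfClosing := strings.HasSuffix(tag, "/")
		name := strings.TrimSuffix(strings.TrimPrefix(tag, "/"), "/")
		if i := strings.IndexAny(name, " \t\n"); i >= 0 {
			name = name[:i] // drop attributes
		}
		name = strings.ToLower(name)

		switch {
		case skipTags[name] && closing:
			if skipDepth > 0 {
				skipDepth--
			}
		case skipTags[name] && !selfClosing:
			skipDepth++
		case name == "br" && skipDepth == 0:
			b.WriteString("\n")
		}
	}
	return b.String()
}

func main() {
	fmt.Printf("%q\n", stripHTML("Some <em>emphasized</em> text"))        // "Some emphasized text"
	fmt.Printf("%q\n", stripHTML("before<script>alert(1)</script>after")) // "beforeafter"
	fmt.Printf("%q\n", stripHTML("line one<br>line two"))                 // "line one\nline two"
}
```

A real renderer would more likely operate on gomarkdown's HTML nodes or use a proper tokenizer such as golang.org/x/net/html, since a hand-rolled scanner like this one misreads comments and attribute values containing '>'.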

tdemin added the bug (Something isn't working) and gomarkdown (Issue in upstream gomarkdown) labels on Aug 12, 2021

mntn-xyz mentioned this issue on Sep 19, 2021 in #33, Strip HTML tags (but keep any text content) when rendering text (Merged)

tdemin closed this as completed in #33 on Oct 2, 2021

tdemin mentioned this issue on Oct 3, 2021 in #40, Migrate to Goldmark (Open)
