Use markdown image title text when generating list of figures #7915

Open
LunkRat opened this issue Feb 12, 2022 · 23 comments · May be fixed by #8617

@LunkRat
Contributor

LunkRat commented Feb 12, 2022

Currently if pandoc is called with --lof or configured with a format.yml containing:

---
lof: yes

when converting a markdown source, each figure is listed in the LoF using its markdown alt text. This is a problem, as others have pointed out in a separate issue, because the alt text is also used for image captions, and the text wanted for an image caption is very often longer than the short identifying text needed in a list of figures.

Fortunately markdown offers a title text attribute for images which affords a practical and natural solution. The proposal here is to use the markdown image title when generating the list of figures. Usage would look like this:

![My image **caption** can be long with _formatting_.](figures/my_figure.jpg "Figure title for LoF entry")

The title attribute is simple text only (no formatting or math, etc.) which makes it perfect to use in an LoF entry where you don't want that anyway.

This solution allows for a semantic separation between image caption (markdown alt text) and LoF entry text (markdown title text).

@jgm
Owner

jgm commented Feb 13, 2022

I guess my main question is whether it's too much of a limitation if this is confined to being plain string content? (no formatting or math, unless the math is plain text unicode) If not, then this does seem a nice solution, until we get fancier figure support worked out.

@LunkRat
Contributor Author

LunkRat commented Feb 13, 2022

My understanding is that a list of figures in a document should contain only plain text strings to identify each figure. In my opinion being restricted to plain text is a virtue in this case.

@jgm
Owner

jgm commented Feb 13, 2022

I can imagine wanting short figure titles like:

  • Alliances in the Iliad
  • Frequency of copain vs pote
  • Graph of $y=x^2$

@LunkRat
Contributor Author

LunkRat commented Feb 13, 2022

@jgm You make a fair point. Does the need for italics or math in LoF titles outweigh the need for figure captions/alt text to be independent of LoF titles, which is what this issue is attempting to solve? I argue that giving up formatting in LoF titles is a small price to pay for the ability to write figure captions that run to multiple sentences without breaking the LoF by rendering it unrecognizable.

One solution could be to have a conditional that uses the plain text image title attribute if present, otherwise fall back to using the alt text. I don't like this from a usability perspective, but it would allow those folks who want to use formatting in LoF titles to continue to do so using the alt text (would also make this change more backwards compatible).

Thoughts?
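
The fallback described above can be sketched in plain Lua. This is only an illustration, not pandoc's implementation: `lof_entry` is a hypothetical helper name, and the `fig:` handling assumes the pandoc 2.x convention (visible in the filter posted later in this thread) of prefixing the title of an implicit figure with `fig:`.

```lua
-- Hypothetical helper: choose the text for a list-of-figures entry.
-- Uses the image title when it has content, otherwise falls back to the
-- caption (the markdown alt text).
local function lof_entry(title, caption)
  -- Assumption: pandoc 2.x marks implicit figures with a "fig:" title prefix.
  local text = (title or ''):gsub('^fig:', '')
  if text:match('%S') then
    return text
  end
  return caption
end
```

With this rule, documents that set no title keep their current LoF entries, which is what makes the change backwards compatible.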

@tarleb
Collaborator

tarleb commented Feb 13, 2022

Not a full solution, but for the time being you can use the "short-captions" Lua filter at https://github.com/pandoc/lua-filters/tree/master/short-captions. It only works when going from Markdown to LaTeX and has a few other limitations, but it allows for math in the short caption.

See also #3177, which may have to be completed first.

@LunkRat
Contributor Author

LunkRat commented Feb 13, 2022

I tried the Lua filter and it works; however, when I combine it with pandoc-fignos from the pandoc-xnos package, the syntax for the Lua short-captions breaks the @fig:[name] in-text figure reference:

![My long alt text figure caption](figures/psychophysics_stimuli.png){#fig:stimuli width=75% short-caption="Psychophysics stimuli"}

The above results in an error from pandoc-fignos when I reference the figure by name with @fig:stimuli:

pandoc-fignos: Bad reference: @fig:stimuli

So while the Lua filter does give the desired behavior, it breaks other desired functionality.

@tarleb
Collaborator

tarleb commented Feb 14, 2022

Make sure that pandoc-xnos runs before the Lua filter. Filters are run in the order in which they appear on the command line.

I wrote an updated, shorter filter that uses new pandoc features and might give better results in some cases:

if FORMAT ~= "latex" then return end

function Para (para)
  if #para.content ~= 1 then return end
  local img = para.content[1]
  if not img or img.t ~= 'Image' or #img.caption == 0
     or img.title:sub(1,4) ~= 'fig:'
     or not img.attributes['short-caption'] then
    return nil
  end

  local short_caption = pandoc.write(
    pandoc.read(img.attributes['short-caption']), FORMAT
  ):gsub('^%s*', ''):gsub('%s*$', '')  -- trim, removing surrounding whitespace

  local figure = pandoc.write(pandoc.Pandoc{para}, FORMAT)
  return pandoc.RawBlock(
    'latex',
    figure:gsub('\n\\caption', '\n\\caption[' .. short_caption .. ']')
  )
end
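
One caveat with the final `gsub` call in this filter: `%` is special in Lua replacement strings, and the LaTeX writer escapes a literal percent sign as `\%`, so a short caption containing one would likely raise an "invalid use of '%' in replacement string" error on Lua 5.3 or later. A minimal sketch of a safer substitution follows; `add_short_caption` is a hypothetical helper name, not part of the filter above.

```lua
-- Hypothetical helper: insert a short caption into the first \caption
-- command of a LaTeX figure string.
local function add_short_caption(figure_tex, short_caption)
  -- '%' is special in gsub replacement strings, so double each one first.
  local safe = short_caption:gsub('%%', '%%%%')
  -- The outer parentheses drop gsub's second return value (the match
  -- count); the final argument 1 limits the change to the first match.
  return (figure_tex:gsub('\n\\caption', '\n\\caption[' .. safe .. ']', 1))
end
```

The parentheses also keep the match count from being passed along as an extra argument when the result is handed straight to another function.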

@LunkRat
Contributor Author

LunkRat commented Feb 14, 2022

Thank you for the suggestion, @tarleb. I mistakenly thought I had tried both orders, but I tried again and got your Lua filter to work with pandoc-xnos by ordering my command so that the pandoc-xnos filter runs before the Lua filter. I am using the newer, shorter Lua filter you posted on this issue and it works beautifully. So this does indeed solve my immediate need.

I still think it is worth implementing the original issue idea into Pandoc, for two reasons:

  1. It would not require any extra filter to be called.
  2. It would not require extra syntax above standard markdown. Using the Lua filter approach, documents will get littered with short-caption="[...]" which has no meaning or use outside of the specific Lua filter.

So I'm happy to be all fixed up but I still argue that this issue should be implemented as specified. Thank you @tarleb and @jgm for your attention and help!

@tarleb
Collaborator

tarleb commented Feb 14, 2022

Thanks for the feedback, happy to hear that it works. I think we all agree that this should be implemented and become a part of pandoc. It will be easy once support for figures has been improved, and I hope to do that soon.

@jgm
Owner

jgm commented Feb 15, 2022

The reason I'm hesitant to implement this suggestion now is that the plain string limitation seems like a problem.
(And it wouldn't be right to parse it as markdown in the writer, because we don't know that the source was markdown.)

@LunkRat
Contributor Author

LunkRat commented Mar 17, 2022

@tarleb is there a comparable technique available for something like short-caption that could work for Table captions? I'm hoping to clean up my LoT but I see the statement about lack of support for table captions in the Limitations section of the short-caption lua filter README. If you know of any workarounds for this problem please let me know.

@tarleb
Collaborator

tarleb commented Mar 24, 2022

I'm not aware of anything. You could try with commonmark_x instead of the classic Markdown parser and use the attributes extension.

@LunkRat
Contributor Author

LunkRat commented Sep 27, 2022

I'm using https://github.com/pandoc/lua-filters/tree/master/short-captions for images, works great. However, I still don't have a solution for markdown tables.

I am able to get a short caption for LoT if I use a raw latex table with this syntax:

\caption[My short LoT caption]{My longer caption which appears in the body table caption but not in the LoT.}

Would be great to find a solution for markdown tables, even if it is a workaround/hack and ugly.

@jpcirrus
Contributor

jpcirrus commented Feb 4, 2023

@LunkRat have you had a look at the table-short-captions Lua filter? I've not used it so not sure if it will do what you're looking for.

@jpcirrus
Contributor

jpcirrus commented Feb 4, 2023

@tarleb until pandoc 3.0+ I have been successfully using your figure short caption filter (thank you), but since upgrading, short captions are ignored. I assume this is due to the support for "complex figures" added in pandoc 3.0. If so, is it still possible to use figure short captions for LaTeX output by merely amending your filter?

@tarleb
Collaborator

tarleb commented Feb 5, 2023 via email

@jpcirrus
Contributor

jpcirrus commented Feb 5, 2023

Thank you @tarleb . Appreciated.

@jpcirrus
Contributor

I have just upgraded to pandoc 3.1 and tried compiling to latex using this updated filter but the short caption is still not being inserted in the \caption command, so is obviously neglected in the list of figures. When going to the json format I can see short-caption in the output but don't know enough to work out what the issue could be.

@jpcirrus
Contributor

jpcirrus commented Feb 12, 2023

After reading the Lua filters manual and many thanks to @wlupton's logging module I have got the filter working after amending it to:

PANDOC_VERSION:must_be_at_least '3.1'

if FORMAT:match 'latex' then
  function Figure(f)
    local short = f.content[1].content[1].attributes['short-caption']
    if short and not f.caption.short then
      f.caption.short = pandoc.Inlines(short)
    end
    return f
  end
end
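
The chained lookup `f.content[1].content[1]` assumes that every figure starts with a paragraph whose first inline is an image. A more defensive sketch of the same idea follows. It rests on assumptions: `first_image` is a hypothetical helper name, the version check is left out so that the helper can be exercised outside pandoc, and reading a missing field on a pandoc element is assumed to yield `nil` rather than raise an error.

```lua
-- Hypothetical helper: return the first inline of the figure's first
-- block if it is an Image, otherwise nil.
local function first_image(fig)
  local block = fig.content and fig.content[1]
  local inline = block and block.content and block.content[1]
  if inline and inline.t == 'Image' then
    return inline
  end
  return nil
end

function Figure(fig)
  -- Only LaTeX output is handled; returning nil leaves the figure unchanged.
  if not FORMAT:match('latex') then return nil end
  local img = first_image(fig)
  local short = img and img.attributes['short-caption']
  -- Do nothing without the attribute or when a short caption already exists.
  if not short or fig.caption.short then return nil end
  fig.caption.short = pandoc.Inlines(short)
  return fig
end
```

Figures that wrap something other than a single image, such as a code block or nested content, are then passed through untouched.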

tarleb added a commit to tarleb/pandoc that referenced this issue Feb 13, 2023
The title of an implicit figure, if set, is used as the short caption of
a figure. The short caption of a figure replaces the full caption in the
list of figures.

Closes: jgm#7915
@prakaa

prakaa commented Apr 27, 2023

After reading the Lua filters manual and many thanks to @wlupton's logging module I have got the filter working after amending it to:

PANDOC_VERSION:must_be_at_least '3.1'

if FORMAT:match 'latex' then
  function Figure(f)
    local short = f.content[1].content[1].attributes['short-caption']
    if short and not f.caption.short then
      f.caption.short = pandoc.Inlines(short)
    end
    return f
  end
end

Thanks @jpcirrus and @tarleb, I updated the short-captions filter myself but then came across this issue. This is much more succinct, thanks for sharing!

Just confirming that using this code as a filter, together with table-short-captions, means that with pandoc 3.1+ I can use short captions in both the list of figures and the list of tables.

I might reference this issue in a few repos where others may be looking for a similar fix.

@jpcirrus
Contributor

@prakaa I can confirm that the above code used as a filter outputs figure short captions, but I have no requirement for table short captions, so I don't know about that. Why don't you give it a go and let us know?

@prakaa

prakaa commented Apr 29, 2023

@jpcirrus clarifying what I meant above:

  • Using the code you provided (let's call it figure-short-captions.lua), I can get short captions for figures in the list of figures (by using the flag --lua-filter=/path/to/figure-short-captions.lua)
  • Using the separate Lua filter table-short-captions.lua, I can get short captions for tables in the list of tables (by using the flag --lua-filter=/path/to/table-short-captions.lua)
    • Simply replicating the Lua code that works for figures does not work for tables (I don't know much about how pandoc parses tables). However, the linked filter for tables still works with pandoc 3.1.2 (and presumably 3.0+), though it requires a particular syntax (see the README in the linked repo).

@leowill01

leowill01 commented Dec 3, 2023

After reading the Lua filters manual and many thanks to @wlupton's logging module I have got the filter working after amending it to:

PANDOC_VERSION:must_be_at_least '3.1'

if FORMAT:match 'latex' then
  function Figure(f)
    local short = f.content[1].content[1].attributes['short-caption']
    if short and not f.caption.short then
      f.caption.short = pandoc.Inlines(short)
    end
    return f
  end
end

you just saved my ability to render my dissertation revisions. MANY THANKS

EDIT: this ended up not being able to render markdown or latex expressions in the short captions in the LOF, so after some tinkering with gpt, here is a modified version that supports those as well:

PANDOC_VERSION:must_be_at_least '3.1'

if FORMAT:match 'latex' then
  function Figure(f)
    local short = f.content[1].content[1].attributes['short-caption']
    if short and not f.caption.short then
      -- Parse the short caption as Markdown so formatting and math survive; the LaTeX writer renders it later
      local short_caption = pandoc.read(short, 'markdown').blocks[1].content
      f.caption.short = pandoc.Inlines(short_caption)
    end
    return f
  end
end
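
Note that `.blocks[1].content` raises an error when the attribute is empty, because the parsed document then has no first block. A minimal sketch of a guarded parser follows, assuming `pandoc.read` and `pandoc.utils.blocks_to_inlines` behave as described in the Lua filters manual; `parse_short_caption` is a hypothetical helper name.

```lua
-- Hypothetical helper: parse a short caption written in Markdown into a
-- list of inlines. Returns nil when the text has no content.
local function parse_short_caption(text)
  local doc = pandoc.read(text, 'markdown')
  -- blocks_to_inlines flattens every block, so an empty document yields
  -- an empty list instead of failing on a missing first block.
  local inlines = pandoc.utils.blocks_to_inlines(doc.blocks)
  if #inlines == 0 then
    return nil
  end
  return inlines
end
```

Inside the `Figure` function, the result could be assigned to `f.caption.short` only when it is not `nil`, leaving the full caption in place otherwise.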
