A little explanation of the code. What comes from […]

Since JSON doesn't distinguish between homogeneous (arrays) and heterogeneous (tuples) lists, […] For example, the contents of an […]

The whole code is ~300 lines long, and it should be pretty maintainable in case of minor changes in the pandoc types.
-
Some observations on the naming of tags and attributes.

For XML elements, I kept the names of the Pandoc AST with their capitalization, so there are […]

For attributes I chose a "kebab" notation for names that have capital letters inside: […]

In some cases I had to introduce elements that are not explicitly in the Pandoc AST, like […] I chose lowercase versions, since they are not part of the AST. I did the same with […] The following are capitalized instead: […]

These are clearly debatable choices; I am still doubtful about many of them.
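To make the "kebab" convention for attributes concrete, here is a minimal sketch of such a conversion. The function name `toKebab` and the sample names `colSpan` and `rowHeadColumns` are hypothetical illustrations, not taken from the actual writer; the sketch also assumes the first character is left untouched and only inner capitals are rewritten.

```haskell
import Data.Char (isUpper, toLower)

-- Hypothetical helper: turn inner capital letters into "-x" so that a
-- camelCase name becomes a kebab-case attribute name.
-- The first character is deliberately left as it is (an assumption).
toKebab :: String -> String
toKebab []       = []
toKebab (c : cs) = c : concatMap step cs
  where
    step ch
      | isUpper ch = ['-', toLower ch]  -- "S" becomes "-s"
      | otherwise  = [ch]               -- everything else is copied
```

With this sketch, `toKebab "colSpan"` gives `"col-span"`, and a name without inner capitals such as `"id"` comes back unchanged.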
-
Can you give us (or link to) a sample text in this XML format?
-
Probably it should be rewritten to avoid the Aeson intermediary. In addition to code simplicity and performance, a good reason is that doing all this custom processing using string identifiers removes a lot of the type safety we'd get using the Pandoc types directly. For example, if you use the types directly you'll find out from the compiler if you've forgotten to implement something. [EDIT: just to amplify the importance of this, suppose we modify pandoc-types; it would be really easy to forget to make needed changes here unless we have the compiler tell us.]

I hate to ask you to do that, though, since I already asked you to use Aeson! (I had thought that using Aeson it would just be 20 lines of code or something, but that is without the customizations needed to make it look good.) I could probably rewrite it fairly easily. I might also be tempted to use xml-conduit instead of xml-light. (The types are fairly similar, but my guess is that xml-conduit's renderer is faster.)

I'm open to being persuaded to keep the Aeson approach, though, if it's possible to do significant code-sharing between reader and writer with the Aeson approach, as you suggest.

Round-trip testing is the way to go, and it's possible to do randomized round-trip tests quite easily. If you create a function […]

As for pretty-printing: an advantage of T.P.XML is that it will allow respecting […]
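The type-safety point can be illustrated with a small sketch. The `Inline` type below is a toy stand-in with only four constructors, not the real pandoc-types definition, and both function names are hypothetical. The first function dispatches on a string tag, as one does when walking a generic JSON value whose constructor name arrives in the `"t"` field; the second pattern-matches on the type itself.

```haskell
-- Toy stand-in for a few pandoc-types constructors (NOT the real definitions).
data Inline
  = Str String
  | Space
  | Emph [Inline]
  | Strong [Inline]
  deriving (Eq, Show)

-- String-keyed dispatch. The catch-all case means that a constructor added
-- to the AST later is silently treated as unknown; the compiler cannot help.
elementNameFromTag :: String -> Maybe String
elementNameFromTag "Emph"   = Just "Emph"
elementNameFromTag "Strong" = Just "Strong"
elementNameFromTag _        = Nothing

-- Dispatch on the type. With -Wincomplete-patterns (enabled by -Wall), GHC
-- warns here if a constructor is added to Inline and not handled below.
elementName :: Inline -> Maybe String
elementName (Str _)    = Nothing        -- rendered as text, no element
elementName Space      = Nothing        -- rendered as a space, no element
elementName (Emph _)   = Just "Emph"
elementName (Strong _) = Just "Strong"
```

If a new constructor were added to the toy `Inline`, `elementName` would trigger a compiler warning, while `elementNameFromTag` would keep compiling and quietly return `Nothing` for the new tag.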
-
This discussion is a follow-up to #10556; the goal is an XML format that is a 1:1 equivalent of the native and JSON formats.
The code of the Writer is in the "xml" branch of my fork of Pandoc (it's my first Haskell code, be merciful 🙏 ).
It's not tested enough, but what I want to show and discuss is the approach.
@jgm suggested to produce XML starting from the data coming from ToJSON.
It's what I did, but I'm not sure it's the way @jgm intended.
For me, the XML grammar should not be just a 1:1 equivalent of the native format, but something like an XHTML whose tags match the names of the Pandoc AST ("Para", "Div", "Emph", etc.), so that it's readable and meaningful.
Consistent with that intent, "Str" and "Space" items are not converted into `<Str>` or `<Space>` elements, but they are actual UTF-8 text and spaces.

@jgm: does such a Writer (I mean a Writer following that approach) have any chance to enter the Pandoc codebase?
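To show the idea in miniature, here is a rough sketch of the approach described in this post: element names follow the AST constructors, while "Str" and "Space" become plain text. It is not the code in the "xml" branch. The `Inline` and `Block` types are toy stand-ins for a handful of Pandoc constructors, all function names are hypothetical, and attributes are left out entirely.

```haskell
-- Toy stand-ins for a few Pandoc AST constructors (NOT the real definitions).
data Inline
  = Str String
  | Space
  | Emph [Inline]
  | Strong [Inline]
  deriving (Eq, Show)

data Block
  = Para [Inline]
  | BlockQuote [Block]
  deriving (Eq, Show)

-- Escape the characters that are special in XML character data.
escapeText :: String -> String
escapeText = concatMap esc
  where
    esc '&' = "&amp;"
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc c   = [c]

-- Wrap already-rendered content in an element named after the constructor.
element :: String -> String -> String
element name body = "<" ++ name ++ ">" ++ body ++ "</" ++ name ++ ">"

inlineToXml :: Inline -> String
inlineToXml (Str s)     = escapeText s   -- plain text, no <Str> wrapper
inlineToXml Space       = " "            -- plain space, no <Space> element
inlineToXml (Emph xs)   = element "Emph" (concatMap inlineToXml xs)
inlineToXml (Strong xs) = element "Strong" (concatMap inlineToXml xs)

blockToXml :: Block -> String
blockToXml (Para xs)       = element "Para" (concatMap inlineToXml xs)
blockToXml (BlockQuote bs) = element "BlockQuote" (concatMap blockToXml bs)
```

For instance, `blockToXml (Para [Str "Hello", Space, Emph [Str "world"]])` yields `<Para>Hello <Emph>world</Emph></Para>` in this sketch.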