saxes

A sax-style non-validating parser for XML.

Saxes is a fork of sax 1.2.4. All mentions of sax in this project's documentation are references to sax 1.2.4.

Designed with node in mind, but should work fine in the browser or other CommonJS implementations.

Saxes does not support Node versions older than 10.

Notable Differences from Sax.

Conformance

Saxes supports:

Limitations

This is a non-validating parser, so it only verifies whether the document is well-formed. We do aim to raise errors for all malformed constructs encountered. However, this parser does not thoroughly parse the contents of DTDs, so most malformedness errors caused by errors in DTDs cannot be reported.

Regarding <!DOCTYPE and <!ENTITY

The parser will handle the basic XML entities in text nodes and attribute values: &amp; &lt; &gt; &apos; &quot;. It's possible to define additional entities in XML by putting them in the DTD. This parser doesn't do anything with that. If you want to listen to the doctype event, and then fetch the doctypes, and read the entities and add them to parser.ENTITIES, then be my guest.
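As a minimal sketch of the do-it-yourself approach described above, the hypothetical helpers below pull simple internal entity declarations out of the doctype text and copy them into parser.ENTITIES. They are not part of saxes. The sketch assumes that the doctype handler receives the doctype text as a string and that parser.ENTITIES is a writable map from entity name to replacement text. It deliberately ignores parameter entities and external entities, and it does not expand references nested inside entity values.

```javascript
// Hypothetical helper, not part of saxes: extract simple internal general
// entities of the form <!ENTITY name "value"> from doctype text.
function extractEntities(doctype) {
  const pattern = /<!ENTITY\s+([^\s%"']+)\s+(?:"([^"]*)"|'([^']*)')\s*>/g;
  const entities = {};
  let match;
  while ((match = pattern.exec(doctype)) !== null) {
    // Group 2 holds a double-quoted value, group 3 a single-quoted one.
    entities[match[1]] = match[2] !== undefined ? match[2] : match[3];
  }
  return entities;
}

// Hypothetical helper: copy the extracted entities into the ENTITIES map
// of a parser-like object (assumed to be a plain writable object).
function registerEntities(parser, doctype) {
  Object.assign(parser.ENTITIES, extractEntities(doctype));
}
```

With a real parser this could be wired up as parser.on("doctype", (doctype) => registerEntities(parser, doctype)). A regular expression is a shortcut here, not a DTD parser, so treat it as a starting point only.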

Documentation

The source code contains JSDOC comments. Use them. What follows is a brief summary of what is available. The final authority is the source code.

PAY CLOSE ATTENTION TO WHAT IS PUBLIC AND WHAT IS PRIVATE.

The move to TypeScript makes it so that everything is now formally private, protected, or public.

If you use anything not public, that's at your own peril.

If there's a mistake in the documentation, raise an issue. If you just assume, you may assume incorrectly.

Summary Usage Information

Example

var saxes = require("./lib/saxes"),
  parser = new saxes.SaxesParser();

parser.on("error", function (e) {
  // an error happened.
});
parser.on("text", function (t) {
  // got some text.  t is the string of text.
});
parser.on("opentag", function (node) {
  // opened a tag.  node has "name" and "attributes"
});
parser.on("end", function () {
  // parser stream is done, and ready to have more stuff written to it.
});

parser.write('<xml>Hello, <who name="world">world</who>!</xml>').close();

Constructor Arguments

Settings supported:

Methods

write - Write bytes onto the stream. You don't have to pass the whole document in one write call. You can read your source chunk by chunk and call write with each chunk.

close - Close the stream. Once closed, no more data may be written until it is done processing the buffer, which is signaled by the end event.
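The chunk-by-chunk pattern can be sketched as follows. The helper writeInChunks and its chunkSize parameter are hypothetical names used for illustration. The only parser calls it relies on are write and close, as described above.

```javascript
// Hypothetical helper: feed a string to a parser a fixed number of
// characters at a time, then close the parser.
function writeInChunks(parser, xml, chunkSize) {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  for (let start = 0; start < xml.length; start += chunkSize) {
    parser.write(xml.slice(start, start + chunkSize));
  }
  // Signal that no more data is coming.
  parser.close();
}
```

In real code the chunks would more likely come from a file or network stream than from slicing a string that is already in memory.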

Properties

The parser has the following properties:

line, column, columnIndex, position - Indications of the position in the XML document where the parser currently is looking. The columnIndex property counts columns as if indexing into a JavaScript string, whereas the column property counts Unicode characters.

closed - Boolean indicating whether or not the parser can be written to. If it's true, then wait for the ready event to write again.

opt - Any options passed into the constructor.

xmlDecl - The XML declaration for this document. It contains the fields version, encoding and standalone. They are all undefined before encountering the XML declaration. If they are undefined after the XML declaration, the corresponding value was not set by the declaration. There is no event associated with the XML declaration. In a well-formed document, the XML declaration may be preceded only by an optional BOM. So by the time any event generated by the parser happens, the declaration has been processed if present at all. Otherwise, you have a malformed document, and as stated above, you cannot rely on the parser data!
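The difference between column and columnIndex comes from how JavaScript strings store characters outside the Basic Multilingual Plane. The standalone sketch below does not call saxes. countColumns is a hypothetical function that only illustrates the two ways of counting the same text.

```javascript
// Hypothetical illustration: count the same text two ways.
function countColumns(text) {
  return {
    // UTF-16 code units, which is how JavaScript string indexing works.
    codeUnits: text.length,
    // Unicode code points; the string iterator yields one per code point.
    codePoints: Array.from(text).length,
  };
}

// U+1F600 takes two UTF-16 code units but is a single code point.
const counts = countColumns("a\u{1F600}b");
console.log(counts); // { codeUnits: 4, codePoints: 3 }
```

Use the code-unit style of counting when you need to index back into the source string, and the code-point style when reporting positions to people.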

Error Handling

The parser continues to parse even upon encountering errors, and does its best to continue reporting errors. You should heed all errors reported. After an error, however, saxes may interpret your document incorrectly. For instance <foo a=bc="d"/> is invalid XML. Did you mean to have <foo a="bc=d"/> or <foo a="b" c="d"/> or some other variation? For the sake of continuing to provide errors, saxes will continue parsing the document, but the structure it reports may be incorrect. It is only after the errors are fixed in the document that saxes can provide a reliable interpretation of the document.

That leaves you with two rules of thumb when using saxes:

Events

To listen to an event, register a handler with parser.on, as in the example above. The list of supported events is also available in the exported EVENTS array.

See the JSDOC comments in the source code for a description of each supported event.
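Tying the error-handling advice to events, here is a sketch that records events alongside errors and flags the result as unreliable once any error has been reported. parseAndCollect is a hypothetical helper. It assumes that handlers are registered with parser.on(name, handler) as in the example above, that write returns the parser, and that handlers are invoked synchronously during write and close. It uses only the error, opentag and text events shown in that example.

```javascript
// Hypothetical helper: parse a string and collect what the parser reports.
function parseAndCollect(parser, xml) {
  const result = { errors: [], events: [], reliable: true };
  parser.on("error", (err) => {
    result.errors.push(err);
    // After an error, the reported structure may be a wrong guess.
    result.reliable = false;
  });
  parser.on("opentag", (node) => result.events.push(["opentag", node.name]));
  parser.on("text", (text) => result.events.push(["text", text]));
  parser.write(xml).close();
  return result;
}
```

A caller would typically check result.reliable first and use result.events only when it is true.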

Parsing XML Fragments

The XML specification does not define any method by which to parse XML fragments. However, there are usage scenarios in which it is desirable to parse fragments. In order to allow this, saxes provides three initialization options.

If you pass the option fragment: true to the parser constructor, the parser will expect an XML fragment. It essentially starts with a parsing state equivalent to the one it would be in if parser.write("<foo>") had been called right after initialization. In other words, it expects content which is acceptable inside an element. This also turns off well-formedness checks that are inappropriate when parsing a fragment.

The option additionalNamespaces allows you to define additional prefix-to-URI bindings known before parsing starts. You would use this over resolvePrefix if you have at the ready a series of namespace bindings to use.

The option resolvePrefix allows you to pass a function which saxes will use if it is unable to resolve a namespace prefix by itself. You would use this over additionalNamespaces in a context where getting a complete list of defined namespaces is onerous.

Note that you can use additionalNamespaces and resolvePrefix together if you want. additionalNamespaces applies before resolvePrefix.

The options additionalNamespaces and resolvePrefix are really meant to be used for parsing fragments. However, saxes won't prevent you from using them with fragment: false. Note that if you do this, your document may parse without errors and yet be malformed because the document can refer to namespaces which are not defined in the document.

Of course, additionalNamespaces and resolvePrefix are used only if xmlns is true. If you are parsing a fragment that does not use namespaces, there's no point in setting these options.
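Here is a sketch of how the three options might be combined. fragmentOptions is a hypothetical helper, and the prefixes and URIs are placeholder values. It assumes that additionalNamespaces is a plain object mapping prefixes to URIs and that resolvePrefix receives a prefix and returns a URI, or undefined when the prefix is unknown. Check the JSDOC comments for the exact contracts.

```javascript
// Hypothetical helper: build constructor options for parsing a namespaced
// fragment. `known` is supplied up front; `lookup` is the fallback.
function fragmentOptions(known, lookup) {
  return {
    xmlns: true, // the two namespace options only apply when xmlns is true
    fragment: true,
    additionalNamespaces: { ...known },
    resolvePrefix: (prefix) => lookup.get(prefix), // undefined if unknown
  };
}

const options = fragmentOptions(
  { ex: "http://example.com/ns" },
  new Map([["late", "http://example.com/late"]])
);
// With saxes loaded, this would be passed as: new saxes.SaxesParser(options)
```

Keeping the cheap, already-known bindings in additionalNamespaces and leaving resolvePrefix as the fallback matches the order in which the text says they apply.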

Performance Tips

FAQ

Q. Why has saxes dropped support for limiting the size of data chunks passed to event handlers?

A. With sax you could set MAX_BUFFER_LENGTH to cause the parser to limit the size of data chunks passed to event handlers. So if you ran into a span of text above the limit, multiple text events with smaller data chunks were fired instead of a single event with a large chunk.

However, that functionality had some problematic characteristics. It had an arbitrary default value. It was library-wide so all parsers created from a single instance of the sax library shared it. This could potentially cause conflicts among libraries running in the same VM but using sax for different purposes.

These issues could have been easily fixed, but there were larger issues. The buffer limit arbitrarily applied to some events but not others. It would split text, cdata and script events. However, if a comment, doctype, attribute or processing instruction were more than the limit, the parser would generate an error and you were left picking up the pieces.

It was not intuitive to use. You'd think setting the limit to 1K would prevent chunks bigger than 1K from being passed to event handlers. But that was not the case. A comment in the source code told you that you might go over the limit if you passed large chunks to write. So if you want a 1K limit, don't pass 64K chunks to write. Fair enough. You know what limit you want, so you can control the size of the data you pass to write. So you limit the chunks to write to 1K at a time.

Even if you do this, your event handlers may get data chunks that are 2K in size. Suppose on the previous write the parser has just finished processing an open tag, so it is ready for text. Your write passes 1K of text. You are not above the limit yet, so no event is generated yet. The next write passes another 1K of text. It so happens that sax checks buffer limits only once per write, after the chunk of data has been processed. Now you are over the limit and you get a text event with 2K of data. So even if you limit your write calls to the buffer limit you've set, you may still get events with chunks at twice the buffer size limit you've specified.
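The arithmetic in this scenario can be reproduced with a toy model. ToyParser below is not sax's code. It is a simplified sketch that only mimics the behavior described here: text accumulates in a buffer, and the limit is checked once per write, after the chunk has been processed.

```javascript
// Toy model of a per-write buffer check; not the real sax implementation.
class ToyParser {
  constructor(limit, onText) {
    this.limit = limit;
    this.onText = onText;
    this.buffer = "";
  }

  write(chunk) {
    this.buffer += chunk; // process the whole chunk first
    // Check the limit only once, after processing the chunk.
    if (this.buffer.length > this.limit) {
      this.onText(this.buffer);
      this.buffer = "";
    }
    return this;
  }
}

const sizes = [];
const toy = new ToyParser(1024, (text) => sizes.push(text.length));
toy.write("x".repeat(1024)); // exactly at the limit: no event yet
toy.write("x".repeat(1024)); // now over the limit: one event fires
console.log(sizes); // [ 2048 ]
```

Both writes respect the 1K limit, yet the single text event carries 2K of data.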

We may consider reinstating an equivalent functionality, provided that it addresses the issues above and does not cause a huge performance drop for use-case scenarios that don't need it.
