
Why Is Lxml Closing This "ol" Tag When Parsing?

Here is some HTML:
    • item
and some Python 3 code that uses lxml to parse it and print it again: import sys from lxml import et

Solution 1:

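I think neither HTML 4 nor HTML5 allows a ul element as a direct child of an ol element. In HTML 4 the content model of ol is one or more li elements; HTML5 allows only li elements (plus script-supporting elements such as script and template) as direct children. A nested list therefore has to sit inside an li.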
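The rule above can be checked mechanically. The following is only a rough sketch using Python's standard html.parser, not a real validator: it tracks open tags exactly as they are written in the source and reports any element other than li, script or template that appears directly inside an ol or ul. The names ListChildChecker and find_list_violations, and the sample markup, are made up for illustration, since the original markup from the question is not available here.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ListChildChecker(HTMLParser):
    """Naive check: record (list_tag, child_tag) for disallowed list children."""

    # HTML5 allows li plus script-supporting elements inside ol/ul.
    ALLOWED = {"li", "script", "template"}

    def __init__(self):
        super().__init__()
        self.stack = []       # open elements, as written in the source
        self.violations = []  # (parent, child) pairs that break the rule

    def handle_starttag(self, tag, attrs):
        parent = self.stack[-1] if self.stack else None
        if parent in ("ol", "ul") and tag not in self.ALLOWED:
            self.violations.append((parent, tag))
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray end tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def find_list_violations(markup):
    checker = ListChildChecker()
    checker.feed(markup)
    checker.close()
    return checker.violations


# Hypothetical markup: a ul placed directly inside an ol.
print(find_list_violations("<ol><li>a</li><ul><li>item</li></ul></ol>"))
# The same list nested inside an li, which is the allowed form.
print(find_list_violations("<ol><li>a<ul><li>item</li></ul></li></ol>"))
```

This only looks at the markup as typed; it does not model the tag-omission and error-recovery rules that a real HTML parser applies.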

That might be why an HTML parser builds a tree that does not reflect the nesting in your input markup. lxml's HTML parser is the one from libxml2, which is probably a "traditional" HTML 4 style parser. Whether such a parser is expected to restructure the markup in this way is something I don't remember, and I am not sure where to test it.
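One way to see what lxml actually builds is to parse the markup and look at the parent of the ul in the resulting tree. This is a small sketch, assuming lxml is installed; parent_of_first_ul and the sample strings are hypothetical, not taken from the question. No particular result is claimed for the invalid case, because it depends on the libxml2 version in use.

```python
from lxml import html


def parent_of_first_ul(markup):
    """Parse markup as a full HTML document and return the tag of the
    element that ends up as the parent of the first ul in the tree."""
    root = html.document_fromstring(markup)
    ul = root.find(".//ul")
    if ul is None:
        return None
    return ul.getparent().tag


# Hypothetical markup: ul written as a direct child of ol (not allowed).
invalid = "<ol><li>a</li><ul><li>item</li></ul></ol>"
# The same list written inside an li (allowed).
valid = "<ol><li>a<ul><li>item</li></ul></li></ol>"

print("invalid nesting, ul parent:", parent_of_first_ul(invalid))
print("valid nesting, ul parent:  ", parent_of_first_ul(valid))

# Serialize the tree to see where the parser placed the ol end tag.
tree = html.document_fromstring(invalid)
print(html.tostring(tree, encoding="unicode"))
```

If the parent printed for the invalid markup is not ol, the parser closed the ol early, which is the behavior described in the question. Moving the ul inside an li avoids the problem either way.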

Two HTML5 validators flag your ul as a disallowed child of ol, but current browsers seem to preserve that nesting anyway.
