- Fixed a bug in detection of images in plain_text method. (#141)
- Improved type hints.
- Fixed a bug in detecting HTML tags nested in wiki markup. (#140)
- Improved type hints.
- Fixed a bug in external_links property where | was recognized as part of the link by mistake. (#139)
- Fixed a bug in get_sections when top_levels_only was True.
- Drop Python 3.7 support.
- Fixed a bug in detecting the text of an external link. (#137)
- Fixed a bug in Section.level resulting in malformed section titles when multiple levels are added. (#135)
- Performance improvements in extracting bold and italic nodes. (#133)
- Performance improvements in __setitem__/__delitem__ and pformat/plain_text methods. (#131)
- Fixed a bug in plain_text causing IndexError when using a custom function to replace templates/parser_functions.
- Fixed a bug in plain_text not detecting images with multiple dots correctly. (#129)
- Fixed: Equal signs in extension tag attributes are no longer confused with name-value separator in arguments. (#128)
- Fixed a bug in plain_text. (#126)
- Fixed another bug in parsing tables that end without a |} mark. (#125)
- Fixed a bug in parsing tables that end without a |} mark. (#124)
- Fixed: regression in plain_text not being able to handle wikilinks only containing a fragment/anchor, not a title.
- plain_text method now uses a more accurate image-detection algorithm.
- Fixed and improved handling of tables and images in plain_text. (#122)
- Added: top_levels_only argument to get_sections.
- Deprecated: calling get_sections with positional arguments.
- Fixed some bugs in plain_text method. (#119, #120)
- Fixed a bug in get_tags. (#121)
- Fixed a bug in WikiText.external_links not detecting external links inserted via overwriting a template string. (#74)
- The following already deprecated functions/parameters are removed:
  - Setting Parameter.default to None is not possible anymore. Use del Parameter.default instead.
  - The default value for the preserve_spacing parameter of Template.set_arg is now False. (It was deprecated to call this method without providing a value for preserve_spacing.)
  - The pattern parameter of WikiList.sublists, WikiList.get_lists and WikiText.get_lists cannot be None anymore. Use the default value instead.
  - WikiText.lists and WikiText.tags are removed. Use get_lists or get_tags instead.
- Fixed a bug in plain_text()/remove_markup not being able to handle tables with row/colspan. (#116)
- plain_text() will now include table captions.
- Fixed a syntax error for Python < 3.10.
- BREAKING CHANGE: dropping Python 3.6 support.
- Fixed an error in getting plain_text() of emptied-out wikitext. (#113)
- Deprecated: Calling Template.set_arg() without specifying a value for the preserve_spacing parameter is deprecated. This is a temporary warning in preparation for changing the default value of this parameter from True to False. (#111)
- Fixed the stacklevel of warnings.
- New feature: plain_text() replaces wiki-tables with a TSV string. (#115)
- Fixed a bug in detecting reverse pipe tricks as wikilinks.
- Fixed a bug in WikiText.external_links causing external links within extension tags (e.g. ref tag) not to be detected when the tag is inside a template/parser function/parameter. (#110)
- WikiText.get_lists now correctly detects lists with a missing level. (#70)
- WikiList.sublists are now returned in sorted order.
- Fixed a bug in WikiText.pformat which used to cause IndexError on a parser function which had no argument, e.g. for {{FULLPAGENAMEE}}.
- Feat: Table objects now have a row_attrs property.
- Fixed: Infinite loop on parsing tables containing \r. (This is just to prevent the infinite loop; CRLF line endings are not supported.)
- Fixed: Handle empty tables instead of raising IndexError. (#107)
- Fixed an issue in handling of / in tags. (#108)
- Fixed a false-positive detection of invalid external links. (#109)
- Fixed an issue in Template.normal_name() causing IndexError on empty/invalid template names, e.g. {{Template:}}. (#105)
- Fixed a bug in plain_text/remove_markup causing duplicate values when replacing nested templates.
- Feature: replace_templates and replace_parser_functions parameters of plain_text/remove_markup now accept a function mapping Template or ParserFunction objects to the desired replacement string. (#103)
- Fixed a bug in Tag.parsed_contents method. (#102)
- Fixed a bug in plain_text method. (#101)
- Fixed a bug in pformat and plain_text methods. (#100)
- BREAKING: dropped support for Python 3.5
- Fixed: bug in handling of external links with uppercase scheme. (#99)
- Fixed missing table rows after comments. (#98)
- Fixed: Template titles cannot include wikilinks.
- Fixed: Detection of tags within WikiLinks. (#96)
- Fixed a bug in Template.set_arg causing duplicate values. (#97)
- Fixed problem in detecting extension tags with uppercase letters in their names (#95)
- Fixed regex requirement for Python 3.5 on Windows platform.
- Fixed handling of external links within definition lists. (#91)
- Fixed a bug in plain_text method not handling self-closing tags correctly.
- Fixed a bug that was causing the parser to hang when parsing complicated nested tags.
- Fixed the order of items in WikiList.fullitems. (#72)
- Fixed and improved a few edge cases in Table.caption. (pr #81)
- Fixed handling of external links within definition lists. (pr #83)
- Fixed a bug in parsing extension tags. (#90)
- MW variables are now recognized as parser functions, not templates. (#69)
- Fixed a bug in mutation of root element when a child was mutated. (#66)
- Fixed a bug that was causing templates like {{NAMESPACE|2}} to be detected as a parser function. It is a template if the first argument starts with a :.
- Fixed bugs in detecting attributes of table cells. (#71, #73)
- Fixed a bug in detecting header cells in tables. (#77)
- Fixed a bug in get_tags where extension tags without attributes were not returned. (#84)
- Fixed a bug in get_tables method where tables within tag extensions were not recognized. (#85)
- Fixed a bug in detecting parser functions without parameters. {{NAMESPACE}} used to be detected as a template, but {{NAMESPACE:MediaWiki}} as a parser function. Now both of them are detected as parser functions.
- Fix a bug in detecting external links within extension tags. (#65)
- Fix a few bugs in plain_text/remove_markup. (#65)
- Detect unclosed comments, e.g. <!-- a.
- Fix parsing priority of tag extensions and comments. For example, the comment in <ref>b<!--c</ref>d--> used to be parsed with <!--c</ref>d--> as the comment, which was incorrect.
- Fixed a catastrophic backtracking issue in parsing nested extension tags. (#60)
- Fixed a bug in Bold.text and Italic.text failing to parse objects containing \n. (#61)
- Fixed a bug in parsing tags containing the < character. (#58)
- Updated the list of known extension tags.
- Improved detection of nested tag extensions, e.g. a <ref> tag within <references>.
- Fixed a bug in get_bolds_and_italics causing it to return duplicate items in some situations. This was also causing an error in plain_text method. (#57)
- Fixed a bug in matching header cells in Table.cells. (#53)
- Add Cell.is_header property.
- Fixed a bug in detection of Table.caption and Table.caption_attrs.
- Improve the performance of get_bolds_and_italics(recursive=True, filter_cls=None).
- Fix a bug in get_bolds_and_italics(recursive=False, filter_cls=None) which was causing it to return recursive Bold items.
- Remove the deprecated parameters of Template.normal_name().
- Fix a bug in get_bolds_and_italics() which was causing it to return only Bold items.
- Fix a bug in handling of comments in template names. (#54)
- Improve the handling of weird colspan and rowspan values in tables. (#53)
- Fix a syntax error in Python 3.5.
- BREAKING CHANGE:
  - Remove replace_bolds/replace_italics params from remove_markup/plain_text methods. Users can use the new replace_bolds_and_italics parameter. Removing only bolds or only italics is no longer possible.
- Add get_bolds_and_italics as a new method.
- Fixed bugs and rewrote the algorithm for finding Bold and Italic objects. (#51)
- Trying to mutate an overwritten/detached object will now raise DeadIndexError (a subclass of TypeError). Hopefully this will prevent some subtle late-appearing bugs.
- Fix a bug in plain_text method.
- Fix a bug in detection of external links in parsable tag extensions. (#50)
- Fix a bug in handling of half-marked bold/italic, e.g. '''bold\n.
- Fix a bug in handling of half-marked bold/italic items, e.g. '''bold text\n.
- Improve handling of extension tags inside external links. (#49)
- Ignore invalid attributes that do not start with space characters. (#48)
- Improved how invalid attributes (in html tags, tables, etc.) are handled. (#47)
- Fixed a bug in handling <pre> tags. (#46)
- Fixed a bug in parsing tag attributes. (#44)
- Fixed handling of tags having different casings in start and end name, e.g. <s></S>.
- Fix handling of extension tags.
- Fixed a bug in get_bolds/get_italics resulting in duplicate items in returned values. It was also causing a subtle issue in plain_text/remove_markup. (#42)
- Fixed detection of parameters containing single braces.
- Fix handling of external links containing wikilinks.
- Fixed a bug in plain_text/remove_markup causing unexpectedly empty objects. (#40)
- Fixed some other bugs in plain_text/remove_markup functions for:
  - images containing wikitext
  - tags containing bold/italic items
  - nested tags
- Fixed a bug in extracting sub-tags.
- Fixed a bug in Tag objects causing strange behaviour upon mutating a tag.
- Fixed a bug in plain_text/remove_markup functions causing some objects that are expected to be removed to remain in the result. (#39)
- Fix syntax errors for python 3.5, 3.6, and 3.7.
- Fix a bug in getting the parser functions of a Template object.
- Fix a catastrophic backtracking issue for wikitexts containing html tags. (#37)
- Add wikitextparser.remove_markup function and WikiText.plain_text method.
- Improve detection of parameters and wikilinks.
- Add get_bolds and get_italics methods.
- WikiLink.wikilinks, WikiList.get_lists(), Template.templates, Tag.get_tags(), ParserFunction.parser_functions, and Parameter.parameters won't return objects equal to self anymore; only sub-elements will be returned.
- Improve handling of comments within wikilinks.
- WikiLink.text.setter no longer accepts None values. This was marked as deprecated since v0.25.0.
- Drop support for Python 3.4.
- Remove the deprecated pprint method. Users should use pformat instead.
- Allow a tuple of patterns in get_lists and sublists methods. The default None is now deprecated and a tuple is used instead.
- Add a new parameter, level, for the get_sections method.
- Fixed a rare bug in handling lists and template arguments when there is a newline or a pipe inside a starting or closing tag.
- Section.title will return None instead of '' when the section does not have any title.
- Invoking the deleter of Section.title won't raise a RuntimeError anymore if the section does not have a title already.
- Add a deleter for Section.title property. (#32)
- Fixed a bug in WikiText.get_lists() which was causing it to sometimes return items in an unordered fashion. (#31)
- Rename WikiText.lists() method to WikiText.get_lists() and deprecate the old name.
- Add get_sections() method with include_subsections parameter, which allows getting sections without including subsections. (#23)
- Fixed a bug in parsing wikilinks containing [.*]. (#29)
- Fixed: wikilinks are not allowed to be preceded by [ anymore.
- Rename WikiText.tags() method to WikiText.get_tags() and deprecate the old name.
- Fix a bug in detecting the end-tag of two consecutive same-name tags. (#27)
- Properly exclude the test package from the source distribution.
- Fix a regression in parsing some corner cases of nested templates. (#26)
- The previously deprecated WikiText.__getitem__ now raises NotImplementedError.
- WikiText.__call__: Remove the deprecated support for start being None.
- Optimize a little and use more robust algorithms.
- Implemented a workaround for a catastrophic backtracking condition when parsing tables. (#22)
- Add get_tables as a new method to WikiText objects. It allows extracting tables in a non-recursive manner.
- The nesting_level property was only meaningful for tables, templates, and parser functions; remove it from other types.
- Fix a bug in detecting nested tables. (#21)
- Fix a few bugs in detecting tables and template arguments.
- Changed the comments property of Comment objects to return an empty list.
- Changed the external_links property of ExternalLink objects to return an empty list.
- Fix a bug in setting Section.contents which only occurred when the title had trailing whitespace.
- Setting Section.level will not overwrite Section.title anymore.
- Define WikiLink.title property. It is similar to WikiLink.target but will not include the #fragment.
- Deprecate using None as the start value of __call__.
- Added fragment property to WikiLink class. (#18)
- Added deleter method for WikiLink.text property.
- Deprecated: Setting WikiLink.text to None. Use del WikiLink.text instead.
- Added deleter method for WikiLink.target property.
- Added deleter method for ExternalLink.text property.
- Added deleter method for Parameter.default property.
- Deprecated: Setting Parameter.default to None. Use del Parameter.default instead.
- Defined WikiText.__call__ to get a slice of wikitext as string.
- Deprecated WikiText.__getitem__. Use WikiText.__call__ or WikiText.string instead.
- Fixed a bug in Tag.parsed_contents. (#19)
- Fixed a rarely occurring bug in detecting parameters with names consisting only of whitespace or underscores.
- Fixed a bug in detecting parser functions containing parameters.
- Fixed a bug in detecting table header cells that start with +, -, or }. (#17)
- Define deleter method for WikiText.string property and add Template.del_arg method. (#14)
- Improve the lists method of Template and ParserFunction classes. (#15)
- Fixed a bug in detection of multiline arguments. (#13)
- Deprecated capital_links parameter of Template.normal_name. Use capitalize instead (keyword-only argument).
- Deprecated passing the code parameter of Template.normal_name as a positional argument. It's now a keyword-only argument.
- Fixed a bug in Section objects that was causing them to return the properties of the whole page. (#15)
- Removed the deprecated attribute access methods. The following deprecated methods accessible on Table and Tag objects have been removed: .has, .get, .set. Use .has_attr, .get_attr, .set_attr instead.
- Fixed a bug in set_attr method.
- Removed the deprecated Table.getdata method. Use Table.data instead.
- Removed the deprecated Table.getrdata(row_num) method. Use Table.data(row=row_num) instead.
- Removed the deprecated Table.getcdata(col_num) method. Use Table.data(col=col_num) instead.
- Removed the deprecated Table.table_attrs property. Use Table.attrs or other attribute-related methods instead.
- Fixed MemoryError caused by very long or unclosed comment tags (issue #12)
- Change the behaviour of external_links property to never return Templates or parser functions as part of the external link.
- Add support for literal IPv6 external links, e.g. https://[2001:db8:85a3:8d3:1319:8a2e:370:7348]:443/.
- Fixed: Do not mistake the equal signs of section titles for template keyword arguments.
- Fixed invalid escape sequences for Python 3.6.
- Added msg, msgnw, raw, safesubst, and subst to known parser function identifiers.
- Fixed a bug in Table.data (issue #9)
- Fixed: A bug in processing Section objects.
- Fixed: A bug in external_links (the starting position must now be a word boundary; previously this condition was not checked).
- Fixed: A bug in external_links (external links within sub-templates are now detected correctly; previously they were ignored).
- Changed: The order of results, now everything is sorted by its starting position.
- Fixed: Bug in ancestors and parent methods.
- Added: parent and ancestors methods.
- Added: __version__ to __init__.py.
- Removed: Support for Python 3.3
- Fixed: Handling of comments and tags in section titles
- Changed: Add an underscore prefix to private internal modules names
- Changed: Moved test modules to a different directory
- Changed: Templates adjacent to external links are now treated as part of the link
- Fixed: A bug in handling tag extensions within parser functions
- Fixed: A minor bug in Template.set_arg
- Changed: ExternalLink.text: Return None if the link is not within brackets
- Fixed: Handling of comments and templates in external links