223 Commits

Author SHA1 Message Date
Y5 d3e79d75d1
gh-124130: Notes on empty string corner case of category `\B` (#124133)
Signed-off-by: y5c4l3 <y5c4l3@proton.me>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
2024-09-23 08:58:14 +02:00
Nice Zombies 22fdb8cf89
gh-118508: Clarify which characters are matched by `\s` (#119155)
Clarify re syntax
2024-09-02 07:48:15 -04:00
Serhiy Storchaka a2f6f7dd26
gh-111259: Document idiomatic RE pattern (?s:.) that matches any character (GH-120745) 2024-06-21 00:03:49 +03:00
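As a rough illustration of the idiom named in this commit, the sketch below compares a plain `.` with `(?s:.)`. The variable names are only for illustration and are not taken from the documentation change itself.

```python
import re

# Without the DOTALL flag, "." does not match a newline.
plain_dot = re.compile(r".")
# The inline-flag group (?s:.) enables DOTALL for this one dot only.
any_char = re.compile(r"(?s:.)")

newline_plain = plain_dot.fullmatch("\n")  # no match
newline_any = any_char.fullmatch("\n")     # a Match object

# The flag is scoped to the group: the last dot still rejects "\n".
scoped = re.compile(r"(?s:.)x.")
scoped_hit = scoped.fullmatch("\nxa")
scoped_miss = scoped.fullmatch("\nx\n")
```

Scoping the flag this way avoids changing the meaning of every other `.` in a larger pattern.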
Awbert a86e6255c3
gh-119960: Add information about regex flags in re module functions (#119978) 2024-06-19 09:42:01 +00:00
Ned Batchelder bcb435ee8f
docs: module page titles should not start with a link to themselves (#117099) 2024-05-08 20:34:40 +01:00
Hugo van Kemenade 3375282bb8
Docs: add link roles with Sphinx extlinks (#117850)
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
2024-04-15 21:22:00 +03:00
Stevoisiak d2d7808853
gh-101699: Explain using Match.expand with \g<0> (GH-101701)
Update documentation for re library to explain that a backreference `\g<0>` is
expanded to the entire string when using Match.expand().
Note that numeric backreferences to group 0 (`\0`) are not supported.

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2024-02-17 08:33:28 +00:00
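A minimal sketch of the behaviour this commit documents, using a made-up input string: in a template passed to `Match.expand()`, `\g<0>` stands for the whole matched text, while numbered references such as `\1` and `\2` stand for individual groups.

```python
import re

# Illustrative input; the pattern captures two words.
m = re.search(r"(\w+) (\w+)", "Isaac Newton, physicist")

# \g<0> expands to the entire match, not just one group.
whole = m.expand(r"<\g<0>>")

# Numbered group references work as usual.
swapped = m.expand(r"\2 \1")
```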
Serhiy Storchaka 573acb30f2
gh-115172: Fix explicit index extries for the C API (GH-115173) 2024-02-11 12:23:30 +02:00
David H. Gutteridge 567a85e9c1
gh-114332: Fix the flags reference for ``re.compile()`` (#114334)
The GH-93000 change set inadvertently caused a sentence in re.compile()
documentation to refer to details that no longer followed. Correct this
with a link to the Flags sub-subsection.

Co-authored-by: Adam Turner <9087854+aa-turner@users.noreply.github.com>
2024-01-20 11:17:41 +00:00
Adam Turner c9b8a22f34
GH-107678: Improve Unicode handling clarity in ``library/re.rst`` (#107679) 2024-01-11 23:56:10 +00:00
achhina a01022af23
GH-83162: Rename re.error for better clarity. (#101677)
Renamed re.error for clarity, and kept re.error for backward compatibility.
Updated idlelib files at TJR's request.
---------

Co-authored-by: Matthias Bussonnier <mbussonnier@ucmerced.edu>
Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
2023-12-11 15:45:08 -05:00
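The commit message does not spell out the new name; to the best of my knowledge it is `re.PatternError`, available from Python 3.13, with `re.error` kept as an alias. Under that assumption, code that must also run on older versions could be sketched like this (the local name `PatternError` is just a convenience here):

```python
import re

# Assumption: re.PatternError exists on Python 3.13+; older versions
# only provide re.error, so fall back to it.
PatternError = getattr(re, "PatternError", re.error)

try:
    re.compile("(")  # unbalanced parenthesis, not a valid pattern
except PatternError as exc:
    message = str(exc)
else:
    message = None
```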
Ezio Melotti bb7923f556
gh-110631: Fix reST indentation in `Doc/library` (#110685)
Fix wrong indentation in the Doc/library dir.
2023-10-11 22:24:12 +02:00
Serhiy Storchaka 92af0cc580
gh-109634: Use :samp: role (GH-109635) 2023-09-23 09:31:20 +03:00
Philipp A 6895ddf6cb
gh-102211: Document `re.{Pattern,Match}`’s existence (#102212)
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
2023-08-25 10:53:11 -06:00
Serhiy Storchaka 882cb79afa
gh-56166: Deprecate passing confusing positional arguments in re functions (#107778)
Deprecate passing optional arguments maxsplit, count and flags in
module-level functions re.split(), re.sub() and re.subn() as positional.
They should only be passed by keyword.
2023-08-16 13:35:35 -07:00
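The keyword-only style recommended by this commit can be sketched as follows; the sample strings are invented for illustration.

```python
import re

# Pass maxsplit, count and flags by keyword so they cannot be confused
# with one another (a positional flag is easily mistaken for a count).
parts = re.split(r",\s*", "a, b, c, d", maxsplit=2)
masked = re.sub(r"\d", "#", "a1b2c3", count=2)
replaced, n = re.subn(r"x", "-", "xXx", flags=re.IGNORECASE)
```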
wulmer 0af247da09
gh-102111: Add link to string escape sequences in re module (#106995)
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
2023-07-23 02:50:38 -06:00
wulmer 149748ea4f
Fix Sphinx warnings in `re` module docs (#107044) 2023-07-22 16:44:44 +01:00
Skip Montanaro bcadcde712
gh-102259: Fix re doc issue regarding right square brackets (#102264)
Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
2023-02-25 21:22:16 -05:00
Ilya Kulakov dbc1e696eb
gh-99308: Clarify re docs for byte pattern group names (#99311) 2022-12-25 12:25:27 +05:30
Stanley 36a0b1d0dd
gh-69929: re docs: Add more specific definition of \w (#92015)
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
2022-12-19 19:07:31 -08:00
Stanley 286e3c76a9
gh-99087: Add missing newline for prompts in docs (GH-98993)
Add newline for prompts so copying to REPL does not cause errors.
2022-12-08 19:31:19 -08:00
ram vikram singh e0f91deb59
GH-98906: `re` module: `search() vs. match()` section should mention `fullmatch()` (GH-98916)
Mention fullmatch along with search and match.
2022-11-30 17:52:21 -05:00
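For context, the difference between the three functions can be sketched with a small example (the sample string is arbitrary):

```python
import re

text = "abcdef"

# match() only succeeds at the start of the string.
at_start = re.match("c", text)
# search() scans the whole string for the first location that matches.
anywhere = re.search("c", text)
# fullmatch() requires the pattern to cover the entire string.
partial = re.fullmatch("abc", text)
entire = re.fullmatch("abcdef", text)
```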
Victor Stinner a60ddd31be
gh-98401: Invalid escape sequences emit SyntaxWarning (#99011)
A backslash-character pair that is not a valid escape sequence now
generates a SyntaxWarning instead of a DeprecationWarning.  For
example, re.compile("\d+\.\d+") now emits a SyntaxWarning ("\d" is an
invalid escape sequence); use raw strings for regular expressions:
re.compile(r"\d+\.\d+"). In a future Python version, a SyntaxError will
eventually be raised instead of a SyntaxWarning.

Octal escapes with a value larger than 0o377 (e.g. "\477"), deprecated
in Python 3.11, now produce a SyntaxWarning instead of a
DeprecationWarning. In a future Python version they will
eventually be a SyntaxError.

codecs.escape_decode() and codecs.unicode_escape_decode() are left
unchanged: they still emit DeprecationWarning.

* The parser only emits SyntaxWarning for Python 3.12 (feature
  version), and still emits DeprecationWarning on older Python
  versions.
* Fix SyntaxWarning by using raw strings in Tools/c-analyzer/ and
  wasm_build.py.
2022-11-03 17:53:25 +01:00
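The raw-string advice in this commit message can be sketched as below. The sample text is invented; the doubled-backslash variant is included only to show that it yields the same pattern.

```python
import re

# A raw string keeps the backslashes for the regex engine, so the
# Python parser never sees "\d" as an (invalid) string escape.
number = re.compile(r"\d+\.\d+")
found = number.findall("pi is 3.14, e is 2.718")

# Doubling the backslashes produces the same pattern text, but is
# harder to read.
escaped = re.compile("\\d+\\.\\d+")
same_pattern = escaped.pattern == number.pattern
```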
Serhiy Storchaka e9ac890c02
gh-98740: Fix validation of conditional expressions in RE (GH-98764)
In very rare circumstances the JUMP opcode could be confused with the
argument of the opcode in the "then" part which doesn't end with the
JUMP opcode. This led to incorrect detection of the final JUMP opcode
and incorrect calculation of the size of the subexpression.

NOTE: Changed return value of functions _validate_inner() and
_validate_charset() in Modules/_sre/sre.c.  Now they return 0 on success,
-1 on failure, and 1 if the last op is JUMP (which usually is a failure).
Previously they returned 1 on success and 0 on failure.
2022-11-03 09:23:46 +02:00
Athos Ribeiro 0ceafa7fa4
Add re.VERBOSE flag documentation example (#97678)
The current re.VERBOSE documentation example leaves space for ambiguous
interpretation. One may read that spaces within the `(?:` token are
spaces inside the non-capturing group (such as `(?: )`). This patch
removes the ambiguity by including examples after the statement.
2022-10-04 17:39:42 -07:00
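A minimal sketch of the distinction this commit describes, with an invented pattern: in verbose mode, whitespace around and inside a group is ignored, but the `(?:` token itself may not be split by whitespace.

```python
import re

# Whitespace and comments inside the pattern are ignored under re.VERBOSE.
ok = re.compile(r"""
    (?: \d+ )   # spaces inside the non-capturing group are ignored
    \s* px      # optional whitespace, then the unit
""", re.VERBOSE)
hit = ok.fullmatch("12 px")

# Splitting the "(?:" token itself is rejected, even in verbose mode.
try:
    re.compile(r"(? : \d+ )", re.VERBOSE)
    split_token_ok = True
except re.error:
    split_token_ok = False
```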
Baptiste Mispelon 642d1fa81f
gh-92727: Add example of named group in doc for re.Match.__getitem__ (#92730) 2022-05-28 13:11:08 -05:00
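As a sketch of the kind of example this commit refers to (the pattern and input here are illustrative, not necessarily those added to the docs), a match object can be indexed by group name as well as by number:

```python
import re

m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")

# Indexing with a group name is equivalent to m.group(name).
first = m["first_name"]
last = m["last_name"]

# Integer indices still work; 0 is the whole match.
whole = m[0]
by_index = m[1]
```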
Stanley b7a6610bc8
gh-73137: Added sub-subsection headers for flags in re (#93000)
Fixes #73137
2022-05-22 18:52:17 -07:00
谭九鼎 bd30461298
re docs: fix source code link (#92819) 2022-05-16 17:04:17 -07:00
Serhiy Storchaka a84a56d80f
gh-91760: More strict rules for numerical group references and group names in RE (GH-91792)
Only a sequence of ASCII digits is now accepted as a numerical reference.
The group name in bytes patterns and replacement strings can now only
contain ASCII letters and digits and underscore.
2022-05-08 19:19:29 +03:00
Serhiy Storchaka 19dca04121
gh-91760: Deprecate group names and numbers which will be invalid in future (GH-91794)
Only a sequence of ASCII digits will be accepted as a numerical reference.
The group name in bytes patterns and replacement strings could only
contain ASCII letters and digits and underscore.
2022-04-30 13:13:46 +03:00
谭九鼎 faa12088c1
chore/docs: fix rst style and typo (GH-32331)
Current:

![image](https://user-images.githubusercontent.com/24759802/161704413-30fc91e8-ccd1-4617-8483-bc54ec970f30.png)

After this change:

![image](https://user-images.githubusercontent.com/24759802/161704636-a5458192-a93a-40af-8bde-90ba80fdb53f.png)

Trivial so I don't think it needs news or issue

Automerge-Triggered-By: GH:JulienPalard
2022-04-05 02:08:00 -07:00
Serhiy Storchaka c6cd3cc93c
bpo-47081: Replace "qualifiers" with "quantifiers" in the re module documentation (GH-32028)
It is a more commonly used term.
2022-03-22 11:44:47 +02:00
Serhiy Storchaka 345b390ed6
bpo-433030: Add support of atomic grouping in regular expressions (GH-31982)
* Atomic grouping: (?>...).
* Possessive quantifiers: x++, x*+, x?+, x{m,n}+.
  Equivalent to (?>x+), (?>x*), (?>x?), (?>x{m,n}).

Co-authored-by: Jeffrey C. Jacobs <timehorse@users.sourceforge.net>
2022-03-21 18:28:22 +02:00
Serhiy Storchaka 92a6abf72e
bpo-47066: Convert a warning about flags not at the start of the regular expression into error (GH-31994) 2022-03-19 16:10:44 +02:00
andrei kulakov fea7290a0e
bpo-31369: include ``RegexFlag`` in ``re.__all__`` (GH-30279)
* added RegexFlag to re.__all__; added RegexFlag.NOFLAG

Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
2022-02-04 19:54:28 -08:00
Rim Chatti dbd62e74da
Fix the "Finding all Adverbs" example (GH-21420)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2021-10-09 21:46:56 +03:00
Serhiy Storchaka 64f9e7b19d
bpo-44940: Clarify the documentation of re.findall() (GH-27849)
Co-authored-by: Ken Jin <28750310+Fidget-Spinner@users.noreply.github.com>
Co-authored-by: Vedran Čačić <vedgar+github@gmail.com>
2021-08-22 10:24:20 +03:00
Noah Kantrowitz be42c06bb0
Update URLs in comments and metadata to use HTTPS (GH-27458) 2021-07-30 15:54:46 +02:00
Raymond Hettinger bf1a81258c
Minor modernization and readability improvement to the tokenizer example (GH-19558) 2020-04-16 19:54:13 -07:00
Ricardo Bánffy 15ae75d660
bpo-38294: Add list of no-longer-escaped chars to re.escape documentation. (GH-16442)
Prior to 3.7, re.escape escaped many characters that don't have
special meaning in Python, but that used to require escaping in other
tools and languages. This commit aims to make it clear which characters
were, but are no longer, escaped.
2019-10-07 23:54:35 +03:00
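For illustration, on Python 3.7 and later `re.escape()` leaves characters such as `:` and `/` alone and escapes only those that are special in a regular expression. The inputs below are arbitrary examples.

```python
import re

# Only the dots are escaped; ":" and "/" are left as they are.
escaped_url = re.escape("https://www.python.org")

# Typical use: search for user-supplied text literally.
needle = "1+1=2"
literal_hit = re.search(re.escape(needle), "so 1+1=2 holds")
# Unescaped, "+" is a quantifier, so the literal text is not found.
unescaped_hit = re.search(needle, "so 1+1=2 holds")
```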
Julien Palard 1fae844451
Doc: Fix missing negation. (GH-14640)
Reported by Hug Capella on docs@.

Automerge-Triggered-By: @matrixise
2019-09-11 08:55:22 -07:00
Robert DiPietro fb6c1f8d3b
Fix typo in re.escape documentation (GH-14722) 2019-07-13 16:35:04 +08:00
mollison 5ebfa840a1
bpo-36645: Fix ambiguous formatting in re.sub() documentation (GH-12879) 2019-04-22 01:14:45 +03:00
Serhiy Storchaka a180b007d9
bpo-28450: Fix and improve the documentation for unknown escapes in RE. (GH-11920) 2019-02-25 17:58:30 +02:00
animalize 4a7f44a2ed
bpo-34294: re module, fix wrong capturing groups in rare cases. (GH-11546)
Capturing groups need to be reset between two SRE(match) calls in loops; this fixes wrong capturing groups in rare cases.

Also add a missing index in re.rst.
2019-02-18 15:26:37 +02:00
Pablo Galindo e8239b8e81
Add information about DeprecationWarning for invalid escaped characters in the re module (GH-5255) 2019-01-20 18:57:56 +00:00
Raymond Hettinger b83942c755
Cleanup and improve the regex tokenizer example. (GH-10426)
1) Convert weird field name "typ" to the more standard "type".
2) For the NUMBER type, convert the value to an int() or float().
3) Simplify ``group(kind)`` to the shorter and faster ``group()`` call.
4) Simplify logic to a single if-elif chain to make this easier to extend.
5) Reorder the tests to match the order the tokens are specified.
   This isn't necessary for correctness but does make the example
   easier to follow.
6) Move the "column" calculation before the if-elif chain so that
   users have the option of using this value in error messages.
2018-11-09 01:19:33 -08:00
Serhiy Storchaka 913876d824
bpo-35054: Add yet more index entries for symbols. (GH-10121) 2018-10-28 13:41:26 +02:00
Serhiy Storchaka ddb961d2ab
bpo-35054: Add more index entries for symbols. (GH-10064) 2018-10-26 09:00:49 +03:00
Stéphane Wirtel 859c068e52
bpo-34962: `make doctest` in Doc/ now passes, and is enforced in CI (GH-9806) 2018-10-12 09:51:05 +02:00