cpython


Author	SHA1	Message	Date
Thomas Wouters	0813d76cb0	Merge in Anthony's new parser code, from the anthony-parser-branch: > ---------------------------- > revision 1.20.4.4 > date: 2003/06/12 09:14:17; author: anthonybaxter; state: Exp; lines: +13 -6 > preamble is None when missing, not ''. > Handle a couple of bogus formatted messages - now parses my main testsuite. > Handle message/external-body. > ---------------------------- > revision 1.20.4.3 > date: 2003/06/12 07:16:40; author: anthonybaxter; state: Exp; lines: +6 -4 > epilogue-processing is now the same as the old parser - the newline at the > end of the line with the --endboundary-- is included as part of the epilogue. > Note that any whitespace after the boundary is _not_ part of the epilogue. > ---------------------------- > revision 1.20.4.2 > date: 2003/06/12 06:39:09; author: anthonybaxter; state: Exp; lines: +6 -4 > message/delivery-status fixed. > HeaderParser fixed. > ---------------------------- > revision 1.20.4.1 > date: 2003/06/12 06:08:56; author: anthonybaxter; state: Exp; lines: +163 -129 > A work-in-progress snapshot of the new parser. A couple of known problems: > > - first (blank) line of MIME epilogues is being consumed > - message/delivery-status isn't quite right > > It still needs a lot of cleanup, but right now it parses a whole lot of > badness that the old parser failed on. I also need to think about adding > back the old 'strict' flag in some way. > =============================================================================	2004-03-20 17:31:29 +00:00
Barry Warsaw	0e4570bcb0	Merge of the folding-reimpl-branch. Specific changes, Rename a constant.	2003-03-06 05:25:35 +00:00
Barry Warsaw	da2525ed2a	parse(), _parseheaders(), _parsebody(): A fix for SF bug #633527, where in lax parsing, the first non-header line after a header block (e.g. the first line not containing a colon, and not a continuation), can be treated as the first body line, even without the RFC mandated blank line separator. rfc822 had this behavior, and I vaguely remember problems with this, but can't remember details. In any event, all the tests still pass, so I guess we'll find out. ;/ This patch works by returning the non-header, non-continuation line from _parseheader() and using that as the first header line prepended to fp.read() if given. It's usually None. We use this approach instead of trying to seek/tell the file-like object.	2002-11-05 21:44:06 +00:00
Barry Warsaw	5c9130ec46	_parsebody(): A fix for SF bug #631350, where a subobject in a multipart/digest isn't a message/rfc822. This is legal, but counter to recommended practice in RFC 2046, §5.1.5. The fix is to look at the content type after setting the default content type. If the maintype is then message or multipart, attach the parsed subobject, otherwise use set_payload() to set the data of the other object.	2002-11-05 20:54:37 +00:00
Barry Warsaw	487fe6ac39	_parsebody(): Use get_content_type() instead of the deprecated get_type(). Also, one of the regular expressions is constant so might as well make it a module global. And, when splitting up digests, handle lineseps that are longer than 1 character in length (e.g. \r\n).	2002-10-07 17:27:35 +00:00
Barry Warsaw	057b8428d0	Docstring consistency with the updated .tex files.	2002-09-30 20:07:22 +00:00
Barry Warsaw	e03e8f09eb	Use True/False everywhere.	2002-09-28 20:44:58 +00:00
Barry Warsaw	034b47acfe	_parsebody(): Instead of raising a BoundaryError when no start boundary could be found -- in a lax parser -- the entire body is assigned to the message payload.	2002-09-10 16:14:56 +00:00
Tim Peters	280488b9a3	Whitespace normalization.	2002-08-23 18:19:30 +00:00
Barry Warsaw	bb26b4530b	Parser.__init__(): The consensus on the mimelib-devel list is that non-strict parsing should be the default. Make it so.	2002-07-19 22:25:34 +00:00
Barry Warsaw	7aeac9180e	Anthony Baxter's cleanup patch. Python project SF patch # 583190, quoting: in non-strict mode, messages don't require a blank line at the end with a missing end-terminator. A single newline is sufficient now. Handle trailing whitespace at the end of a boundary. Had to switch from using string.split() to re.split() Handle whitespace on the end of a parameter list for Content-type. Handle whitespace on the end of a plain content-type header. Specifically, get_type(): Strip the content type string. _get_params_preserve(): Strip the parameter names and values on both sides. _parsebody(): Lots of changes as described above, with some stylistic changes by Barry (who hopefully didn't screw things up ;).	2002-07-18 23:09:09 +00:00
Barry Warsaw	f6caeba03a	Anthony Baxter's patch for non-strict parsing. This adds a `strict' argument to the constructor -- defaulting to true -- which is different than Anthony's approach of using global state. parse(), parsestr(): Grow a `headersonly' argument which stops parsing once the header block has been seen, i.e. it does /not/ parse or even read the body of the message. This is used for parsing message/rfc822 type messages. We need test cases for the non-strict parsing. Anthony will supply these. _parsebody(): We can get rid of the isdigest end-of-line kludges, although we still need to know if we're parsing a multipart/digest so we can set the default type accordingly.	2002-07-09 02:50:02 +00:00
Barry Warsaw	69e18af968	_parsebody(): Fix for the new message/rfc822 tree structure (the parent is now a multipart with one element, the sub-message object).	2002-06-02 19:12:03 +00:00
Barry Warsaw	7e21b6792b	I've thought about it some more, and I believe it is proper for the email package's Parser to handle the three common line endings. Certain protocols such as IMAP define CRLF line endings and it doesn't make sense for the client app to have to normalize the line endings before handing the message off to the Parser. _parsebody(): Be more flexible in the matching of line endings for finding the MIME separators. Accept any of \r, \n and \r\n. Note that we do /not/ change the line endings in the payloads, we just accept any of those three around MIME boundaries.	2002-05-19 23:51:50 +00:00
Barry Warsaw	409a4c08b5	Sync'ing with standalone email package 2.0.1. This adds support for non-us-ascii character sets in headers and bodies. Some API changes (with DeprecationWarnings for the old APIs). Better RFC-compliant implementations of base64 and quoted-printable. Updated test cases. Documentation updates to follow (after I finish writing them ;).	2002-04-10 21:01:31 +00:00
Barry Warsaw	15e9dc9eac	_parsebody(): When adding subparts to a multipart container, make sure that the first subpart added makes the payload a list object. Otherwise, a multipart/* with only one subpart will not have the proper structure.	2002-01-27 06:48:02 +00:00
Barry Warsaw	e552882960	HeaderParser: A new subclass of Parser which only parses the message headers. It does not parse the body of the message, instead simply assigning it as a string to the container's payload. This can be much faster when you're only interested in a message's header.	2001-10-11 15:43:00 +00:00
Barry Warsaw	e968ead1dd	Give me back my page breaks.	2001-10-04 17:05:11 +00:00
Tim Peters	527e64fd68	Whitespace normalization.	2001-10-04 05:36:56 +00:00
Barry Warsaw	66971fbca5	_parsebody(): Use get_boundary() and get_type(). Also, add a clause to the big-if to handle message/delivery-status content types. These create a message with subparts that are Message instances, which best represent the header blocks of this content type.	2001-09-26 05:44:09 +00:00
Barry Warsaw	ba92580f01	The email package version 1.0, prototyped as mimelib <http://sf.net/projects/mimelib>. There /are/ API differences between mimelib and email, but most of the implementations are shared (except where cool Py2.2 stuff like generators are used).	2001-09-23 03:17:28 +00:00

21 Commits
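The header-only parsing introduced in these commits (the `headersonly` argument to parse()/parsestr() and the HeaderParser subclass) can be illustrated with a small sketch. It is written against the present-day Python 3 `email.parser` module rather than the 2002-era code in this history, so it should be read as an approximation of the behavior the commits describe, not as that code. The sample message text is made up for illustration.

```python
from email.parser import HeaderParser, Parser

# Hypothetical sample message: a multipart/mixed container with one text part.
raw = (
    "From: alice@example.com\n"
    "Subject: hello\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "part one\n"
    "--XYZ--\n"
)

# A full parse walks the MIME structure and builds a list of subparts.
full = Parser().parsestr(raw)

# A header-only parse stops after the header block; the body is kept as a
# plain string instead of being split into subparts.
headers_only = HeaderParser().parsestr(raw)

print(full.is_multipart())          # True
print(headers_only.is_multipart())  # False
print(headers_only["Subject"])      # hello
```

Because the header-only path never splits the body on MIME boundaries, it avoids that work when a caller only needs the headers, which matches the motivation given for HeaderParser in the 2001-10-11 commit.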