get_param(): Update the docstring to explain how CHARSET and LANGUAGE
can be None, and what to do in that situation.

get_filename(), get_boundary(), get_content_charset(): Make sure these
handle RFC 2231 headers without a CHARSET field.

Backport candidate (as was the Utils.py 1.25 change) to both Python
2.3.1 and 2.2.4 -- will do momentarily.
Barry Warsaw 2003-08-19 03:53:02 +00:00
parent 0b6f0d8810
commit 6208369ff3
1 changed files with 12 additions and 7 deletions

@@ -571,13 +571,16 @@ class Message:
         Parameter keys are always compared case insensitively. The return
         value can either be a string, or a 3-tuple if the parameter was RFC
         2231 encoded. When it's a 3-tuple, the elements of the value are of
-        the form (CHARSET, LANGUAGE, VALUE), where LANGUAGE may be the empty
-        string. Your application should be prepared to deal with these, and
-        can convert the parameter to a Unicode string like so:
+        the form (CHARSET, LANGUAGE, VALUE). Note that both CHARSET and
+        LANGUAGE can be None, in which case you should consider VALUE to be
+        encoded in the us-ascii charset. You can usually ignore LANGUAGE.
+
+        Your application should be prepared to deal with 3-tuple return
+        values, and can convert the parameter to a Unicode string like so:
 
             param = msg.get_param('foo')
             if isinstance(param, tuple):
-                param = unicode(param[2], param[0])
+                param = unicode(param[2], param[0] or 'us-ascii')
 
         In any case, the parameter value (either the returned string, or the
         VALUE item in the 3-tuple) is always unquoted, unless unquote is set
@@ -708,7 +711,7 @@ class Message:
         if isinstance(filename, TupleType):
             # It's an RFC 2231 encoded parameter
             newvalue = _unquotevalue(filename)
-            return unicode(newvalue[2], newvalue[0])
+            return unicode(newvalue[2], newvalue[0] or 'us-ascii')
         else:
             newvalue = _unquotevalue(filename.strip())
             return newvalue
@@ -725,7 +728,8 @@ class Message:
             return failobj
         if isinstance(boundary, TupleType):
             # RFC 2231 encoded, so decode. It better end up as ascii
-            return unicode(boundary[2], boundary[0]).encode('us-ascii')
+            charset = boundary[0] or 'us-ascii'
+            return unicode(boundary[2], charset).encode('us-ascii')
         return _unquotevalue(boundary.strip())
 
     def set_boundary(self, boundary):
@@ -792,7 +796,8 @@ class Message:
             return failobj
         if isinstance(charset, TupleType):
             # RFC 2231 encoded, so decode it, and it better end up as ascii.
-            charset = unicode(charset[2], charset[0]).encode('us-ascii')
+            pcharset = charset[0] or 'us-ascii'
+            charset = unicode(charset[2], pcharset).encode('us-ascii')
         # RFC 2046, $4.1.2 says charsets are not case sensitive
         return charset.lower()
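
The fallback applied in each hunk above follows one pattern: if the CHARSET slot of the 3-tuple is empty, decode VALUE as us-ascii. A minimal standalone sketch of that pattern in current Python might look like the following. The helper name `param_to_text` is hypothetical and is not part of the email package, and modeling VALUE as a `bytes` object is an assumption made here to mirror the Python 2 byte strings that `unicode()` decodes in the diff.

```python
def param_to_text(param, fallback_charset="us-ascii"):
    """Return text for a get_param()-style result.

    `param` is either a plain string or a (CHARSET, LANGUAGE, VALUE)
    3-tuple. Hypothetical helper: VALUE is assumed to be bytes here.
    """
    if isinstance(param, tuple):
        charset, _language, value = param
        # CHARSET may be None (or empty); fall back to us-ascii,
        # as the commit does with `param[0] or 'us-ascii'`.
        return value.decode(charset or fallback_charset)
    # Non-encoded parameters are already plain strings.
    return param


if __name__ == "__main__":
    print(param_to_text((None, None, b"report.txt")))
    print(param_to_text(("iso-8859-1", "en", b"caf\xe9")))
    print(param_to_text("plain"))
```

LANGUAGE is unpacked but unused, in line with the docstring's note that it can usually be ignored.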