M.-A. Lemburg <mal@lemburg.com>:

Updated to version 1.4.
2000-04-13 14:12:38 +00:00 · 10dfd4c1c3
parent e0243e24be
commit 10dfd4c1c3
1 changed file with 73 additions and 7 deletions
--- a/Misc/unicode.txt
+++ b/Misc/unicode.txt
@@ -1,5 +1,5 @@
 =============================================================================
- Python Unicode Integration                            Proposal Version: 1.3
+ Python Unicode Integration                            Proposal Version: 1.4
 -----------------------------------------------------------------------------


@@ -162,6 +162,17 @@ encoding>.
 For the same reason, Unicode objects should return the same hash value
 as their UTF-8 equivalent strings.

+When compared using cmp() (or PyObject_Compare()) the implementation
+should mask TypeErrors raised during the conversion to remain in synch
+with the string behavior. All other errors such as ValueErrors raised
+during coercion of strings to Unicode should not be masked but passed
+through to the user.
+
+In containment tests ('a' in u'abc' and u'a' in 'abc') both sides
+should be coerced to Unicode before applying the test. Errors occurring
+during coercion (e.g. None in u'abc') should not be masked.
+
+
 Coercion:
 ---------

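The containment rule added in this hunk can be illustrated with a minimal sketch in Python 3 terms, where bytes stands in for the 8-bit string type and str for Unicode. The helper names to_unicode() and contains() are hypothetical and not part of the proposal, and UTF-8 as the coercion encoding is an assumption made for the example only.

```python
def to_unicode(obj, encoding="utf-8"):
    """Hypothetical helper: coerce bytes or str to str (Unicode)."""
    if isinstance(obj, str):
        return obj
    if isinstance(obj, bytes):
        # A failed decode raises UnicodeDecodeError, a ValueError subclass.
        # It is deliberately not caught, so it reaches the caller.
        return obj.decode(encoding)
    # Anything else (e.g. None) cannot be coerced; the error is not masked.
    raise TypeError("cannot coerce %s to Unicode" % type(obj).__name__)


def contains(container, item):
    """Containment test with both operands coerced to Unicode first."""
    return to_unicode(item) in to_unicode(container)
```

With these definitions, contains("abc", b"a") and contains(b"abc", "a") both perform the test on Unicode operands, while contains("abc", None) raises TypeError instead of silently returning a result.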
@@ -380,6 +391,13 @@ class StreamWriter(Codec):
        data, consumed = self.encode(object,self.errors)
        self.stream.write(data)
        
+    def writelines(self, list):
+
+        """ Writes the concatenated list of strings to the stream
+            using .write().
+        """
+        self.write(''.join(list))
+        
    def reset(self):

        """ Flushes and resets the codec buffers used for keeping state.
@@ -463,6 +481,47 @@ class StreamReader(Codec):
            else:
                return object

+    def readline(self, size=None):
+
+        """ Read one line from the input stream and return the
+            decoded data.
+
+            Note: Unlike the .readlines() method, this method inherits
+            the line breaking knowledge from the underlying stream's
+            .readline() method -- there is currently no support for
+            line breaking using the codec decoder due to lack of line
+            buffering. Subclasses should however, if possible, try to
+            implement this method using their own knowledge of line
+            breaking.
+
+            size, if given, is passed as size argument to the stream's
+            .readline() method.
+            
+        """
+        if size is None:
+            line = self.stream.readline()
+        else:
+            line = self.stream.readline(size)
+        return self.decode(line)[0]
+
+    def readlines(self, sizehint=None):
+
+        """ Read all lines available on the input stream
+            and return them as list of lines.
+
+            Line breaks are implemented using the codec's decoder
+            method and are included in the list entries.
+            
+            sizehint, if given, is passed as size argument to the
+            stream's .read() method.
+
+        """
+        if sizehint is None:
+            data = self.stream.read()
+        else:
+            data = self.stream.read(sizehint)
+        return self.decode(data)[0].splitlines(1)
+
    def reset(self):

        """ Resets the codec buffers used for keeping state.
@@ -482,9 +541,6 @@ class StreamReader(Codec):
        """
        return getattr(self.stream,name)

-XXX What about .readline(), .readlines() ? These could be implemented
-    using .read() as generic functions instead of requiring their
-    implementation by all codecs. Also see Line Breaks.

 Stream codec implementors are free to combine the StreamWriter and
 StreamReader interfaces into one class. Even combining all these with
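The .writelines(), .readline() and .readlines() methods that replace the XXX note can be exercised with a small self-contained sketch. It is written for Python 3 and fixed to UTF-8; the class names SketchStreamWriter and SketchStreamReader are hypothetical stand-ins, not the proposal's classes, and the sketch uses None as the sizehint default so that a call without arguments reads the whole stream.

```python
import codecs
import io


class SketchStreamWriter:
    """Hypothetical UTF-8-only stand-in for the proposal's StreamWriter."""

    def __init__(self, stream, errors="strict"):
        self.stream = stream
        self.errors = errors
        self.encode = codecs.getencoder("utf-8")

    def write(self, text):
        data, consumed = self.encode(text, self.errors)
        self.stream.write(data)

    def writelines(self, lines):
        # Concatenate the strings, then encode once through .write().
        self.write("".join(lines))


class SketchStreamReader:
    """Hypothetical UTF-8-only stand-in for the proposal's StreamReader."""

    def __init__(self, stream, errors="strict"):
        self.stream = stream
        self.errors = errors
        self.decode = codecs.getdecoder("utf-8")

    def readline(self, size=None):
        # Line breaking is delegated to the underlying byte stream.
        if size is None:
            line = self.stream.readline()
        else:
            line = self.stream.readline(size)
        return self.decode(line, self.errors)[0]

    def readlines(self, sizehint=None):
        # Decode first, then split on line breaks, keeping the line ends.
        if sizehint is None:
            data = self.stream.read()
        else:
            data = self.stream.read(sizehint)
        return self.decode(data, self.errors)[0].splitlines(True)


buf = io.BytesIO()
SketchStreamWriter(buf).writelines(["first line\n", "second line\n"])
buf.seek(0)
reader = SketchStreamReader(buf)
head = reader.readline()    # one decoded line, split by the byte stream
tail = reader.readlines()   # remaining lines, split after decoding
```

Note the asymmetry the docstrings describe: readline() trusts the stream's notion of a line, whereas readlines() splits the decoded text. A size or sizehint that cuts a multi-byte sequence in half would make the strict decode fail in this sketch.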
@@ -692,9 +748,10 @@ Format markers are used in Python format strings. If Python strings
 are used as format strings, the following interpretations should be in
 effect:

-  '%s':                 '%s' does str(u) for Unicode objects embedded
-                        in Python strings, so the output will be
-                        u.encode(<default encoding>)
+  '%s':                 For Unicode objects this will cause coercion of the
+			whole format string to Unicode. Note that
+			you should use a Unicode format string to start
+			with for performance reasons.

 In case the format string is a Unicode object, all parameters are coerced
 to Unicode first and then put together and formatted according to the format
@@ -922,6 +979,9 @@ For comparison:
 	Introducing Unicode to ECMAScript --
 	http://www-4.ibm.com/software/developer/library/internationalization-support.html

+IANA Character Set Names:
+	ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets
+
 Encodings:

    Overview:
@@ -944,6 +1004,12 @@ Encodings:

 History of this Proposal:
 -------------------------
+1.4: Added note about mixed type comparisons and contains tests.
+     Changed treatment of Unicode objects in format strings (if used
+     with '%s' % u they will now cause the format string to be
+     coerced to Unicode, thus producing a Unicode object on return).
+     Added link to IANA charset names (thanks to Lars Marius Garshol).
+     Added new codec methods .readline(), .readlines() and .writelines().
 1.3: Added new "es" and "es#" parser markers
 1.2: Removed POD about codecs.open()
 1.1: Added note about comparisons and hash values. Added note about