added another example of Unicode CSV parsing; reworked the example text a bit; corrected notice in the intro and added a link to the examples

David Goodger 2006-04-04 03:05:44 +00:00
parent 5fe715f049
commit cb30f97bd3
1 changed files with 48 additions and 9 deletions


@ -33,8 +33,9 @@ form using the \class{DictReader} and \class{DictWriter} classes.
\begin{notice} \begin{notice}
This version of the \module{csv} module doesn't support Unicode This version of the \module{csv} module doesn't support Unicode
input. Also, there are currently some issues regarding \ASCII{} NUL input. Also, there are currently some issues regarding \ASCII{} NUL
characters. Accordingly, all input should generally be printable characters. Accordingly, all input should be UTF-8 or printable
\ASCII{} to be safe. These restrictions will be removed in the future. \ASCII{} to be safe; see the examples in section~\ref{csv-examples}.
These restrictions will be removed in the future.
\end{notice} \end{notice}
\begin{seealso} \begin{seealso}
@ -365,7 +366,7 @@ A read-only description of the dialect in use by the writer.
\subsection{Examples} \subsection{Examples\label{csv-examples}}
The simplest example of reading a CSV file: The simplest example of reading a CSV file:
@ -426,14 +427,49 @@ for row in csv.reader(['one,two,three']):
\end{verbatim} \end{verbatim}
The \module{csv} module doesn't directly support reading and writing The \module{csv} module doesn't directly support reading and writing
Unicode, but it is 8-bit clean save for some problems with \ASCII{} NUL Unicode, but it is 8-bit-clean save for some problems with \ASCII{} NUL
characters, so you can write classes that handle the encoding and decoding characters. So you can write functions or classes that handle the
for you as long as you avoid encodings like utf-16 that use NULs: encoding and decoding for you as long as you avoid encodings like
UTF-16 that use NULs. UTF-8 is recommended.
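As a rough illustration of why UTF-16 is a problem and UTF-8 is not, the
sketch below encodes the same text both ways and checks the result for
NUL bytes. The sample string and the variable names are made up for this
illustration and are not part of the \module{csv} module:

```python
# Hypothetical sample data: one CSV line containing non-ASCII characters.
text = u'caf\xe9,na\xefve,plain'

utf8_bytes = text.encode('utf-8')
utf16_bytes = text.encode('utf-16')

# UTF-8 never produces a NUL byte unless the text itself contains U+0000.
has_nul_utf8 = b'\x00' in utf8_bytes

# UTF-16 uses two bytes per character here, so every ASCII character
# (including the comma delimiter) carries a NUL byte with it.
has_nul_utf16 = b'\x00' in utf16_bytes

# The UTF-8 form decodes back to the original text unchanged.
round_trip_ok = utf8_bytes.decode('utf-8') == text
```

With this input, \code{has_nul_utf8} is false and \code{has_nul_utf16} is
true, which is why the UTF-8 form is the safer one to hand to the parser.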
\function{unicode_csv_reader} below is a generator that wraps
\class{csv.reader} to handle Unicode CSV data (a list of Unicode
strings). \function{utf_8_encoder} is a generator that encodes the
Unicode strings as UTF-8, one string (or row) at a time. The encoded
strings are parsed by the CSV reader, and
\function{unicode_csv_reader} decodes the UTF-8-encoded cells back
into Unicode:
\begin{verbatim}
import csv

def unicode_csv_reader(unicode_csv_data, dialect=csv.excel, **kwargs):
    # csv.py doesn't do Unicode; encode temporarily as UTF-8:
    csv_reader = csv.reader(utf_8_encoder(unicode_csv_data),
                            dialect=dialect, **kwargs)
    for row in csv_reader:
        # decode UTF-8 back to Unicode, cell by cell:
        yield [unicode(cell, 'utf-8') for cell in row]

def utf_8_encoder(unicode_csv_data):
    for line in unicode_csv_data:
        yield line.encode('utf-8')
\end{verbatim}
The classes below work just like the \class{csv.reader} and
\class{csv.writer} classes, but they add an \var{encoding} parameter
to allow for encoded files:
\begin{verbatim} \begin{verbatim}
import csv import csv
class UnicodeReader: class UnicodeReader:
"""
A CSV reader which will iterate over lines in the CSV file "f",
which is encoded in the given encoding.
"""
def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds): def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
self.reader = csv.reader(f, dialect=dialect, **kwds) self.reader = csv.reader(f, dialect=dialect, **kwds)
self.encoding = encoding self.encoding = encoding
@ -446,6 +482,12 @@ class UnicodeReader:
return self return self
class UnicodeWriter: class UnicodeWriter:
"""
A CSV writer which will write rows to CSV file "f",
which is encoded in the given encoding.
"""
def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds): def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
self.writer = csv.writer(f, dialect=dialect, **kwds) self.writer = csv.writer(f, dialect=dialect, **kwds)
self.encoding = encoding self.encoding = encoding
@ -457,6 +499,3 @@ class UnicodeWriter:
for row in rows: for row in rows:
self.writerow(row) self.writerow(row)
\end{verbatim} \end{verbatim}
They should work just like the \class{csv.reader} and \class{csv.writer}
classes but add an \var{encoding} parameter.