\section{\module{csv} --- CSV File Reading and Writing}

\declaremodule{standard}{csv}
\modulesynopsis{Write and read tabular data to and from delimited files.}
\sectionauthor{Skip Montanaro}{skip@pobox.com}

\versionadded{2.3}
\index{csv}
\indexii{data}{tabular}
The so-called CSV (Comma Separated Values) format is the most common import
and export format for spreadsheets and databases. There is no ``CSV
standard'', so the format is operationally defined by the many applications
which read and write it. The lack of a standard means that subtle
differences often exist in the data produced and consumed by different
applications. These differences can make it annoying to process CSV files
from multiple sources. Still, while the delimiters and quoting characters
vary, the overall format is similar enough that it is possible to write a
single module which can efficiently manipulate such data, hiding the details
of reading and writing the data from the programmer.

The \module{csv} module implements classes to read and write tabular data in
CSV format. It allows programmers to say, ``write this data in the format
preferred by Excel,'' or ``read data from this file which was generated by
Excel,'' without knowing the precise details of the CSV format used by
Excel. Programmers can also describe the CSV formats understood by other
applications or define their own special-purpose CSV formats.

The \module{csv} module's \class{reader} and \class{writer} objects read and
write sequences. Programmers can also read and write data in dictionary
form using the \class{DictReader} and \class{DictWriter} classes.
\begin{notice}
This version of the \module{csv} module doesn't support Unicode
input. Also, there are currently some issues regarding \ASCII{} NUL
characters. Accordingly, all input should generally be printable
\ASCII{} to be safe. These restrictions will be removed in the future.
\end{notice}

\begin{seealso}
% \seemodule{array}{Arrays of uniformly typed numeric values.}
\seepep{305}{CSV File API}
{The Python Enhancement Proposal which proposed this addition
to Python.}
\end{seealso}
\subsection{Module Contents \label{csv-contents}}

The \module{csv} module defines the following functions:

\begin{funcdesc}{reader}{csvfile\optional{,
dialect=\code{'excel'}\optional{, fmtparam}}}
Return a reader object which will iterate over lines in the given
{}\var{csvfile}. \var{csvfile} can be any object which supports the
iterator protocol and returns a string each time its \method{next}
method is called. If \var{csvfile} is a file object, it must be opened with
the \code{'b'} flag on platforms where that makes a difference. An optional
{}\var{dialect} parameter can be given to select a set of parameters
specific to a particular CSV dialect. It may be an instance of a subclass
of the \class{Dialect} class or one of the strings returned by the
\function{list_dialects} function. The other optional {}\var{fmtparam}
keyword arguments can be given to override individual formatting parameters
in the current dialect. For details of the dialect and formatting
parameters, see section~\ref{csv-fmt-params}, ``Dialects and Formatting
Parameters''.

All data read are returned as strings. No automatic data type
conversion is performed.
\end{funcdesc}
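As a minimal sketch of the call described above, the following reads from an
in-memory list rather than an open file and overrides the delimiter through
a \var{fmtparam} keyword argument. The sample data and the choice of
\code{';'} as delimiter are illustrative assumptions.

\begin{verbatim}
import csv

# Any iterable that yields one string per line will do; a list stands in
# for an open file here.
lines = ["name;qty", "apple;3", "pear;12"]

# delimiter is a formatting parameter that overrides the 'excel' default.
rows = list(csv.reader(lines, delimiter=";"))

# Every field comes back as a string, so convert explicitly when needed.
quantities = [int(row[1]) for row in rows[1:]]
\end{verbatim}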

\begin{funcdesc}{writer}{csvfile\optional{,
dialect=\code{'excel'}\optional{, fmtparam}}}
Return a writer object responsible for converting the user's data into
delimited strings on the given file-like object. \var{csvfile} can be any
object with a \method{write} method. If \var{csvfile} is a file object,
it must be opened with the \code{'b'} flag on platforms where that makes a
difference. An optional {}\var{dialect} parameter can be given to select a
set of parameters specific to a particular CSV dialect. It may be an
instance of a subclass of the \class{Dialect} class or one of the strings
returned by the \function{list_dialects} function. The other optional
{}\var{fmtparam} keyword arguments can be given to override individual
formatting parameters in the current dialect. For details of the dialect
and formatting parameters, see section~\ref{csv-fmt-params}, ``Dialects and
Formatting Parameters''. To make it as easy as possible to
interface with modules which implement the DB API, the value
\constant{None} is written as the empty string. While this isn't a
reversible transformation, it makes it easier to dump SQL NULL data values
to CSV files without preprocessing the data returned from a
\code{cursor.fetch*()} call. All other non-string data are stringified
with \function{str()} before being written.
\end{funcdesc}
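The handling of \constant{None} and of non-string values can be seen in a
short sketch. \class{LineCollector} is a hypothetical helper, not part of
the module; it only illustrates that any object with a \method{write}
method is accepted in place of a file.

\begin{verbatim}
import csv

class LineCollector:
    """Hypothetical stand-in for a file: it only needs a write method."""
    def __init__(self):
        self.chunks = []
    def write(self, text):
        self.chunks.append(text)

out = LineCollector()
writer = csv.writer(out)

# None becomes an empty field; the numbers are converted to strings.
writer.writerow(["widget", None, 2.5, 7])
result = "".join(out.chunks)    # 'widget,,2.5,7\r\n'
\end{verbatim}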

\begin{funcdesc}{register_dialect}{name, dialect}
Associate \var{dialect} with \var{name}. \var{dialect} must be a subclass
of \class{csv.Dialect}. \var{name} must be a string or Unicode object.
\end{funcdesc}

\begin{funcdesc}{unregister_dialect}{name}
Delete the dialect associated with \var{name} from the dialect registry. An
\exception{Error} is raised if \var{name} is not a registered dialect
name.
\end{funcdesc}

\begin{funcdesc}{get_dialect}{name}
Return the dialect associated with \var{name}. An \exception{Error} is
raised if \var{name} is not a registered dialect name.
\end{funcdesc}

\begin{funcdesc}{list_dialects}{}
Return the names of all registered dialects.
\end{funcdesc}
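The four registry functions can be exercised together, as in the sketch
below. The dialect class and its registered name, \code{semicolons}, are
made up for illustration.

\begin{verbatim}
import csv

class semicolons(csv.excel):
    # Hypothetical dialect: Excel conventions with ';' as the separator.
    delimiter = ";"

csv.register_dialect("semicolons", semicolons)
registered = "semicolons" in csv.list_dialects()

# Once registered, the dialect can be selected by name.
rows = list(csv.reader(["a;b;c"], dialect="semicolons"))

csv.unregister_dialect("semicolons")
try:
    csv.get_dialect("semicolons")
    still_there = True
except csv.Error:
    # Looking up a name that is no longer registered raises Error.
    still_there = False
\end{verbatim}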

The \module{csv} module defines the following classes:

\begin{classdesc}{DictReader}{csvfile\optional{,
fieldnames=\constant{None}\optional{,
restkey=\constant{None}\optional{,
restval=\constant{None}\optional{,
dialect=\code{'excel'}\optional{,
*args, **kwds}}}}}}
Create an object which operates like a regular reader but maps the
information read into a dict whose keys are given by the optional
{}\var{fieldnames} parameter. If the \var{fieldnames} parameter is omitted,
the values in the first row of the \var{csvfile} will be used as the
fieldnames. If the row read has more fields than the fieldnames sequence,
the remaining data is added as a sequence keyed by the value of
\var{restkey}. If the row read has fewer fields than the fieldnames
sequence, the remaining keys take the value of the optional \var{restval}
parameter. Any other optional or keyword arguments are passed to the
underlying \class{reader} instance.
\end{classdesc}
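The effect of \var{restkey} and \var{restval} is easiest to see on rows of
uneven length. In this sketch the input lines and the values
\code{'overflow'} and \code{'n/a'} are illustrative assumptions;
\var{fieldnames} is omitted, so the first line supplies the keys.

\begin{verbatim}
import csv

lines = ["id,name", "1,Ada", "2", "3,Alan,extra,more"]
reader = csv.DictReader(lines, restkey="overflow", restval="n/a")
records = [dict(row) for row in reader]

# records[0] == {'id': '1', 'name': 'Ada'}
# records[1] == {'id': '2', 'name': 'n/a'}            (short row)
# records[2] == {'id': '3', 'name': 'Alan',
#                'overflow': ['extra', 'more']}       (long row)
\end{verbatim}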

\begin{classdesc}{DictWriter}{csvfile, fieldnames\optional{,
restval=\code{""}\optional{,
extrasaction=\code{'raise'}\optional{,
dialect=\code{'excel'}\optional{,
*args, **kwds}}}}}
Create an object which operates like a regular writer but maps dictionaries
onto output rows. The \var{fieldnames} parameter identifies the order in
which values in the dictionary passed to the \method{writerow()} method are
written to the \var{csvfile}. The optional \var{restval} parameter
specifies the value to be written if the dictionary is missing a key in
\var{fieldnames}. If the dictionary passed to the \method{writerow()}
method contains a key not found in \var{fieldnames}, the optional
\var{extrasaction} parameter indicates what action to take. If it is set
to \code{'raise'}, a \exception{ValueError} is raised. If it is set to
\code{'ignore'}, extra values in the dictionary are ignored. Any other
optional or keyword arguments are passed to the underlying \class{writer}
instance.

Note that unlike the \class{DictReader} class, the \var{fieldnames}
parameter of the \class{DictWriter} is not optional. Since Python's
\class{dict} objects are not ordered, there is not enough information
available to deduce the order in which the row should be written to the
\var{csvfile}.
\end{classdesc}
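The following sketch shows \var{restval} filling in a missing key and both
settings of \var{extrasaction}. \class{LineCollector} is a hypothetical
stand-in for a file object, and the field names and data are invented for
the example.

\begin{verbatim}
import csv

class LineCollector:
    """Hypothetical stand-in for a file: it only needs a write method."""
    def __init__(self):
        self.chunks = []
    def write(self, text):
        self.chunks.append(text)

out = LineCollector()
writer = csv.DictWriter(out, ["id", "name", "email"],
                        restval="unknown", extrasaction="ignore")
# 'email' is missing and takes restval; 'phone' is extra and is ignored.
writer.writerow({"name": "Ada", "id": 1, "phone": "555-0100"})
result = "".join(out.chunks)    # '1,Ada,unknown\r\n'

# With the default extrasaction of 'raise', an extra key is an error.
strict = csv.DictWriter(LineCollector(), ["id", "name"])
try:
    strict.writerow({"id": 2, "name": "Alan", "phone": "555-0101"})
    rejected = False
except ValueError:
    rejected = True
\end{verbatim}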

\begin{classdesc*}{Dialect}{}
The \class{Dialect} class is a container class relied on primarily for its
attributes, which are used to define the parameters for a specific
\class{reader} or \class{writer} instance.
\end{classdesc*}

\begin{classdesc}{excel}{}
The \class{excel} class defines the usual properties of an Excel-generated
CSV file.
\end{classdesc}

\begin{classdesc}{excel_tab}{}
The \class{excel_tab} class defines the usual properties of an
Excel-generated TAB-delimited file.
\end{classdesc}

\begin{classdesc}{Sniffer}{}
The \class{Sniffer} class is used to deduce the format of a CSV file.
\end{classdesc}

The \class{Sniffer} class provides two methods:

\begin{methoddesc}{sniff}{sample\optional{, delimiters=\constant{None}}}
Analyze the given \var{sample} and return a \class{Dialect} subclass
reflecting the parameters found. If the optional \var{delimiters} parameter
is given, it is interpreted as a string containing possible valid delimiter
characters.
\end{methoddesc}

\begin{methoddesc}{has_header}{sample}
Analyze the sample text (presumed to be in CSV format) and return
\constant{True} if the first row appears to be a series of column
headers.
\end{methoddesc}
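A small sketch of \method{sniff()} follows. The sample text is invented,
and the candidate delimiters are restricted to \code{';'} and \code{','} to
keep the guess predictable; results on irregular real-world data may
differ.

\begin{verbatim}
import csv

sample = "name;age;city\r\nAda;36;London\r\nAlan;41;Wilmslow\r\n"

# Only ';' and ',' are considered as possible delimiters.
dialect = csv.Sniffer().sniff(sample, delimiters=";,")

# The Dialect subclass returned can be handed straight to a reader.
rows = list(csv.reader(sample.splitlines(), dialect=dialect))
\end{verbatim}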

The \module{csv} module defines the following constants:

\begin{datadesc}{QUOTE_ALL}
Instructs \class{writer} objects to quote all fields.
\end{datadesc}

\begin{datadesc}{QUOTE_MINIMAL}
Instructs \class{writer} objects to only quote those fields which contain
the current \var{delimiter} or begin with the current \var{quotechar}.
\end{datadesc}

\begin{datadesc}{QUOTE_NONNUMERIC}
Instructs \class{writer} objects to quote all non-numeric fields.
\end{datadesc}

\begin{datadesc}{QUOTE_NONE}
Instructs \class{writer} objects to never quote fields. When the current
\var{delimiter} occurs in output data it is preceded by the current
\var{escapechar} character. When \constant{QUOTE_NONE} is in effect, it
is an error not to have a single-character \var{escapechar} defined, even if
no data to be written contains the \var{delimiter} character.
\end{datadesc}
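To compare the four quoting constants, the sketch below writes the same row
under each of them. \class{LineCollector} and \function{render} are
hypothetical helpers introduced only for this illustration, and the
backslash is an arbitrary choice of \var{escapechar}.

\begin{verbatim}
import csv

class LineCollector:
    """Hypothetical stand-in for a file: it only needs a write method."""
    def __init__(self):
        self.chunks = []
    def write(self, text):
        self.chunks.append(text)

def render(row, **fmtparams):
    """Return the text a writer produces for a single row."""
    out = LineCollector()
    csv.writer(out, **fmtparams).writerow(row)
    return "".join(out.chunks)

row = ["ab", 12, "c,d"]
quote_all = render(row, quoting=csv.QUOTE_ALL)          # "ab","12","c,d"
minimal = render(row, quoting=csv.QUOTE_MINIMAL)        # ab,12,"c,d"
nonnumeric = render(row, quoting=csv.QUOTE_NONNUMERIC)  # "ab",12,"c,d"
# QUOTE_NONE needs an escapechar to protect the embedded delimiter.
unquoted = render(row, quoting=csv.QUOTE_NONE,
                  escapechar="\\")                      # ab,12,c\,d
\end{verbatim}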

The \module{csv} module defines the following exception:

\begin{excdesc}{Error}
Raised by any of the functions when an error is detected.
\end{excdesc}

\subsection{Dialects and Formatting Parameters\label{csv-fmt-params}}

To make it easier to specify the format of input and output records,
specific formatting parameters are grouped together into dialects. A
dialect is a subclass of the \class{Dialect} class having a set of specific
attributes and a single \method{validate()} method. When creating
\class{reader} or \class{writer} objects, the programmer can specify a
string or a subclass of the \class{Dialect} class as the dialect parameter.
In addition to, or instead of, the \var{dialect} parameter, the programmer
can also specify individual formatting parameters, which have the same
names as the attributes defined below for the \class{Dialect} class.

Dialects support the following attributes:

\begin{memberdesc}[Dialect]{delimiter}
A one-character string used to separate fields. It defaults to \code{','}.
\end{memberdesc}

\begin{memberdesc}[Dialect]{doublequote}
Controls how instances of \var{quotechar} appearing inside a field should
themselves be quoted. When \constant{True}, the character is doubled.
When \constant{False}, the \var{escapechar} must be a one-character string
which is used as a prefix to the \var{quotechar}. It defaults to
\constant{True}.
\end{memberdesc}

\begin{memberdesc}[Dialect]{escapechar}
A one-character string used to escape the \var{delimiter} if \var{quoting}
is set to \constant{QUOTE_NONE}. It defaults to \constant{None}.
\end{memberdesc}

\begin{memberdesc}[Dialect]{lineterminator}
The string used to terminate lines in the CSV file. It defaults to
\code{'\e r\e n'}.
\end{memberdesc}

\begin{memberdesc}[Dialect]{quotechar}
A one-character string used to quote elements containing the \var{delimiter}
or which start with the \var{quotechar}. It defaults to \code{'"'}.
\end{memberdesc}

\begin{memberdesc}[Dialect]{quoting}
Controls when quotes should be generated by the writer. It can take on any
of the \constant{QUOTE_*} constants (see section~\ref{csv-contents})
and defaults to \constant{QUOTE_MINIMAL}.
\end{memberdesc}

\begin{memberdesc}[Dialect]{skipinitialspace}
When \constant{True}, whitespace immediately following the \var{delimiter}
is ignored. The default is \constant{False}.
\end{memberdesc}
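Several of these attributes can be tried out by passing them as individual
formatting parameters. The sketch below uses invented data, and
\class{LineCollector} is a hypothetical stand-in for a file object.

\begin{verbatim}
import csv

class LineCollector:
    """Hypothetical stand-in for a file: it only needs a write method."""
    def __init__(self):
        self.chunks = []
    def write(self, text):
        self.chunks.append(text)

# skipinitialspace drops the blank after each delimiter while reading.
spaced = list(csv.reader(["a, b, c"], skipinitialspace=True))

# With doublequote left at True, an embedded quotechar is written twice.
out = LineCollector()
csv.writer(out).writerow(['say "hi"'])
doubled = "".join(out.chunks)           # "say ""hi"""

# Reading the line back restores the original field.
restored = list(csv.reader([doubled]))

# lineterminator replaces the default '\r\n' on output.
unix = LineCollector()
csv.writer(unix, lineterminator="\n").writerow(["x", "y"])
unix_line = "".join(unix.chunks)
\end{verbatim}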

\subsection{Reader Objects}

Reader objects (\class{DictReader} instances and objects returned by
the \function{reader()} function) have the following public method:

\begin{methoddesc}[csv reader]{next}{}
Return the next row of the reader's iterable object as a list, parsed
according to the current dialect.
\end{methoddesc}

\subsection{Writer Objects}

\class{Writer} objects (\class{DictWriter} instances and objects returned by
the \function{writer()} function) have the following public methods. A
{}\var{row} must be a sequence of strings or numbers for \class{Writer}
objects and a dictionary mapping fieldnames to strings or numbers for
{}\class{DictWriter} objects; numbers are passed through \function{str()}
before being written. Note that complex numbers are written out surrounded
by parentheses. This may cause some problems for other programs which read
CSV files (assuming they support complex numbers at all).

\begin{methoddesc}[csv writer]{writerow}{row}
Write the \var{row} parameter to the writer's file object, formatted
according to the current dialect.
\end{methoddesc}

\begin{methoddesc}[csv writer]{writerows}{rows}
Write every row in the \var{rows} parameter (a list of \var{row} objects as
described above) to the writer's file object, formatted
according to the current dialect.
\end{methoddesc}
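The two methods can be combined, for instance to write a header with
\method{writerow()} and the remaining data with \method{writerows()}. In
this sketch \class{LineCollector} is a hypothetical stand-in for a file
object and the rows are invented; the last row shows the parentheses
around a complex number.

\begin{verbatim}
import csv

class LineCollector:
    """Hypothetical stand-in for a file: it only needs a write method."""
    def __init__(self):
        self.chunks = []
    def write(self, text):
        self.chunks.append(text)

out = LineCollector()
writer = csv.writer(out)
writer.writerow(["label", "value"])
writer.writerows([["real", 1.5], ["complex", 1 + 2j]])

# Split on the default line terminator to inspect each output line.
lines = "".join(out.chunks).split("\r\n")
# lines == ['label,value', 'real,1.5', 'complex,(1+2j)', '']
\end{verbatim}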

\subsection{Examples}

The ``Hello, world'' of csv reading is:

\begin{verbatim}
import csv
reader = csv.reader(open("some.csv", "rb"))
for row in reader:
    print row
\end{verbatim}

To print just the first and last columns of each row, try:

\begin{verbatim}
import csv
reader = csv.reader(open("some.csv", "rb"))
for row in reader:
    print row[0], row[-1]
\end{verbatim}

The corresponding simplest possible writing example is:

\begin{verbatim}
import csv
writer = csv.writer(open("some.csv", "wb"))
for row in someiterable:
    writer.writerow(row)
\end{verbatim}