Python: check whether a file is plain text

The pandas I/O API is a set of top-level reader functions accessed like pandas.read_csv() that generally return a pandas object. The corresponding writer functions are object methods that are accessed like DataFrame.to_csv(). Below is a table containing the available readers and writers.
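The following is a minimal sketch of the reader/writer pairing described above. It writes a small DataFrame with the DataFrame.to_csv() method and reads it back with the top-level pandas.read_csv() function. The file name "example.csv" and the sample data are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Arbitrary sample data, used only for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.csv")  # hypothetical file name
    # Writer: a method called on the DataFrame object.
    df.to_csv(path, index=False)
    # Reader: a top-level pandas function that returns a new DataFrame.
    result = pd.read_csv(path)

# For a simple frame like this one, the round trip preserves columns and values.
print(result.equals(df))
```

Passing index=False keeps the default RangeIndex from being written as an extra, unnamed column.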

Format Type   Data Description        Reader           Writer
text          CSV                     read_csv         to_csv
text          Fixed-Width Text File   read_fwf
text          JSON                    read_json        to_json
text          HTML                    read_html        to_html
text          LaTeX                                    Styler.to_latex
text          XML                     read_xml         to_xml
text          Local clipboard         read_clipboard   to_clipboard
binary        MS Excel                read_excel       to_excel
binary        OpenDocument            read_excel
binary        HDF5 Format             read_hdf         to_hdf
binary        Feather Format          read_feather     to_feather
binary        Parquet Format          read_parquet     to_parquet
binary        ORC Format              read_orc         to_orc
binary        Stata                   read_stata       to_stata
binary        SAS                     read_sas
binary        SPSS                    read_spss
binary        Python Pickle Format    read_pickle      to_pickle
SQL           SQL                     read_sql         to_sql
SQL           Google BigQuery         read_gbq         to_gbq
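Every row in the table that has both a reader and a writer works the same way. As an illustration, here is a minimal sketch of the JSON pair, DataFrame.to_json() and pandas.read_json(). The sample data is made up. The JSON string is wrapped in StringIO because recent pandas versions deprecate passing literal JSON text to read_json.

```python
from io import StringIO

import pandas as pd

# Arbitrary sample data, used only for illustration.
df = pd.DataFrame({"name": ["a", "b"], "value": [1.5, 2.5]})

# Writer: orient="records" serializes the frame as a list of row objects.
json_text = df.to_json(orient="records")
print(json_text)

# Reader: parse the text back into a DataFrame from a file-like buffer.
back = pd.read_json(StringIO(json_text), orient="records")
print(back)
```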

Here is an informal performance comparison for some of these IO methods.

Note

For examples that use the StringIO class, make sure you import it for Python 3 with: from io import StringIO
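A minimal sketch of that pattern follows. StringIO wraps a string in a file-like object with a read() method, and read_csv accepts that object in place of a file path. The CSV text is made up for illustration.

```python
from io import StringIO

import pandas as pd

data = "a,b,c\n1,2,3\n4,5,6"  # illustrative CSV text

# The in-memory buffer stands in for a file on disk.
df = pd.read_csv(StringIO(data))
print(df)
```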

CSV & text files

The workhorse function for reading text files (a.k.a. flat files) is read_csv(). See the cookbook for some advanced strategies.

Parsing options

read_csv() accepts the following common arguments:

Basic

filepath_or_buffer : various

Either a path to a file (a str, pathlib.Path, or py._path.local.LocalPath), a URL (including http, ftp, and S3 locations), or any object with a read() method (such as an open file or StringIO).
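Here is a minimal sketch of the input kinds listed for filepath_or_buffer: a string path, a pathlib.Path, an open file handle, and an in-memory StringIO buffer. The file name "points.csv" and its contents are hypothetical. URLs are left out because they need network access.

```python
import tempfile
from io import StringIO
from pathlib import Path

import pandas as pd

text = "x,y\n1,2\n3,4\n"  # illustrative CSV content

with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / "points.csv"  # hypothetical file name
    path.write_text(text)

    from_str = pd.read_csv(str(path))       # plain string path
    from_path = pd.read_csv(path)           # pathlib.Path
    with open(path) as fh:                  # any object with a read() method
        from_handle = pd.read_csv(fh)
    from_buffer = pd.read_csv(StringIO(text))  # in-memory buffer

frames = [from_str, from_path, from_handle, from_buffer]
print(all(f.equals(from_str) for f in frames))
```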

sep : str, defaults to ',' for read_csv(), '\t' for read_table()

Delimiter to use. If sep is None, the C engine cannot automatically detect the separator, but the Python parsing engine can. The Python engine will therefore be used, and it detects the separator with Python's builtin sniffer tool, csv.Sniffer. In addition, separators longer than 1 character and different from '\s+' will be interpreted as regular expressions and will also force the use of the Python parsing engine. Note that regex delimiters are prone to ignoring quoted data. Regex example: '\\r\\t'.
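Here is a minimal sketch of both behaviors, using made-up sample data. sep=None lets the Python engine sniff a semicolon delimiter. A multi-character separator is then treated as a regular expression, here one that also absorbs the spaces around commas. engine="python" is passed explicitly because pandas otherwise falls back to that engine on its own and may emit a ParserWarning.

```python
from io import StringIO

import pandas as pd

# sep=None: the Python engine sniffs the delimiter (';' here) via csv.Sniffer.
semicolon_data = "a;b;c\n1;2;3\n4;5;6"
sniffed = pd.read_csv(StringIO(semicolon_data), sep=None, engine="python")

# A separator longer than one character is interpreted as a regex:
# this one splits on commas and strips surrounding whitespace.
messy_data = "a, b ,c\n1 ,2, 3\n4, 5 ,6"
regex = pd.read_csv(StringIO(messy_data), sep=r"\s*,\s*", engine="python")

print(sniffed)
print(regex)
```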

delimiter : str, default None

Alternative argument name for sep.
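A short sketch, with made-up pipe-separated data, showing that delimiter and sep produce the same result:

```python
from io import StringIO

import pandas as pd

data = "a|b\n1|2\n3|4"  # illustrative pipe-separated data

via_sep = pd.read_csv(StringIO(data), sep="|")
via_delimiter = pd.read_csv(StringIO(data), delimiter="|")  # alias for sep

print(via_sep.equals(via_delimiter))
```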

delim_whitespace : boolean, default False

Specifies whether or not whitespace (e.g. ' ' or '\t') will be used as the delimiter. Equivalent to setting sep='\s+'. If this option is set to True, nothing should be passed in for the delimiter parameter.
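The sketch below uses the equivalent spelling sep=r"\s+", because recent pandas releases (2.2 and later) deprecate delim_whitespace in its favor. Runs of spaces of any length between fields act as a single delimiter. The sample data is made up.

```python
from io import StringIO

import pandas as pd

# Columns separated by varying amounts of whitespace (illustrative data).
data = "name  score\nann   90\nbob    85\n"

df = pd.read_csv(StringIO(data), sep=r"\s+")
print(df)
```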

Column and index locations and names

header : int or list of ints, default 'infer'

Row number(s) to use as the column names, and the start of the data. The default behavior is to infer the column names. If no names are passed, the behavior is identical to header=0 and column names are inferred from the first line of the file. If column names are passed explicitly, the behavior is identical to header=None. Explicitly pass header=0 to be able to replace existing names.
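Here is a minimal sketch of the three cases, using made-up data. The first call keeps the default. The second passes names alone, so the file's header row is read as data. The third passes header=0 together with names, so the existing header row is replaced.

```python
from io import StringIO

import pandas as pd

data = "a,b,c\n1,2,3\n4,5,6"  # illustrative CSV with a header row

# Default: column names inferred from the first line (like header=0).
default = pd.read_csv(StringIO(data))

# names without header=0 behaves like header=None: "a,b,c" becomes a data row.
as_data = pd.read_csv(StringIO(data), names=["x", "y", "z"])

# header=0 plus names: the file's header row is discarded and replaced.
renamed = pd.read_csv(StringIO(data), header=0, names=["x", "y", "z"])

print(default, as_data, renamed, sep="\n\n")
```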

The header can be a list of ints that specify row locations for a MultiIndex on the columns, e.g. [0,1,3]. Intervening rows that are not specified will be skipped (e.g. row 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.
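Here is a minimal sketch of a two-row header read into MultiIndex columns with header=[0, 1]. The labels and values are made up. A repeated label in the top row ("metric") is fine, because each column is identified by its full tuple.

```python
from io import StringIO

import pandas as pd

# Two header rows followed by data rows (illustrative data).
data = (
    "metric,metric,label\n"
    "min,max,name\n"
    "1,5,foo\n"
    "2,6,bar\n"
)

df = pd.read_csv(StringIO(data), header=[0, 1])
print(df.columns.tolist())
print(df[("metric", "max")])
```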

names : array-like, default None

List of column names to use. If the file contains no header row, then you should explicitly pass header=None. Duplicates in this list are not allowed.
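Here is a minimal sketch for a header-less file, with made-up data. header=None keeps the first line as data and names supplies the labels. A names list containing a duplicate is rejected with a ValueError.

```python
from io import StringIO

import pandas as pd

data = "1,2,3\n4,5,6"  # illustrative data with no header row

df = pd.read_csv(StringIO(data), header=None, names=["x", "y", "z"])
print(df)

# Duplicate entries in names are not allowed.
try:
    pd.read_csv(StringIO(data), header=None, names=["x", "x", "z"])
    duplicates_rejected = False
except ValueError:
    duplicates_rejected = True

print(duplicates_rejected)
```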

index_col : int, str, sequence of int / str, or False, optional, default None

Column(s) to use as the row labels of the DataFrame, given either as a string name or as a column index. If a sequence of int / str is given, a MultiIndex is used.
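Here is a minimal sketch with made-up sales data. It selects the index column by name and by position, which gives the same result. It then passes a list of names, which yields a MultiIndex.

```python
from io import StringIO

import pandas as pd

data = "year,city,sales\n2023,A,10\n2023,B,12\n2024,A,11"  # illustrative data

by_name = pd.read_csv(StringIO(data), index_col="city")
by_position = pd.read_csv(StringIO(data), index_col=1)  # "city" is column 1

# A sequence of columns produces a MultiIndex on the rows.
multi = pd.read_csv(StringIO(data), index_col=["year", "city"])

print(by_name)
print(multi)
```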

Note

index_col=False can be used to force pandas to not use the first column as the index, e.g. when you have a malformed file with delimiters at the end of each line.
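Here is a minimal sketch of the malformed-file case, using made-up data where every data line ends with a trailing comma. By default, the extra field makes pandas treat the first column as the index. index_col=False keeps the columns aligned with the header.

```python
from io import StringIO

import pandas as pd

# Each data line has a trailing delimiter, so it has one more field than the header.
data = "a,b,c\n4,apple,bat,\n8,orange,cow,"

default = pd.read_csv(StringIO(data))                  # first column becomes the index
fixed = pd.read_csv(StringIO(data), index_col=False)   # columns taken as in the header

print(default)
print(fixed)
```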

The default value of None instructs pandas to guess. If the number of fields in the column header row is equal to the number of fields in the body of the data file, then a default index is used. If the body has more fields than the header, then the first columns are used as the index so that the remaining number of fields in the body equals the number of fields in the header.
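Here is a minimal sketch of the two outcomes of this guess, using made-up data. When the header and body have the same width, a default RangeIndex is used. When the body has one extra field, the first column becomes the index.

```python
from io import StringIO

import pandas as pd

# Header and body have the same number of fields: default index.
same_width = pd.read_csv(StringIO("a,b\n1,2\n3,4"))

# The body has one more field than the header: the first column is used as the index.
wider_body = pd.read_csv(StringIO("a,b\nr1,1,2\nr2,3,4"))

print(same_width)
print(wider_body)
```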

The first row after the header is used to determine the number of columns, which will go into the index. If the subsequent rows contain fewer columns than the first row, they are filled with NaN.

This can be avoided through usecols. This ensures that the columns are taken as is and the trailing data are ignored.

usecols : list-like or callable, default None

Return a subset of the columns. If list-like, all elements must either be positional (i.e. integer indices into the document columns) or strings that correspond to column names provided either by the user in names or inferred from the document header row(s). If names are given, the document header row(s) are not taken into account. For example, a valid list-like usecols parameter would be [0, 1, 2] or ['foo', 'bar', 'baz'].
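Here is a minimal sketch using the foo/bar/baz labels from the text and made-up values. It selects columns by position and by label. It then combines names (with header=0, which drops the file's own header row) and usecols, so that usecols refers to the user-supplied names.

```python
from io import StringIO

import pandas as pd

data = "foo,bar,baz,qux\n1,2,3,4\n5,6,7,8"  # illustrative data

by_position = pd.read_csv(StringIO(data), usecols=[0, 1, 2])
by_label = pd.read_csv(StringIO(data), usecols=["foo", "bar", "baz"])

# With names given, usecols matches against those names, not the file header.
renamed = pd.read_csv(
    StringIO(data), header=0, names=["w", "x", "y", "z"], usecols=["w", "y"]
)

print(by_position)
print(renamed)
```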

Element order is ignored, so usecols=[0, 1] is the same as [1, 0]. To instantiate a DataFrame from data with element order preserved, use pd.read_csv(data, usecols=['foo', 'bar'])[['foo', 'bar']] for columns in ['foo', 'bar'] order or pd.read_csv(data, usecols=['foo', 'bar'])[['bar', 'foo']] for ['bar', 'foo'] order.
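The following is a minimal sketch of the element-order rule above. It uses a small in-memory CSV with hypothetical columns foo, bar and baz. Selecting columns with usecols keeps the file's column order, and indexing the result afterwards imposes an explicit order.

```python
from io import StringIO

import pandas as pd

data = "foo,bar,baz\n1,2,3\n4,5,6"

# usecols only selects columns; the result keeps the order they appear in the file.
df_file_order = pd.read_csv(StringIO(data), usecols=["bar", "foo"])
print(list(df_file_order.columns))  # ['foo', 'bar']

# Indexing the result afterwards imposes an explicit column order.
df_bar_first = pd.read_csv(StringIO(data), usecols=["foo", "bar"])[["bar", "foo"]]
print(list(df_bar_first.columns))  # ['bar', 'foo']
```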

If callable, the callable function will be evaluated against the column names, returning names where the callable function evaluates to True.
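Here is a short illustration of a callable usecols, assuming a made-up three-column CSV. The lambda keeps only the columns whose upper-cased name is COL1 or COL3.

```python
from io import StringIO

import pandas as pd

data = "col1,col2,col3\na,b,1\na,b,2\nc,d,3"

# The callable is called with each column name; only names for which it
# returns True are kept.
df = pd.read_csv(StringIO(data), usecols=lambda name: name.upper() in ["COL1", "COL3"])
print(df)
```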

Using this parameter results in much faster parsing time and lower memory usage when using the C engine. The Python engine loads the data first before deciding which columns to drop.

squeeze boolean, default False

If the parsed data only contains one column then return a Series.

Deprecated since version 1.4.0: Append .squeeze("columns") to the call to read_csv to squeeze the data.
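The deprecation note above recommends squeezing after the call. This sketch assumes a hypothetical single-column CSV named value and calls DataFrame.squeeze("columns") on the result to get a Series.

```python
from io import StringIO

import pandas as pd

data = "value\n1\n2\n3"

# read_csv returns a one-column DataFrame; squeeze("columns") turns it into a Series.
s = pd.read_csv(StringIO(data)).squeeze("columns")
print(type(s).__name__)  # Series
```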

prefix str, default None

Prefix to add to column numbers when no header, e.g. 'X' for X0, X1, …

Deprecated since version 1.4.0: Use a list comprehension on the DataFrame's columns after calling read_csv.
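This is one way to follow that advice, assuming headerless data and the prefix "X" used in the description. With header=None the columns are numbered 0, 1, 2, and a list comprehension rebuilds the prefixed names.

```python
from io import StringIO

import pandas as pd

data = "a,b,1\nc,d,2"

# With header=None the columns are numbered 0, 1, 2, ...
df = pd.read_csv(StringIO(data), header=None)
# Rebuild the prefixed names that prefix="X" used to produce.
df.columns = [f"X{col}" for col in df.columns]
print(list(df.columns))  # ['X0', 'X1', 'X2']
```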

mangle_dupe_cols boolean, default True

Duplicate columns will be specified as 'X', 'X.1'…'X.N', rather than 'X'…'X'. Passing in False will cause data to be overwritten if there are duplicate names in the columns.

Deprecated since version 1.5.0: The argument was never implemented, and a new argument where the renaming pattern can be specified will be added instead.
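The default de-duplication can be seen with a tiny CSV whose header repeats a hypothetical column a. The sketch relies only on the default behavior and does not pass mangle_dupe_cols, given the deprecation above.

```python
from io import StringIO

import pandas as pd

data = "a,b,a\n1,2,3"

# Duplicate header names are de-duplicated by appending .1, .2, ...
df = pd.read_csv(StringIO(data))
print(list(df.columns))  # ['a', 'b', 'a.1']
```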

General parsing configuration

dtype Type name or dict of column -> type, default None

Data type for data or columns, e.g. {'a': np.float64, 'b': np.int32, 'c': 'Int64'}. Use str or object together with suitable na_values settings to preserve and not interpret dtype. If converters are specified, they will be applied INSTEAD of dtype conversion.

In [13]: from io import StringIO
   ...: import numpy as np
   ...: import pandas as pd

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

New in version 1.5.0: Support for defaultdict was added. Specify a defaultdict as input where the default determines the dtype of the columns which are not explicitly listed.
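The following sketches the defaultdict form. It assumes pandas 1.5 or later and hypothetical columns a, b and c. Column a is listed explicitly, and every other column falls back to the default factory's type (str here).

```python
from collections import defaultdict
from io import StringIO

import pandas as pd

data = "a,b,c\n1,2,3\n4,5,6"

# "a" is listed explicitly; every other column falls back to the default (str).
dtypes = defaultdict(lambda: str, a="int64")
df = pd.read_csv(StringIO(data), dtype=dtypes)
print(df.dtypes)
```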

engine {'c', 'python', 'pyarrow'}

Parser engine to use. The C and pyarrow engines are faster, while the python engine is currently more feature-complete. Multithreading is currently only supported by the pyarrow engine.

New in version 1.4.0: The "pyarrow" engine was added as an experimental engine, and some features are unsupported, or may not work correctly, with this engine.

converters dict, default None

Dict of functions for converting values in certain columns. Keys can either be integers or column labels.
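This is a minimal converters sketch using made-up id and price columns. Each converter receives the raw string from the file. The example keys the dict by column label, although integer positions are also accepted per the description above.

```python
from io import StringIO

import pandas as pd

data = "id,price\n001,1.50\n002,2.25"

converters = {
    "id": str,  # keep the raw text so leading zeros survive
    "price": lambda s: round(float(s) * 100),  # convert the price string to integer cents
}
df = pd.read_csv(StringIO(data), converters=converters)
print(df)
```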

true_values list, default None

Values to consider as True.

false_values list, default None

Values to consider as False.
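This illustrates true_values and false_values together on a hypothetical yes/no column. When every value in the column matches one of the two lists, the column is parsed as bool.

```python
from io import StringIO

import pandas as pd

data = "name,active\nann,yes\nbob,no\ncid,yes"

# "yes" maps to True and "no" to False, so the column becomes boolean.
df = pd.read_csv(StringIO(data), true_values=["yes"], false_values=["no"])
print(df["active"].dtype)  # bool
```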

skipinitialspace boolean, default False

Skip spaces after delimiter.
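A quick comparison, assuming a made-up CSV that puts a space after each comma. Without skipinitialspace, the header names keep the leading space.

```python
from io import StringIO

import pandas as pd

data = "a, b, c\n1, 2, 3"

plain = pd.read_csv(StringIO(data))
stripped = pd.read_csv(StringIO(data), skipinitialspace=True)
print(list(plain.columns))     # ['a', ' b', ' c']
print(list(stripped.columns))  # ['a', 'b', 'c']
```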

skiprows list-like or integer, default None

Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file.

If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise.
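The following contrasts the list-like and callable forms on a small hypothetical CSV. Line 0 is the header, so the callable that skips odd line numbers keeps the header and the second data line.

```python
from io import StringIO

import pandas as pd

data = "col1,col2,col3\na,b,1\na,b,2\nc,d,3"

# Skip the first data line by its 0-indexed line number (line 0 is the header).
df_list = pd.read_csv(StringIO(data), skiprows=[1])

# Skip every odd-numbered line; the header (line 0) and line 2 are kept.
df_callable = pd.read_csv(StringIO(data), skiprows=lambda x: x % 2 != 0)
print(df_callable)
```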

skipfooter int, default 0

Number of lines at the bottom of the file to skip (unsupported with engine='c').
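Since skipfooter is not supported by the C engine, this sketch requests engine="python" explicitly. The data is a made-up CSV with a trailing summary line.

```python
from io import StringIO

import pandas as pd

data = "a,b\n1,2\n3,4\nTOTAL,6"

# skipfooter needs the Python engine, so request it explicitly.
df = pd.read_csv(StringIO(data), skipfooter=1, engine="python")
print(df)
```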

nrows int, default None

Number of rows of the file to read. Useful for reading pieces of large files.
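This shows nrows on a generated 100-row CSV (the data is illustrative only). Only the first three data rows are parsed.

```python
from io import StringIO

import pandas as pd

data = "a,b\n" + "\n".join(f"{i},{i * 10}" for i in range(100))

# Read just the first three data rows (the header is not counted).
head = pd.read_csv(StringIO(data), nrows=3)
print(head)
```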

low_memory boolean, default True

Internally process the file in chunks, resulting in lower memory use while parsing, but possibly mixed type inference. To ensure no mixed types, either set low_memory=False or specify the type with the dtype parameter. Note that the entire file is read into a single DataFrame regardless; use the chunksize or iterator parameter to return the data in chunks. (Only valid with the C parser.)
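The following sketches the chunksize route mentioned above, using a generated 10-row CSV. It assumes pandas 1.2 or later, where the returned reader can be used as a context manager.

```python
from io import StringIO

import pandas as pd

data = "a,b\n" + "\n".join(f"{i},{i * i}" for i in range(10))

sizes = []
total = 0
# With chunksize, read_csv returns a reader that yields DataFrames;
# the with-block closes it when iteration is done.
with pd.read_csv(StringIO(data), chunksize=4) as reader:
    for chunk in reader:
        sizes.append(len(chunk))
        total += chunk["b"].sum()
print(sizes, total)
```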

memory_map boolean, default False

If a filepath is provided for filepath_or_buffer, map the file object directly onto memory and access the data directly from there. Using this option can improve performance because there is no longer any I/O overhead.
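Because the description applies when a file path is given, this sketch writes a small temporary file and passes its path. The file name example.csv is hypothetical. Any speed-up depends on the platform and file size and is not measured here.

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.csv")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        f.write("a,b\n1,2\n3,4\n")
    # Pass a file path, as the description above assumes.
    df = pd.read_csv(path, memory_map=True)
print(df)
```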

NA and missing data handling

na_values scalar, str, list-like, or dict, default None

Additional strings to recognize as NA/NaN. If a dict is passed, it specifies per-column NA values. See na values const below for a list of the values interpreted as NaN by default.
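This is a sketch of the per-column dict form, assuming made-up sentinel values: "-999" marks missing data only in column a, and "missing" only in column b.

```python
from io import StringIO

import pandas as pd

data = "a,b\n1,missing\n-999,2"

# Per-column NA markers: each list applies only to its own column.
df = pd.read_csv(StringIO(data), na_values={"a": ["-999"], "b": ["missing"]})
print(df)
```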

keep_default_na boolean, default True

Whether or not to include the default NaN values when parsing the data. Depending on whether na_values is passed in, the behavior is as follows:

  • If keep_default_na is True, and na_values are specified, na_values is appended to the default NaN values used for parsing.

  • If keep_default_na is True, and na_values are not specified, only the default NaN values are used for parsing.

  • If keep_default_na is False, and na_values are specified, only the NaN values specified in na_values are used for parsing.

  • If keep_default_na is False, and na_values are not specified, no strings will be parsed as NaN.

Note that if na_filter is passed in as False, the keep_default_na and na_values parameters will be ignored.
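The sketch below shows how the keep_default_na / na_values combinations, and na_filter=False, play out on a tiny in-memory CSV. It assumes a reasonably recent pandas; the column names and the "missing" marker are made up for illustration, while "NA" is one of pandas' default NaN markers.

```python
from io import StringIO

import pandas as pd

# Hypothetical data: "NA" is in pandas' default NaN list, "missing" is not.
data = "x,y\nNA,1\nmissing,2\nfoo,3\n"

# keep_default_na=True (default) + na_values: "missing" is added to the defaults.
both = pd.read_csv(StringIO(data), na_values=["missing"])

# keep_default_na=False + na_values: only "missing" is treated as NaN.
custom_only = pd.read_csv(StringIO(data), keep_default_na=False, na_values=["missing"])

# keep_default_na=False without na_values: no string is parsed as NaN.
no_markers = pd.read_csv(StringIO(data), keep_default_na=False)

# na_filter=False: keep_default_na and na_values are ignored entirely.
unfiltered = pd.read_csv(StringIO(data), na_filter=False, na_values=["missing"])

print(both["x"].isna().tolist())         # [True, True, False]
print(custom_only["x"].isna().tolist())  # [False, True, False]
print(no_markers["x"].isna().tolist())   # [False, False, False]
print(unfiltered["x"].tolist())          # ['NA', 'missing', 'foo']
```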

na_filter boolean, default True

Detect missing value markers (empty strings and the value of na_values). In data without any NAs, passing na_filter=False can improve the performance of reading a large file.
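A minimal sketch of the na_filter=False case, assuming NA-free numeric input (the generated data here is hypothetical and small; the performance benefit described above would matter for a large file). It only checks that skipping NA detection leaves the parsed result unchanged when there is nothing to detect; it does not measure speed.

```python
from io import StringIO

import pandas as pd

# Hypothetical NA-free data; a real use case would be a large file on disk.
data = "a,b\n" + "\n".join(f"{i},{i * 2}" for i in range(1000))

with_filter = pd.read_csv(StringIO(data))
without_filter = pd.read_csv(StringIO(data), na_filter=False)

# With no missing-value markers in the input, skipping NA detection
# is expected to produce an identical frame.
print(with_filter.equals(without_filter))  # True
```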

verbose boolean, default False

Indicate the number of NA values placed in non-numeric columns.

skip_blank_lines boolean, default True

If True, skip over blank lines rather than interpreting them as NaN values.
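As an illustration of skip_blank_lines, here is a small sketch on hypothetical data with one blank line between two rows: by default the blank line is dropped, while skip_blank_lines=False turns it into a row of NaN.

```python
from io import StringIO

import pandas as pd

# One blank line between two data rows (hypothetical input).
data = "a,b\n1,2\n\n3,4"

skipped = pd.read_csv(StringIO(data))                       # skip_blank_lines=True
kept = pd.read_csv(StringIO(data), skip_blank_lines=False)  # blank line -> NaN row

print(len(skipped))                      # 2
print(len(kept))                         # 3
print(kept.isna().all(axis=1).tolist())  # [False, True, False]
```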

Datetime handling

parse_dates boolean or list of ints or names or list of lists or dict, default False.

  • If True -> try parsing the index.

  • If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.

  • If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.

  • If {'foo': [1, 3]} -> parse columns 1, 3 as date and call result 'foo'.
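A short sketch of the two non-combining forms of parse_dates on hypothetical data. The combining forms ([[1, 3]] and the dict form) are left out here because, to our understanding, recent pandas releases (2.2 and later) deprecate combining date columns.

```python
from io import StringIO

import pandas as pd

data = "date,value\n2024-01-15,1\n2024-02-20,2\n"

# List of names: parse the "date" column as datetimes.
by_name = pd.read_csv(StringIO(data), parse_dates=["date"])

# parse_dates=True: try parsing the index (here the first column).
by_index = pd.read_csv(StringIO(data), index_col=0, parse_dates=True)

print(pd.api.types.is_datetime64_any_dtype(by_name["date"]))  # True
print(isinstance(by_index.index, pd.DatetimeIndex))            # True
print(by_name["date"].dt.month.tolist())                       # [1, 2]
```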

Note

A fast-path exists for ISO 8601-formatted dates.

infer_datetime_format boolean, default False

If True and parse_dates is enabled for a column, attempt to infer the datetime format to speed up the processing.

keep_date_col boolean, default False

If True and parse_dates specifies combining multiple columns, then keep the original columns.

date_parser function, default None

Function to use for converting a sequence of string columns to an array of datetime instances. The default uses dateutil.parser.parser to do the conversion. pandas will try to call date_parser in three different ways, advancing to the next if an exception occurs: 1) pass one or more arrays (as defined by parse_dates) as arguments; 2) concatenate (row-wise) the string values from the columns defined by parse_dates into a single array and pass that; and 3) call date_parser once for each row using one or more strings (corresponding to the columns defined by parse_dates) as arguments.

dayfirst boolean, default False

DD/MM format dates, international and European format.
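A minimal sketch of dayfirst=True, assuming hypothetical DD/MM/YYYY input: "01/02/2024" is read as 1 February rather than 2 January.

```python
from io import StringIO

import pandas as pd

# Hypothetical DD/MM/YYYY data, as commonly written in Europe.
data = "d,v\n01/02/2024,1\n13/02/2024,2\n"

df = pd.read_csv(StringIO(data), parse_dates=["d"], dayfirst=True)

print(df["d"].dt.day.tolist())    # [1, 13]
print(df["d"].dt.month.tolist())  # [2, 2]
```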

cache_dates boolean, default True

If True, use a cache of unique, converted dates to apply the datetime conversion. May produce significant speed-up when parsing duplicate date strings, especially ones with timezone offsets.

New in version 0.25.0.

Iteration

iterator boolean, default False

Return TextFileReader object for iteration or getting chunks with get_chunk().

chunksize int, default None

Return TextFileReader object for iteration. See iterating and chunking below.
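The two iteration options can be sketched as follows, assuming pandas 1.2 or later (where the returned reader can be used as a context manager). The data is hypothetical: five rows with b = a².

```python
from io import StringIO

import pandas as pd

data = "a,b\n" + "\n".join(f"{i},{i * i}" for i in range(5))

# chunksize: the reader yields DataFrames of at most 2 rows each.
sizes = []
total = 0
with pd.read_csv(StringIO(data), chunksize=2) as reader:
    for chunk in reader:
        sizes.append(len(chunk))
        total += int(chunk["b"].sum())

# iterator=True: pull chunks of a chosen size on demand with get_chunk().
with pd.read_csv(StringIO(data), iterator=True) as reader:
    first = reader.get_chunk(3)
    rest = reader.get_chunk(2)

print(sizes)                  # [2, 2, 1]
print(total)                  # 30
print(len(first), len(rest))  # 3 2
```

Processing chunk by chunk like this keeps only one small DataFrame in memory at a time, which is the usual reason to prefer chunksize over a single full read.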

Quoting, compression, and file format

compression {'infer', 'gzip', 'bz2', 'zip', 'xz', 'zstd', None, dict}, default 'infer'

For on-the-fly decompression of on-disk data. If 'infer', then use gzip, bz2, zip, xz, or zstandard if filepath_or_buffer is path-like ending in '.gz', '.bz2', '.zip', '.xz', '.zst', respectively, and no decompression otherwise. If using 'zip', the ZIP file must contain only one data file to be read in. Set to None for no decompression. Can also be a dict with key 'method' set to one of {'zip', 'gzip', 'bz2', 'zstd'} and other key-value pairs are forwarded to zipfile.ZipFile, gzip.GzipFile, bz2.BZ2File, or zstandard.ZstdDecompressor. As an example, the following could be passed for faster compression and to create a reproducible gzip archive: compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}.
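A round-trip sketch, assuming pandas 1.2 or later so that extra dict keys reach gzip.GzipFile. The file name and frame contents are hypothetical. The file is written with the dict form of compression and read back both with the default compression='infer' (which picks gzip from the ".gz" suffix) and with an explicit 'gzip'.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.csv.gz")  # hypothetical file name
    # Dict form: 'method' selects the codec; remaining keys go to gzip.GzipFile.
    df.to_csv(path, index=False,
              compression={"method": "gzip", "compresslevel": 1, "mtime": 1})
    inferred = pd.read_csv(path)                      # compression='infer'
    explicit = pd.read_csv(path, compression="gzip")
    with open(path, "rb") as fh:
        magic = fh.read(2)                            # gzip magic number

print(inferred.equals(df))   # True
print(explicit.equals(df))   # True
print(magic == b"\x1f\x8b")  # True
```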

Changed in version 1.1.0: dict option extended to support 'gzip' and 'bz2'.

Changed in version 1.2.0: Previous versions forwarded dict entries for 'gzip' to gzip.open.

thousands str, default None

Thousands separator.

decimal str, default ‘.’

Character to recognize as the decimal point. E.g. use ‘,’ for European data.
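
The thousands and decimal options are often used together for European-formatted numbers. The sketch below uses made-up sample data. It sets the field separator to ‘;’ so that ‘,’ can serve as the decimal mark.

```python
from io import StringIO

import pandas as pd

# European-style numbers: "." groups thousands and "," marks the decimal point.
# The field separator is ";" so that "," can appear inside numbers.
data = "item;price\nwidget;1.234,50\ngadget;12,75\n"

df = pd.read_csv(StringIO(data), sep=";", thousands=".", decimal=",")
print(df.dtypes)
print(df)
```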

float_precision string, default None

Specifies which converter the C engine should use for floating-point values. The options are None for the ordinary converter, ‘high’ for the high-precision converter, and ‘round_trip’ for the round-trip converter.
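
A quick sketch compares the converters by looping over the three options on the same input. The sample values are arbitrary. ‘round_trip’ relies on Python's own string-to-float conversion, so it should match float() exactly. The other two options use pandas' C routines.

```python
from io import StringIO

import pandas as pd

data = "x\n0.1\n0.3\n1.0000000000000002\n"

for option in (None, "high", "round_trip"):
    df = pd.read_csv(StringIO(data), float_precision=option)
    print(option, df["x"].tolist())

# After the loop, df holds the "round_trip" result.
```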

lineterminator str (length 1), default None

Character to break the file into lines. Only valid with the C parser.
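
For example, take a hypothetical input that separates records with ‘~’ instead of newlines. The C engine is selected explicitly here, although it is already the default.

```python
from io import StringIO

import pandas as pd

# "~" separates records instead of a newline; lineterminator needs the C engine.
data = "a,b~1,2~3,4"
df = pd.read_csv(StringIO(data), lineterminator="~", engine="c")
print(df)
```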

quotechar str (length 1)

The character used to denote the start and end of a quoted item. Quoted items can include the delimiter and it will be ignored.
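
The following short sketch uses made-up data in which single quotes, rather than the default double quote, protect fields that contain the delimiter.

```python
from io import StringIO

import pandas as pd

# Single quotes wrap the fields that contain the delimiter.
data = "name,score\n'Smith, J.',90\n'Lee, K.',85\n"
df = pd.read_csv(StringIO(data), quotechar="'")
print(df)
```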

quoting int or csv.QUOTE_* instance, default 0

Control field quoting behavior per csv.QUOTE_* constants. Use one of QUOTE_MINIMAL (0), QUOTE_ALL (1), QUOTE_NONNUMERIC (2) or QUOTE_NONE (3).
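
The effect of quoting is easiest to see by reading the same made-up input twice. With the default QUOTE_MINIMAL the quote characters are consumed. With QUOTE_NONE they are kept as ordinary text.

```python
import csv
from io import StringIO

import pandas as pd

data = 'id,label\n1,"x"\n2,"y"\n'

# Default (QUOTE_MINIMAL): the quote characters are stripped.
default = pd.read_csv(StringIO(data))
# QUOTE_NONE: quote characters are kept as part of the value.
raw = pd.read_csv(StringIO(data), quoting=csv.QUOTE_NONE)
print(default["label"].tolist())
print(raw["label"].tolist())
```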

doublequote boolean, default True

When quotechar is specified and quoting is not QUOTE_NONE, indicate whether or not to interpret two consecutive quotechar elements inside a field as a single quotechar element.
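
With the default doublequote=True, a doubled quote character inside a quoted field is read as a single literal quote. Here is a minimal sketch with an invented value:

```python
from io import StringIO

import pandas as pd

# Inside a quoted field, "" stands for one literal quote character.
data = 'quote\n"He said ""hi"""\n'

df = pd.read_csv(StringIO(data))  # doublequote=True is the default
print(df["quote"][0])
```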

escapechar str (length 1), default None

One-character string used to escape the delimiter when quoting is QUOTE_NONE.
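
When quoting is disabled, escapechar is what allows a field to contain the delimiter. In this sketch, which uses invented sample data, a backslash protects the comma in the first city name.

```python
import csv
from io import StringIO

import pandas as pd

# With quoting disabled, a backslash escapes the delimiter inside a field.
data = "city,pop\nSpringfield\\, IL,1000\nDover,500\n"

df = pd.read_csv(StringIO(data), quoting=csv.QUOTE_NONE, escapechar="\\")
print(df)
```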

comment str, default None

Indicates that the remainder of the line should not be parsed. If found at the beginning of a line, the line will be ignored altogether. This parameter must be a single character. Like empty lines (as long as skip_blank_lines=True), fully commented lines are ignored by the parameter header but not by skiprows. For example, if comment='#', parsing ‘#empty\na,b,c\n1,2,3’ with header=0 will result in ‘a,b,c’ being treated as the header.
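
The header example from the paragraph above can be run directly. The second call, which is also illustrative, shows that text after the comment character on a data line is dropped as well.

```python
from io import StringIO

import pandas as pd

data = "#empty\na,b,c\n1,2,3"

# The fully commented first line is skipped before the header is chosen,
# so header=0 refers to "a,b,c".
df = pd.read_csv(StringIO(data), comment="#", header=0)
print(df)

# Text after the comment character on a data line is ignored too.
df2 = pd.read_csv(StringIO("a,b\n1,2#trailing note\n"), comment="#")
print(df2)
```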

encoding str, default None

Encoding to use for UTF when reading/writing (e.g. ‘utf-8’). List of Python standard encodings.
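
The following small sketch reads bytes that are not UTF-8. The Latin-1 sample is invented for illustration. Any codec name from the Python list would be passed the same way.

```python
from io import BytesIO

import pandas as pd

# Bytes written in Latin-1; decoding them as UTF-8 would fail on "é".
raw = "name\nJosé\n".encode("latin-1")

df = pd.read_csv(BytesIO(raw), encoding="latin-1")
print(df["name"][0])
```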

dialect str or csv.Dialect instance, default None

If provided, this parameter will override values (default or not) for the following parameters: delimiter, doublequote, escapechar, skipinitialspace, quotechar, and quoting. If it is necessary to override values, a ParserWarning will be issued. See the csv.Dialect documentation for more details.
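
Here is a minimal sketch of passing a dialect. It assumes a hypothetical SemicolonDialect that inherits every setting from csv.excel except the delimiter. The other attributes match the read_csv defaults, so no ParserWarning is expected here.

```python
import csv
from io import StringIO

import pandas as pd


class SemicolonDialect(csv.excel):
    """Hypothetical dialect: Excel defaults, but ';' as the field delimiter."""

    delimiter = ";"


data = "a;b\n1;2\n3;4\n"
df = pd.read_csv(StringIO(data), dialect=SemicolonDialect())
print(df)
```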

Error handling

error_bad_lines boolean, optional, default None

Lines with too many fields (e.g. a csv line with too many commas) will by default cause an exception to be raised, and no DataFrame will be returned. If False, then these “bad lines” will be dropped from the DataFrame that is returned. See bad lines below.

Deprecated since version 1.3.0: The on_bad_lines parameter should be used instead to specify behavior upon encountering a bad line.

warn_bad_lines boolean, optional, default None

If error_bad_lines is False, and warn_bad_lines is True, a warning for each “bad line” will be output.

Deprecated since version 1.3.0: The on_bad_lines parameter should be used instead to specify behavior upon encountering a bad line.

on_bad_lines (‘error’, ‘warn’, ‘skip’), default ‘error’

Specifies what to do upon encountering a bad line (a line with too many fields). Allowed values are:

  • ‘error’, raise a ParserError when a bad line is encountered

  • ‘warn’, print a warning when a bad line is encountered and skip that line

  • ‘skip’, skip bad lines without raising or warning when they are encountered

New in version 1.3.0.
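
The following short sketch applies on_bad_lines to made-up data whose third line has one field too many. It requires pandas 1.3 or later, where the parameter was introduced.

```python
from io import StringIO

import pandas as pd

# The third line has one field too many.
data = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10\n"

skipped = pd.read_csv(StringIO(data), on_bad_lines="skip")
print(skipped)

try:
    pd.read_csv(StringIO(data))  # on_bad_lines="error" is the default
except pd.errors.ParserError as exc:
    print("ParserError:", exc)
```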

Specifying column data types

You can indicate the data type for the whole DataFrame or individual columns:

In [13]: import numpy as np
   ...: import pandas as pd
   ...: from io import StringIO

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

Fortunately, pandas offers more than one way to ensure that your column(s) contain only one dtype. If you’re unfamiliar with these concepts, you can see here to learn more about dtypes, and here to learn more about object conversion in pandas.

For instance, you can use the converters argument of read_csv().
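
Here is a sketch of the converters approach. It uses a small invented column that mixes integers, a quoted letter and a decimal. Passing str keeps every value as a string, so the column holds a single Python type.

```python
from io import StringIO

import pandas as pd

# A column mixing integers, a quoted letter and a decimal.
data = "col_1\n1\n2\n'A'\n4.22"

# converters maps a column name to a function applied to each raw field string.
df = pd.read_csv(StringIO(data), converters={"col_1": str})
print(df)
print(df["col_1"].map(type).value_counts())
```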

Or you can use the to_numeric() function with errors='coerce' to coerce the dtypes after reading in the data. This converts every value that parses as a number to a float and leaves the values that fail to parse as NaN.

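Here is a minimal sketch of the to_numeric() route, using the same kind of hypothetical col_1 data. The column is first read with its inferred dtype. Then errors="coerce" converts every parseable entry to a float and turns the quoted 'A' into NaN.

```python
from io import StringIO

import pandas as pd

# Hypothetical data: one column mixing integers, a quoted string and a float.
data = "col_1\n1\n2\n'A'\n4.22"

df2 = pd.read_csv(StringIO(data))

# errors="coerce" turns entries that cannot be parsed as numbers into NaN
# instead of raising an exception.
df2["col_1"] = pd.to_numeric(df2["col_1"], errors="coerce")

print(df2)
```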
Ultimately, how you deal with reading in columns containing mixed dtypes depends on your specific needs. In the case above, if you wanted to NaN out the data anomalies, then to_numeric() is probably your best option. However, if you wanted all the data to be coerced, no matter the type, then using the converters argument of read_csv() would certainly be worth trying.

Note

In some cases, reading in abnormal data with columns containing mixed dtypes will result in an inconsistent dataset. If you rely on pandas to infer the dtypes of your columns, the parsing engine will go and infer the dtypes for different chunks of the data, rather than the whole dataset at once. Consequently, you can end up with column(s) with mixed dtypes. For example, a long column of integers that contains a few strings partway through can be read in with an int dtype for certain chunks of the column and str for others, due to the mixed dtypes from the data that was read in. It is important to note that the overall column will be marked with a dtype of object, which is used for columns with mixed dtypes.

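The sketch below reproduces the chunk-wise inference described in the note above. The data (a million integers with two strings in the middle) and the name mixed_df are illustrative. The split between int and str values depends on pandas' internal chunk size. That size is an implementation detail and may differ between versions.

```python
import warnings
from io import StringIO

import pandas as pd

# Hypothetical data: a long run of integers with two strings in the middle.
col_1 = list(range(500_000)) + ["a", "b"] + list(range(500_000))
buffer = StringIO()
pd.DataFrame({"col_1": col_1}).to_csv(buffer, index=False)
buffer.seek(0)

# With the default C engine and low_memory=True, dtypes are inferred chunk by
# chunk; pandas emits a DtypeWarning when the chunks disagree.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", pd.errors.DtypeWarning)
    mixed_df = pd.read_csv(buffer)

# Count the Python types actually stored in the column.
print(mixed_df["col_1"].apply(type).value_counts())
print(mixed_df["col_1"].dtype)
```

Passing an explicit dtype (or low_memory=False) avoids relying on chunk-wise inference.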
Specifying categorical dtype

Categorical columns can be parsed directly by specifying dtype='category' or dtype=CategoricalDtype(categories, ordered).

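Here is a minimal sketch of dtype="category" applied to a whole file. The three-column CSV string is hypothetical. Every column comes back as an unordered categorical whose categories are the values seen in that column.

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV with repeated values in each column.
data = "col1,col2,col3\na,b,1\na,b,2\nc,d,3"

# dtype="category" parses every column as an unordered categorical.
df = pd.read_csv(StringIO(data), dtype="category")
print(df.dtypes)
```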

Individual columns can be parsed as a Categorical using a dict specification.

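This sketch uses a hypothetical CSV. A dict passed as dtype limits categorical parsing to col1. col2 and col3 keep the dtypes pandas would normally infer, so col3 is expected to be read as int64.

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV with repeated values in each column.
data = "col1,col2,col3\na,b,1\na,b,2\nc,d,3"

# Only col1 becomes categorical; the other columns keep their inferred dtypes.
df = pd.read_csv(StringIO(data), dtype={"col1": "category"})
print(df.dtypes)
```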

Specifying dtype='category' will result in an unordered Categorical whose categories are the unique values observed in the data. For more control on the categories and order, create a CategoricalDtype ahead of time, and pass that for that column’s dtype.

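Here is a sketch of passing a pre-built CategoricalDtype, assuming hypothetical col1 values a, a and c. The dtype is ordered with "d" < "c" < "b" < "a". As a result, comparisons and reductions such as min() follow that order rather than alphabetical order.

```python
from io import StringIO

import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical CSV; col1 holds the values a, a, c.
data = "col1,col2,col3\na,b,1\na,b,2\nc,d,3"

# Categories and their order are fixed up front instead of inferred.
dtype = CategoricalDtype(["d", "c", "b", "a"], ordered=True)
df = pd.read_csv(StringIO(data), dtype={"col1": dtype})

print(df["col1"].dtype)
```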

When using dtype=CategoricalDtype, “unexpected” values outside of dtype.categories are treated as missing values.

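Here is a sketch of the missing-value behavior, again with hypothetical data. The value "c" appears in col1 but is deliberately left out of the allowed categories. That row is therefore read as NaN, while the declared categories are kept as given.

```python
from io import StringIO

import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical CSV; col1 holds the values a, a, c.
data = "col1,col2,col3\na,b,1\na,b,2\nc,d,3"

# "c" is deliberately left out of the allowed categories.
dtype = CategoricalDtype(["a", "b", "d"])
col1 = pd.read_csv(StringIO(data), dtype={"col1": dtype})["col1"]

print(col1)
```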

This matches the behavior of Categorical.set_categories().

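The same outcome can be seen directly on an in-memory Categorical with set_categories(). The values here are hypothetical and chosen to mirror the read_csv case.

```python
import pandas as pd

# Start from a categorical containing "a" and "c".
cat = pd.Categorical(["a", "a", "c"])

# Restricting the categories turns values outside the new set into NaN.
restricted = cat.set_categories(["a", "b", "d"])
print(restricted)
```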
Note

With dtype='category', the resulting categories will always be parsed as strings (object dtype). If the categories are numeric they can be converted using the to_numeric() function, or as appropriate, another converter such as to_datetime().

When dtype is a CategoricalDtype with homogeneous categories (all numeric, all datetimes, etc.), the conversion is done automatically.

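This sketch shows both paths described in the note. It uses a hypothetical CSV whose col3 holds the numbers 1 to 3. With dtype="category", the categories arrive as strings and are converted afterwards with to_numeric() and Series.cat.rename_categories(). With a numeric CategoricalDtype, the conversion happens during parsing.

```python
from io import StringIO

import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical CSV; col3 holds the numbers 1, 2, 3.
data = "col1,col2,col3\na,b,1\na,b,2\nc,d,3"

# With dtype="category" the categories of col3 are parsed as strings.
df = pd.read_csv(StringIO(data), dtype="category")

# Convert the category labels themselves, then relabel the column.
new_categories = pd.to_numeric(df["col3"].cat.categories)
df["col3"] = df["col3"].cat.rename_categories(new_categories)

# With a homogeneous numeric CategoricalDtype the conversion happens on read.
auto = pd.read_csv(StringIO(data), dtype={"col3": CategoricalDtype([1, 2, 3])})
```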

Naming and using columns

Handling column names

A file may or may not have a header row. pandas assumes the first row should be used as the column names.


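Here is a minimal sketch of the default behavior, assuming a small hypothetical CSV string whose first line is a header row.

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV whose first line is a header row.
data = "a,b,c\n1,2,3\n4,5,6\n7,8,9"

# By default the first line supplies the column names.
df = pd.read_csv(StringIO(data))
print(df)
```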
By specifying the names argument in conjunction with header you can indicate other names to use and whether or not to throw away the header row (if any).

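This sketch combines names with header, using hypothetical three-column data. With header=0, the original header line is read and replaced by the new names. With header=None, it is kept as an ordinary data row.

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV whose first line is a header row.
data = "a,b,c\n1,2,3\n4,5,6\n7,8,9"

# header=0 reads the first line as a header but replaces it with names.
replaced = pd.read_csv(StringIO(data), names=["foo", "bar", "baz"], header=0)

# header=None keeps the first line as ordinary data under the new names.
kept = pd.read_csv(StringIO(data), names=["foo", "bar", "baz"], header=None)
```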

If the header is in a row other than the first, pass the row number to header. This will skip the preceding rows.

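Here is a sketch with a hypothetical file whose real header sits on the second line. Passing header=1 takes the column names from that line and discards the line above it.

```python
from io import StringIO

import pandas as pd

# Hypothetical file with a junk first line before the real header.
data = "skip this skip it\na,b,c\n1,2,3\n4,5,6\n7,8,9"

# header=1 uses the second row as column names and drops the row above it.
df = pd.read_csv(StringIO(data), header=1)
```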

Note

Default behavior is to infer the column names: if no names are passed the behavior is identical to header=0 and column names are inferred from the first non-blank line of the file; if column names are passed explicitly then the behavior is identical to header=None.

Duplicate names parsing

Deprecated since version 1.5.0: mangle_dupe_cols was never implemented, and a new argument where the renaming pattern can be specified will be added instead.

If the file or header contains duplicate names, pandas will by default distinguish between them so as to prevent overwriting data.


There is no more duplicate data because mangle_dupe_cols=True by default, which modifies a series of duplicate columns ‘X’, …, ‘X’ to become ‘X’, ‘X.1’, …, ‘X.N’.

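You can check the renaming with a small hypothetical CSV in which column a appears twice. The second occurrence is expected to come back as a.1.

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV with column "a" appearing twice.
data = "a,b,a\n0,1,2\n3,4,5"

df = pd.read_csv(StringIO(data))
print(df.columns.tolist())  # ['a', 'b', 'a.1']
```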
Filtering columns (usecols)

The usecols argument allows you to select any subset of the columns in a file, either using the column names, position numbers or a callable.

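This sketch shows the three selection styles on a hypothetical four-column CSV. With a list, element order is ignored, so the selected columns come back in the order they appear in the file.

```python
from io import StringIO

import pandas as pd

# Hypothetical four-column CSV.
data = "a,b,c,d\n1,2,3,foo\n4,5,6,bar\n7,8,9,baz"

by_name = pd.read_csv(StringIO(data), usecols=["b", "d"])
by_position = pd.read_csv(StringIO(data), usecols=[0, 2, 3])
# A callable receives each column name and keeps it when it returns True.
by_callable = pd.read_csv(StringIO(data), usecols=lambda x: x.upper() in ["A", "C"])
```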

The usecols argument can also be used to specify which columns not to use in the final result.


For example, a callable such as lambda x: x not in ["a", "c"] excludes the “a” and “c” columns from the output.

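You can try the exclusion callable on hypothetical four-column data. Every column for which the callable returns False is dropped.

```python
from io import StringIO

import pandas as pd

# Hypothetical four-column CSV.
data = "a,b,c,d\n1,2,3,foo\n4,5,6,bar\n7,8,9,baz"

# Returning False for "a" and "c" drops those columns; the rest are kept.
df = pd.read_csv(StringIO(data), usecols=lambda x: x not in ["a", "c"])
```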
Comments and empty lines

Ignoring line comments and empty lines

If the comment parameter is specified, then completely commented lines will be ignored. By default, completely blank lines will be ignored as well.

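Here is a sketch with hypothetical input containing a leading blank line, a fully commented line, and a blank line between data rows. With comment="#", only the header and the two data rows are expected to survive.

```python
from io import StringIO

import pandas as pd

# Hypothetical input: leading blank line, a commented line, and a blank
# line between the two data rows.
data = "\na,b,c\n# commented line\n1,2,3\n\n4,5,6"

df = pd.read_csv(StringIO(data), comment="#")
```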

If skip_blank_lines=False, then read_csv will not ignore blank lines.

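Here is a sketch with hypothetical data containing blank lines between rows. With skip_blank_lines=False, each blank line becomes a row of NaN. This also turns the numeric columns into float64.

```python
from io import StringIO

import pandas as pd

# Hypothetical data with one blank line, then two blank lines, between rows.
data = "a,b,c\n\n1,2,3\n\n\n4,5,6"

df = pd.read_csv(StringIO(data), skip_blank_lines=False)
print(df)
```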
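The following sketch uses illustrative sample data. With `skip_blank_lines=False`, each blank line becomes a row of missing values. Because of those `NaN`s, the numeric columns are read as floats.

```python
from io import StringIO

import pandas as pd

# One blank line after the header and two more between the data rows.
data = "a,b,c\n\n1,2,3\n\n\n4,5,6"

df = pd.read_csv(StringIO(data), skip_blank_lines=False)
print(df)
#      a    b    c
# 0  NaN  NaN  NaN
# 1  1.0  2.0  3.0
# 2  NaN  NaN  NaN
# 3  NaN  NaN  NaN
# 4  4.0  5.0  6.0
```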

Warning

The presence of ignored lines might create ambiguities involving line numbers. The parameter `header` uses row numbers (ignoring commented/empty lines), while `skiprows` uses line numbers (including commented/empty lines):

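The following sketch contrasts the two counting rules on small illustrative inputs:
- `header` counts rows after the commented line has been discarded.
- `skiprows` counts raw lines, including the commented one.

```python
from io import StringIO

import pandas as pd

# header=1 counts rows after dropping "#comment": row 0 is "a,b,c",
# so row 1 ("A,B,C") becomes the header.
data = "#comment\na,b,c\nA,B,C\n1,2,3"
by_header = pd.read_csv(StringIO(data), comment="#", header=1)

# skiprows=2 counts raw lines: "A,B,C" and "#comment" are skipped,
# so "a,b,c" becomes the header.
data = "A,B,C\n#comment\na,b,c\n1,2,3"
by_skiprows = pd.read_csv(StringIO(data), comment="#", skiprows=2)

print(list(by_header.columns))    # ['A', 'B', 'C']
print(list(by_skiprows.columns))  # ['a', 'b', 'c']
```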

If both `header` and `skiprows` are specified, `header` will be relative to the end of `skiprows`. For example:

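The following sketch uses illustrative data. `skiprows=4` first drops the three comment lines and the `X,Y,Z` line. `header=1` is then counted from that point, so the `1,2,3` row is discarded and `A,B,C` becomes the header.

```python
from io import StringIO

import pandas as pd

data = (
    "# empty\n"
    "# second empty line\n"
    "# third empty line\n"
    "X,Y,Z\n"
    "1,2,3\n"
    "A,B,C\n"
    "1,2.,4.\n"
    "5.,NaN,10.0\n"
)

df = pd.read_csv(StringIO(data), comment="#", skiprows=4, header=1)
print(df)
#      A    B     C
# 0  1.0  2.0   4.0
# 1  5.0  NaN  10.0
```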

Comments

Sometimes comments or metadata may be included in a file.

By default, the parser includes the comments in the output:
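The following sketch uses illustrative sample rows whose last field carries a trailing `# ...` remark. Without a `comment` character, the remarks stay part of the field values.

```python
from io import StringIO

import pandas as pd

data = (
    "ID,level,category\n"
    "Patient1,123000,x # really unpleasant\n"
    "Patient2,23000,y # wouldn't take his medicine\n"
    "Patient3,1234018,z # awesome"
)

df = pd.read_csv(StringIO(data))
print(df["category"].tolist())
# ['x # really unpleasant', "y # wouldn't take his medicine", 'z # awesome']
```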

We can suppress the comments using the `comment` keyword:
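The following sketch reuses the same illustrative rows and passes `comment="#"`. Everything from `#` to the end of the line is discarded. Whitespace before the `#` may remain in the value, so the sketch strips it.

```python
from io import StringIO

import pandas as pd

data = (
    "ID,level,category\n"
    "Patient1,123000,x # really unpleasant\n"
    "Patient2,23000,y # wouldn't take his medicine\n"
    "Patient3,1234018,z # awesome"
)

df = pd.read_csv(StringIO(data), comment="#")
# Remove any whitespace left in front of the stripped comment.
df["category"] = df["category"].str.strip()
print(df["category"].tolist())  # ['x', 'y', 'z']
```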

Dealing with Unicode data

The `encoding` argument should be used for encoded Unicode data, which will result in byte strings being decoded to Unicode in the result:
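The following sketch uses illustrative bytes that encode German words as Latin-1 rather than UTF-8. Decoding them with the default UTF-8 would not work. Passing `encoding="latin-1"` yields the intended `str` values.

```python
from io import BytesIO

import pandas as pd

# Text encoded as Latin-1 bytes (not UTF-8).
raw = "word,length\nTräumen,7\nGrüße,5".encode("latin-1")

df = pd.read_csv(BytesIO(raw), encoding="latin-1")
print(df["word"][1])  # Grüße
```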

Some formats which encode all characters as multiple bytes, like UTF-16, won't parse correctly at all without specifying the encoding. The full list of Python standard encodings is given in the documentation of Python's `codecs` module.

Index columns and trailing delimiters

If a file has one more column of data than the number of column names, the first column will be used as the `DataFrame`'s row names:
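The following sketch uses illustrative data. The header names three columns but each data row has four fields, so the first field of every row becomes the index.

```python
from io import StringIO

import pandas as pd

data = "a,b,c\n4,apple,bat,5.7\n8,orange,cow,10"

df = pd.read_csv(StringIO(data))
print(df.index.tolist())  # [4, 8]
print(df["a"].tolist())   # ['apple', 'orange']
```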


Ordinarily, you can achieve this behavior using the `index_col` option.
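The following sketch uses illustrative data where the first column has its own header. `index_col=0` turns it into the index explicitly.

```python
from io import StringIO

import pandas as pd

data = "index,a,b,c\n4,apple,bat,5.7\n8,orange,cow,10"

df = pd.read_csv(StringIO(data), index_col=0)
print(df.index.name, df.index.tolist())  # index [4, 8]
```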

There are some exceptional cases when a file has been prepared with delimiters at the end of each data line, confusing the parser. To explicitly disable the index column inference and discard the last column, pass `index_col=False`:
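The following sketch uses illustrative data with a trailing comma on every row.
- By default, the extra empty field makes pandas infer an index from the first column.
- With `index_col=False`, the columns line up with the header and the trailing field is dropped.

```python
from io import StringIO

import pandas as pd

data = "a,b,c\n4,apple,bat,\n8,orange,cow,"

# Default: the first field becomes the index and column "c" is all NaN.
inferred = pd.read_csv(StringIO(data))

# index_col=False: no index inference; the empty trailing field is discarded.
explicit = pd.read_csv(StringIO(data), index_col=False)

print(inferred.index.tolist())   # [4, 8]
print(explicit["c"].tolist())    # ['bat', 'cow']
```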


If a subset of data is being parsed using the `usecols` option, the `index_col` specification is based on that subset, not the original data:

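The following sketch uses illustrative data. `usecols` keeps only `b` and `c`, so `index_col=0` refers to `b`, the first column of that subset, rather than `a`.

```python
from io import StringIO

import pandas as pd

data = "a,b,c\n1,x,10\n2,y,20"

df = pd.read_csv(StringIO(data), usecols=["b", "c"], index_col=0)
print(df.index.name, list(df.columns))  # b ['c']
```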

Date Handling

Specifying date columns

To better facilitate working with datetime data, `read_csv()` uses the keyword arguments `parse_dates` and `date_parser` to allow users to specify a variety of columns and date/time formats to turn the input text data into `datetime` objects.

The simplest case is to just pass in `parse_dates=True`:
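The following sketch uses illustrative data. The first column is used as the index, and `parse_dates=True` asks pandas to parse that index as dates, which yields a `DatetimeIndex`.

```python
from io import StringIO

import pandas as pd

data = "date,A,B,C\n20090101,a,1,2\n20090102,b,3,4\n20090103,c,4,5"

# Use the first column as the index and parse it as dates.
df = pd.read_csv(StringIO(data), index_col=0, parse_dates=True)
print(type(df.index).__name__)  # DatetimeIndex
```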

It is often the case that we may want to store date and time data separately, or store various date fields separately. The `parse_dates` keyword can be used to specify a combination of columns to parse the dates and/or times from.

You can specify a list of column lists to `parse_dates`. The resulting date columns will be prepended to the output (so as to not affect the existing column order), and the new column names will be the concatenation of the component column names:
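Recent pandas releases (2.2 onwards) deprecate combining columns through nested `parse_dates` lists. As an alternative that does not rely on that behaviour, this sketch builds an equivalent result by hand. It assumes the date and time fields can be joined with a space and parsed with one explicit format. The sample rows are illustrative. The `"1_2"`/`"1_3"` names only mimic the concatenation rule described above.

```python
from io import StringIO

import pandas as pd

# Illustrative rows: station, date, two times, and a measurement.
data = (
    "KORD,19990127, 19:00:00, 18:56:00, 0.8100\n"
    "KORD,19990127, 20:00:00, 19:56:00, 0.0100\n"
)
df = pd.read_csv(
    StringIO(data),
    header=None,
    skipinitialspace=True,           # drop the space after each comma
    dtype={1: str, 2: str, 3: str},  # keep the date/time parts as text
)

fmt = "%Y%m%d %H:%M:%S"
# Combined datetime columns, named by joining the component labels.
combined = pd.DataFrame(
    {
        "1_2": pd.to_datetime(df[1] + " " + df[2], format=fmt),
        "1_3": pd.to_datetime(df[1] + " " + df[3], format=fmt),
    }
)
# Prepend the combined columns and drop the component columns.
result = pd.concat([combined, df.drop(columns=[1, 2, 3])], axis=1)
print(result)
```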

By default the parser removes the component date columns, but you can choose to retain them via the `keep_date_col` keyword.

Note that if you wish to combine multiple columns into a single date column, a nested list must be used. In other words, `parse_dates=[1, 2]` indicates that the second and third columns should each be parsed as separate date columns, while `parse_dates=[[1, 2]]` means the two columns should be parsed into a single column.
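The following sketch shows the flat-list form on illustrative data. `parse_dates=[1, 2]` parses the second and third columns independently, each into its own datetime column.

```python
from io import StringIO

import pandas as pd

data = "id,start,end\n1,2009-01-01,2009-01-05\n2,2009-02-01,2009-02-03"

# Flat list: columns 1 and 2 are parsed separately.
df = pd.read_csv(StringIO(data), parse_dates=[1, 2])
print(df.dtypes)
```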

You can also use a dict to specify custom names for the combined columns.

It is important to remember that if multiple text columns are to be parsed into a single date column, then a new column is prepended to the data. The `index_col` specification is based on this new set of columns rather than the original data columns.

Note

If a column or index contains an unparsable date, the entire column or index will be returned unaltered as an object data type. For non-standard datetime parsing, use `to_datetime()` after `pd.read_csv`.
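The following sketch uses illustrative day-first timestamps. The column is read as plain text and then converted with `pd.to_datetime`, passing an explicit `format`.

```python
from io import StringIO

import pandas as pd

data = "when,value\n31/12/2020 23:59,1\n01/01/2021 00:00,2"

df = pd.read_csv(StringIO(data))  # "when" is read as plain text
df["when"] = pd.to_datetime(df["when"], format="%d/%m/%Y %H:%M")
print(df["when"].tolist())
```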

Note

`read_csv` has a fast path for parsing datetime strings in ISO 8601 format, e.g. "2000-01-01T00:01:02+00:00" and similar variations. If you can arrange for your data to store datetimes in this format, load times will be significantly faster; speedups of ~20x have been observed.

Date parsing functions

Finally, the parser allows you to specify a custom `date_parser` function to take full advantage of the flexibility of the date parsing API.

pandas will try to call the `date_parser` function in three different ways. If an exception is raised, the next one is tried:

  1. `date_parser` is first called with one or more arrays as arguments, as defined using `parse_dates` (e.g., `date_parser(['2013', '2013'], ['1', '2'])`).

  2. If #1 fails, `date_parser` is called with all the columns concatenated row-wise into a single array (e.g., `date_parser(['2013 1', '2013 2'])`).
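The two calling conventions listed above can both be served by one vectorized function. `parse_year_month` is a hypothetical helper written for illustration; it is not part of pandas. It is called directly here rather than through `read_csv`, because pandas 2.0 and later deprecate `date_parser` in favour of `date_format`.

```python
import pandas as pd


def parse_year_month(*cols):
    """Hypothetical vectorized date parser accepting both calling conventions.

    Convention 1: one array per date column, e.g. (["2013", "2013"], ["1", "2"]).
    Convention 2: one array of row-wise, space-joined strings, e.g. (["2013 1", "2013 2"],).
    """
    if len(cols) == 1:
        # Split each "YYYY M" string back into its year and month parts.
        years, months = zip(*(s.split() for s in cols[0]))
    else:
        years, months = cols
    parts = pd.DataFrame(
        {
            "year": pd.to_numeric(list(years)),
            "month": pd.to_numeric(list(months)),
            "day": 1,
        }
    )
    # to_datetime assembles datetimes from year/month/day columns in one pass.
    return pd.to_datetime(parts)


a = parse_year_month(["2013", "2013"], ["1", "2"])
b = parse_year_month(["2013 1", "2013 2"])
print(a.equals(b))  # True
```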

Note that performance-wise, you should try these methods of parsing dates in order:

  1. Try to infer the format using `infer_datetime_format=True` (see section below).

  2. If you know the format, use `pd.to_datetime()`: `date_parser=lambda x: pd.to_datetime(x, format=...)`.

  3. If you have a really non-standard format, use a custom date_parser function. For optimal performance, this should be vectorized, i.e., it should accept arrays as arguments.
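The second option (a known format handed to pd.to_datetime()) can be sketched as follows. This is only an illustration under stated assumptions: the column names and dates are invented, and the conversion is applied after reading rather than through date_parser, because date_parser was deprecated in pandas 2.0.

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV whose dates are written day/month/year
content = "date,value\n30/12/2011,1\n31/12/2011,2\n"

# Keep the dates as plain strings while reading
df = pd.read_csv(StringIO(content), dtype={"date": str})

# Convert with the known format, so no per-value format guessing is needed
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")
print(df)
```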

Parsing a CSV with mixed timezones

pandas cannot natively represent a column or index with mixed timezones. If your CSV file contains columns with a mixture of timezones, the default result will be an object-dtype column with strings, even with parse_dates.

To parse the mixed-timezone values as a datetime column, pass a partially-applied to_datetime() with utc=True as the date_parser.
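Both behaviours described above can be sketched with a small in-memory CSV whose two timestamps carry different UTC offsets (the data is made up for illustration). As an assumption, the sketch reads the column as text and then calls pd.to_datetime() with utc=True itself, instead of passing it as date_parser, which is deprecated from pandas 2.0 on; the UTC conversion is the same.

```python
from io import StringIO

import pandas as pd

# Two timestamps with different UTC offsets in a single column
content = """a
2000-01-01T00:00:00+05:00
2000-01-01T00:00:00+06:00"""

# Without any date handling, the column simply holds the raw strings
df = pd.read_csv(StringIO(content))
print(df["a"].tolist())

# utc=True converts every value to UTC, giving one timezone-aware dtype
df["a"] = pd.to_datetime(df["a"], utc=True)
print(df["a"])
```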

Inferring datetime format

If you have parse_dates enabled for some or all of your columns, and your datetime strings are all formatted the same way, you may get a large speed-up by setting infer_datetime_format=True. If set, pandas will attempt to guess the format of your datetime strings, and then use a faster means of parsing the strings. Speed-ups of 5-10x have been observed. pandas will fall back to the usual parsing if either the format cannot be guessed or the guessed format cannot properly parse the entire column of strings. So, in general, infer_datetime_format should not have any negative consequences if enabled.

Here are some examples of datetime strings that can be guessed (all representing December 30th, 2011 at 00:00:00):

  • “20111230”

  • “2011/12/30”

  • “20111230 00:00:00”

  • “12/30/2011 00:00:00”

  • “30/Dec/2011 00:00:00”

  • “30/December/2011 00:00:00”
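A quick way to check that several of these strings describe the same moment is to parse each one with pd.to_datetime(). This sketch only exercises format guessing on single values; it does not measure the speed-up discussed above, and it uses just three of the listed formats.

```python
import pandas as pd

# Three of the formats listed above, all meaning 2011-12-30 00:00:00
samples = ["20111230", "2011/12/30", "12/30/2011 00:00:00"]

parsed = [pd.to_datetime(s) for s in samples]
for text, ts in zip(samples, parsed):
    print(text, "->", ts)
```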

Note that infer_datetime_format is sensitive to dayfirst. With dayfirst=True, it will guess “01/12/2011” to be December 1st. With dayfirst=False (default) it will guess “01/12/2011” to be January 12th.
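The effect of dayfirst on an ambiguous string can be seen with pd.to_datetime(), which accepts the same keyword. This sketch calls it directly on one value rather than going through read_csv.

```python
import pandas as pd

text = "01/12/2011"

month_first = pd.to_datetime(text)               # dayfirst=False is the default
day_first = pd.to_datetime(text, dayfirst=True)  # read the day before the month

print(month_first.date(), day_first.date())
```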

International date formats

While US date formats tend to be MM/DD/YYYY, many international formats use DD/MM/YYYY instead. For convenience, a dayfirst keyword is provided:
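A minimal sketch of the keyword in read_csv, assuming a small invented CSV whose first column holds day/month/year dates. The same text is read twice, once with the default and once with dayfirst=True.

```python
from io import StringIO

import pandas as pd

# Dates written day/month/year, as in many international formats
content = "date,value,cat\n1/6/2000,5,a\n2/6/2000,10,b\n3/6/2000,15,c\n"

us_style = pd.read_csv(StringIO(content), parse_dates=[0])
intl_style = pd.read_csv(StringIO(content), parse_dates=[0], dayfirst=True)

print(us_style["date"].tolist())    # month first: January 6th, February 6th, ...
print(intl_style["date"].tolist())  # day first: June 1st, June 2nd, ...
```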

Writing CSVs to binary file objects

New in version 1.2.0.

df.to_csv(..., mode="wb") allows writing a CSV to a file object opened in binary mode. In most cases, it is not necessary to specify mode, as pandas will auto-detect whether the file object is opened in text or binary mode.
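The sketch below writes a small frame, gzip-compressed, into an in-memory binary buffer. io.BytesIO stands in for a real file opened in binary mode, and the read-back step is only there to confirm that the bytes form a valid compressed CSV.

```python
import io

import pandas as pd

data = pd.DataFrame([0, 1, 2])

# BytesIO plays the role of a file object opened in binary mode
buffer = io.BytesIO()
data.to_csv(buffer, encoding="utf-8", compression="gzip")

# Copy the bytes out, then decompress and parse them again as a check
raw = buffer.getvalue()
restored = pd.read_csv(io.BytesIO(raw), compression="gzip", index_col=0)
print(restored)
```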

Specifying method for floating-point conversion

The parameter float_precision can be specified in order to use a specific floating-point converter during parsing with the C engine. The options are the ordinary converter, the high-precision converter, and the round-trip converter (which is guaranteed to round-trip values after writing to a file). For example:
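A small sketch comparing the converters on one long decimal value. The errors reported for None and "high" may vary between pandas versions, so the only expectation here is that the round-trip converter agrees exactly with Python's own float().

```python
from io import StringIO

import pandas as pd

val = "0.3066101993807095471566981359501369297504425048828125"
data = "a,b,c\n1,2,{0}".format(val)

# Absolute error of each converter relative to Python's float() parsing
errors = {}
for precision in (None, "high", "round_trip"):
    parsed = pd.read_csv(StringIO(data), engine="c", float_precision=precision)["c"][0]
    errors[precision] = abs(parsed - float(val))

print(errors)
```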

Thousand separators

For large numbers that have been written with a thousands separator, you can set the thousands keyword to a string of length 1 so that integers will be parsed correctly.

By default, numbers with a thousands separator will be parsed as strings:
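A minimal sketch of the default behaviour, using a small invented pipe-separated file with comma thousands separators in the "level" column.

```python
from io import StringIO

import pandas as pd

content = (
    "ID|level|category\n"
    "Patient1|123,000|x\n"
    "Patient2|23,000|y\n"
    "Patient3|1,234,018|z\n"
)

df = pd.read_csv(StringIO(content), sep="|")

# Without thousands=",", the separators keep "level" as text
print(df["level"].tolist())
```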

The thousands keyword allows integers to be parsed correctly:
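The same invented data read again with thousands="," gives an integer column instead of strings:

```python
from io import StringIO

import pandas as pd

content = (
    "ID|level|category\n"
    "Patient1|123,000|x\n"
    "Patient2|23,000|y\n"
    "Patient3|1,234,018|z\n"
)

# The "," inside each number is now treated as a thousands separator
df = pd.read_csv(StringIO(content), sep="|", thousands=",")
print(df["level"].tolist(), df["level"].dtype)
```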

NA values

To control which values are parsed as missing values (which are signified by NaN), specify a string in na_values. If you specify a list of strings, then all values in it are considered to be missing values. If you specify a number (a float, like 5.0, or an integer, like 5), the corresponding equivalent values will also imply a missing value (in this case, effectively [5.0, 5] are recognized as NaN).

To completely override the default values that are recognized as missing, specify keep_default_na=False.

The default NaN recognized values are ['-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN', '#N/A N/A', '#N/A', 'N/A', 'n/a', 'NA', '<NA>', '#NA', 'NULL', 'null', 'NaN', '-NaN', 'nan', '-nan', ''].
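A minimal sketch of the defaults at work, assuming an invented single-column file: a few of the standard markers (NA, n/a, NULL) become NaN, while keep_default_na=False leaves them as ordinary strings.

```python
from io import StringIO

import pandas as pd

# Three default NA markers plus one ordinary value
content = "x\nNA\nn/a\nNULL\nfoo\n"

with_defaults = pd.read_csv(StringIO(content))
without_defaults = pd.read_csv(StringIO(content), keep_default_na=False)

print(with_defaults["x"].tolist())
print(without_defaults["x"].tolist())
```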

Let us consider some examples:

pd.read_csv("path_to_file.csv", na_values=[5])

In the example above, 5 and 5.0 will be recognized as NaN, in addition to the defaults. A string will first be interpreted as a numerical 5, then as a NaN.
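A runnable version of the na_values=[5] case, using invented in-memory data instead of a file on disk. Both "5" and "5.0" match the numeric entry, and "NA" is still caught by the defaults.

```python
from io import StringIO

import pandas as pd

# "5" and "5.0" should both match na_values=[5]; "NA" is a default marker
content = "x,y\n5,1\n5.0,2\n7,NA\n"

df = pd.read_csv(StringIO(content), na_values=[5])
print(df)
```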

pd.read_csv("path_to_file.csv", keep_default_na=False, na_values=[""])

Above, only an empty field will be recognized as NaN.

pd.read_csv("path_to_file.csv", keep_default_na=False, na_values=["NA", "0"])

Above, both NA and 0 as strings are NaN.

pd.read_csv("path_to_file.csv", na_values=["Nope"])

The default values, in addition to the string "Nope", are recognized as NaN.
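The last two patterns can be compared on one small invented file. With na_values=["Nope"] the defaults stay active, so the empty field and "NA" are also missing; with keep_default_na=False only the listed strings count, so the empty field survives as an empty string.

```python
from io import StringIO

import pandas as pd

content = "x,y\nNope,0\nNA,1\n,2\n"

# Defaults stay active; "Nope" is added to them
extra = pd.read_csv(StringIO(content), na_values=["Nope"])

# Defaults switched off; only "NA" and "0" count as missing
only_listed = pd.read_csv(
    StringIO(content), keep_default_na=False, na_values=["NA", "0"]
)

print(extra)
print(only_listed)
```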

Infinity

Values like inf will be parsed as np.inf (positive infinity), and -inf as -np.inf (negative infinity). Parsing ignores the case of the value, meaning Inf will also be parsed as np.inf.
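The following is a minimal sketch of this behaviour. It assumes the default C engine and uses a small in-memory CSV made up for illustration, with mixed-case spellings of infinity in one numeric column.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Illustrative data: infinity written with different capitalisation.
data = "x\ninf\n-Inf\nINF\n1.5"

df = pd.read_csv(StringIO(data))

# The column comes back as floating point, with the inf spellings mapped
# to positive or negative infinity regardless of case.
print(df["x"].dtype)
print(df["x"].tolist())
```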

Returning Series

Using the squeeze keyword, the parser will return output with a single column as a Series.

Deprecated since version 1.4.0: Users should append .squeeze("columns") to the DataFrame returned by read_csv instead. The squeeze keyword was removed entirely in pandas 2.0.
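The recommended replacement can be sketched as follows. The single-column CSV and its column name "level" are hypothetical. The code reads the data as an ordinary one-column DataFrame and then collapses it with DataFrame.squeeze.

```python
from io import StringIO

import pandas as pd

data = "level\n123000\n23000\n1230"  # hypothetical single-column CSV

# Read as a one-column DataFrame, then collapse that column into a Series.
s = pd.read_csv(StringIO(data)).squeeze("columns")

print(type(s).__name__, s.name)
```

Because squeeze("columns") only collapses when there is exactly one column, the same line leaves a multi-column result unchanged.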

Boolean values

The common values True, False, TRUE, and FALSE are all recognized as boolean. Occasionally you might want to recognize other values as being boolean. To do this, use the true_values and false_values options.
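As a sketch, assuming a small made-up CSV whose column b uses "Yes"/"No" instead of the default boolean spellings, the two options can be passed like this:

```python
from io import StringIO

import pandas as pd

data = "a,b,c\n1,Yes,2\n3,No,4"

# Map the custom spellings to True/False; column b then becomes boolean.
df = pd.read_csv(StringIO(data), true_values=["Yes"], false_values=["No"])

print(df.dtypes)
```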

Handling “bad” lines

Some files may have malformed lines with too few fields or too many. Lines with too few fields will have NA values filled in the trailing fields. Lines with too many fields will raise an error by default.
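A minimal illustration of the default behaviour follows. The data is made up: its third line has one field more than the header. The sketch catches pandas.errors.ParserError, which is the exception the parser raises in this case.

```python
from io import StringIO

import pandas as pd

# Illustrative data: "4,5,6,7" has four fields but the header has three.
data = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10"

try:
    pd.read_csv(StringIO(data))
    raised = False
except pd.errors.ParserError as exc:
    raised = True
    print("ParserError:", exc)
```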

You can elect to skip bad lines by passing on_bad_lines="skip".
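A sketch of skipping, reusing the same illustrative data. It assumes pandas 1.3 or later, where on_bad_lines replaced the older error_bad_lines/warn_bad_lines flags.

```python
from io import StringIO

import pandas as pd

data = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10"

# The over-long line "4,5,6,7" is silently dropped.
df = pd.read_csv(StringIO(data), on_bad_lines="skip")

print(df)
```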

Or pass a callable to on_bad_lines to handle the bad line if engine="python". The bad line will be a list of strings that was split by the sep.
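Here is a sketch of such a callable, assuming pandas 1.4 or later with the Python engine. The handler name keep_last_three and the list seen are hypothetical helpers. The handler records each bad line and returns only its last three fields, which the parser then uses as the row.

```python
from io import StringIO

import pandas as pd

data = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10"

seen = []  # collects every bad line handed to the callback


def keep_last_three(bad_line):
    # bad_line is the list of string fields produced by splitting on sep.
    seen.append(bad_line)
    return bad_line[-3:]  # keep as many fields as the header has


df = pd.read_csv(StringIO(data), engine="python", on_bad_lines=keep_last_three)

print(seen)
print(df)
```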

You can also use the usecols parameter to eliminate extraneous column data that appear in some lines but not others.
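As a sketch with the same made-up data, selecting only the first three column positions lets the over-long line through. Its fourth field is simply discarded.

```python
from io import StringIO

import pandas as pd

data = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10"

# Only positions 0, 1 and 2 are kept; the stray "7" is ignored.
df = pd.read_csv(StringIO(data), usecols=[0, 1, 2])

print(df)
```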

In case you want to keep all data including the lines with too many fields, you can specify a sufficient number of names. This ensures that lines with not enough fields are filled with NaN.
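A minimal sketch, assuming headerless illustrative data whose longest line has four fields. Supplying four names keeps every value, and shorter lines get NaN in column d.

```python
from io import StringIO

import pandas as pd

# No header row; the longest line has four fields (illustrative data).
data = "1,2,3\n4,5,6,7\n8,9,10"

df = pd.read_csv(StringIO(data), names=["a", "b", "c", "d"])

print(df)
```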

Dialect

The dialect keyword gives greater flexibility in specifying the file format. By default it uses the Excel dialect, but you can specify either the dialect name or a csv.Dialect instance.

Suppose you had data with unenclosed quotes, that is, a field that opens with a double quote which is never closed.

By default, read_csv uses the Excel dialect and treats the double quote as the quote character, which causes it to fail when it finds a newline before it finds the closing double quote.

We can get around this using the dialect keyword.
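A sketch of that workaround follows. The data is made up, with an opening quote before "a" that is never closed. The code copies the standard library's csv.excel dialect and sets quoting to csv.QUOTE_NONE, so the quote is read as an ordinary character. The data rows have one more field than the header, so the first column becomes the index.

```python
import csv
from io import StringIO

import pandas as pd

# The quote before "a" is never closed (illustrative data).
data = 'label1,label2,label3\nindex1,"a,c,e\nindex2,b,d,f'

# Start from the Excel dialect and switch off quote processing.
dia = csv.excel()
dia.quoting = csv.QUOTE_NONE

df = pd.read_csv(StringIO(data), dialect=dia)

print(df)
```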

All of the dialect options can be specified separately by keyword arguments.
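For instance, the line terminator can be set on its own without building a dialect object. The sketch below assumes the default C engine, which is the engine that supports lineterminator, and uses made-up data separated by "~".

```python
from io import StringIO

import pandas as pd

data = "a,b,c~1,2,3~4,5,6"

# Treat "~" as the record separator instead of a newline.
df = pd.read_csv(StringIO(data), lineterminator="~")

print(df)
```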

Another common dialect option is skipinitialspace, to skip any whitespace after a delimiter.
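In this sketch with illustrative data, every delimiter is followed by a space. Without the option the column labels would keep the leading spaces (" b", " c").

```python
from io import StringIO

import pandas as pd

data = "a, b, c\n1, 2, 3\n4, 5, 6"

# Drop whitespace that follows each comma, including in the header.
df = pd.read_csv(StringIO(data), skipinitialspace=True)

print(df.columns.tolist())
```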

The parsers make every attempt to “do the right thing” and not be fragile. Type inference is a pretty big deal. If a column can be coerced to integer dtype without altering the contents, the parser will do so. Any non-numeric columns will come through as object dtype, as with the rest of pandas objects.
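A quick way to observe the inference, using a made-up three-column CSV, is shown below. The checks are kept dtype-agnostic for the text column on purpose. Newer pandas releases may report a dedicated string dtype there instead of object.

```python
from io import StringIO

import pandas as pd
from pandas.api.types import is_integer_dtype, is_numeric_dtype

data = "count,ratio,label\n1,0.5,x\n2,1.5,y"

df = pd.read_csv(StringIO(data))

# count -> integer, ratio -> float, label -> non-numeric (object or string).
print(df.dtypes)
```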

Quoting and Escape Characters

Quotes (and other escape characters) in embedded fields can be handled in any number of ways. One way is to use backslashes; to properly parse this data, you should pass the escapechar option.
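A minimal sketch follows, assuming a made-up record whose quoted field contains backslash-escaped double quotes. In the Python source the backslash itself has to be doubled.

```python
from io import StringIO

import pandas as pd

# On disk this reads: "hello, \"Bob\", nice to see you",5
data = 'a,b\n"hello, \\"Bob\\", nice to see you",5'

# A backslash before a quote makes that quote part of the field.
df = pd.read_csv(StringIO(data), escapechar="\\")

print(df.loc[0, "a"])
```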

Files with fixed width columns

While read_csv() reads delimited data, the read_fwf() function works with data files that have known and fixed column widths. The function parameters to read_fwf are largely the same as read_csv with two extra parameters, and a different usage of the delimiter parameter:

  • colspecs: A list of pairs (tuples) giving the extents of the fixed-width fields of each line as half-open intervals (i.e., [from, to)). String value ‘infer’ can be used to instruct the parser to try detecting the column specifications from the first 100 rows of the data. Default behavior, if not specified, is to infer.

  • widths: A list of field widths which can be used instead of ‘colspecs’ if the intervals are contiguous.

  • delimiter: Characters to consider as filler characters in the fixed-width file. Can be used to specify the filler character of the fields if it is not spaces (e.g., ‘~’).

Consider a typical fixed-width data file, where every field occupies the same character positions on each line.

In order to parse this file into a DataFrame, we simply need to supply the column specifications to the read_fwf function along with the file name.
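A sketch of this follows, assuming two made-up records held in a StringIO buffer instead of a file on disk (read_fwf accepts either). Each pair in colspecs is a half-open [start, end) character range, and the surrounding padding is stripped from each field.

```python
from io import StringIO

import pandas as pd

# Two illustrative records; each field sits at fixed character positions.
data = (
    "id8141    360.242940   149.910199   11950.7\n"
    "id1594    444.953632   166.985655   11788.4\n"
)

# Half-open [start, end) character ranges for the four fields.
colspecs = [(0, 6), (8, 20), (21, 33), (34, 43)]

df = pd.read_fwf(StringIO(data), colspecs=colspecs, header=None, index_col=0)

print(df)
```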

Note how the parser automatically generates column names when the header=None argument is specified; current pandas versions use the integer column positions 0, 1, 2 and so on. Alternatively, you can supply just the column widths for contiguous columns.
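Using the same illustrative records, the widths below tile the line from position 0 to 43 with no gaps. Each field therefore includes its leading padding, which the parser strips.

```python
from io import StringIO

import pandas as pd

data = (
    "id8141    360.242940   149.910199   11950.7\n"
    "id1594    444.953632   166.985655   11788.4\n"
)

# Contiguous field widths: [0, 6), [6, 20), [20, 33), [33, 43).
widths = [6, 14, 13, 10]

df = pd.read_fwf(StringIO(data), widths=widths, header=None, index_col=0)

print(df)
```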

The parser will take care of extra white spaces around the columns so it’s ok to have extra separation between the columns in the file

By default,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print[data]
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv[StringIO[data], dtype=object]

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv[StringIO[data], dtype={"b": object, "c": np.float64, "d": "Int64"}]

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
2060 will try to infer the file’s
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print[data]
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv[StringIO[data], dtype=object]

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv[StringIO[data], dtype={"b": object, "c": np.float64, "d": "Int64"}]

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
2063 by using the first 100 rows of the file. Nó chỉ có thể làm điều đó trong trường hợp khi các cột được căn chỉnh và phân tách chính xác bằng
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print[data]
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv[StringIO[data], dtype=object]

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv[StringIO[data], dtype={"b": object, "c": np.float64, "d": "Int64"}]

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
33 được cung cấp [dấu phân cách mặc định là khoảng trắng]

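Here is a minimal sketch of that inference. It uses a small hypothetical fixed-width table held in a StringIO, which stands in for a file on disk. colspecs is left at its default, so read_fwf works out the column boundaries from the rows themselves:

```python
from io import StringIO

import pandas as pd

# Hypothetical fixed-width data: each column is left-aligned and the
# columns are separated by runs of spaces, which inference relies on.
data = (
    "id  name   score\n"
    "1   alice  90\n"
    "2   bob    85\n"
)

# colspecs defaults to "infer", so column boundaries are detected from
# the first rows of the input (at most infer_nrows, 100 by default).
df = pd.read_fwf(StringIO(data))
print(df)
```

If the columns are not cleanly separated, passing explicit colspecs or widths avoids relying on inference.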

read_fwf supports the dtype parameter for specifying the types of parsed columns to be different from the inferred type.

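The following short sketch uses a hypothetical fixed-width table. Without dtype, id and score would both be inferred as int64. The dict below keeps id as text and parses score as float64:

```python
from io import StringIO

import pandas as pd

data = (
    "id  name   score\n"
    "1   alice  90\n"
    "2   bob    85\n"
)

# Override the inferred types per column: keep "id" as text and
# force "score" to float64.
df = pd.read_fwf(StringIO(data), dtype={"id": str, "score": "float64"})
print(df.dtypes)
```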

Indexes

Files with an “implicit” index column

Consider a file with one fewer entry in the header than the number of data columns.


In this special case, read_csv assumes that the first column is to be used as the index of the DataFrame.

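Here is a minimal illustration with hypothetical data. The header names three columns (A, B, C), but every row carries four fields with a date in the first one. read_csv therefore turns that leading field into the index:

```python
from io import StringIO

import pandas as pd

# Header has three names, but every data row has four fields.
data = (
    "A,B,C\n"
    "2009-01-01,a,1,2\n"
    "2009-01-02,b,3,4\n"
    "2009-01-03,c,4,5\n"
)

df = pd.read_csv(StringIO(data))
print(df)
```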

Note that the dates weren’t automatically parsed. In that case you would need to do as before and pass parse_dates=True.

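The next sketch uses hypothetical data with one fewer header field than data fields and dates in the leading column. Passing parse_dates=True converts the implicit index into a DatetimeIndex:

```python
from io import StringIO

import pandas as pd

data = (
    "A,B,C\n"
    "2009-01-01,a,1,2\n"
    "2009-01-02,b,3,4\n"
    "2009-01-03,c,4,5\n"
)

# parse_dates=True asks pandas to try parsing the index as dates; this
# also covers the implicit index built from the extra leading column.
df = pd.read_csv(StringIO(data), parse_dates=True)
print(df.index)
```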

Reading an index with a MultiIndex

Suppose you have data indexed by two columns.


The index_col argument to read_csv can take a list of column numbers to turn multiple columns into a MultiIndex for the index of the returned object.

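Here is a minimal sketch with hypothetical data keyed by the pair (year, indiv). Passing index_col=[0, 1] combines the first two columns into a two-level MultiIndex:

```python
from io import StringIO

import pandas as pd

# Hypothetical data keyed by (year, indiv).
data = (
    "year,indiv,zit,xit\n"
    "1977,A,1.2,0.6\n"
    "1977,B,1.5,0.5\n"
    "1978,A,0.2,0.06\n"
)

# Columns 0 and 1 together become the row MultiIndex.
df = pd.read_csv(StringIO(data), index_col=[0, 1])
print(df)
```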

Reading columns with a MultiIndex

By specifying a list of row locations for the header argument, you can read in a MultiIndex for the columns. Specifying non-consecutive rows will skip the intervening rows.

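The sketch below shows a round trip with a hypothetical frame whose columns have two levels and whose index is named. to_csv writes the two header rows plus a row holding the index name, and header=[0, 1] with index_col=0 rebuilds the column MultiIndex:

```python
from io import StringIO

import pandas as pd

# Hypothetical frame with two-level column labels and a named index.
columns = pd.MultiIndex.from_tuples([("a", "x"), ("a", "y"), ("b", "z")])
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    index=pd.Index(["r1", "r2"], name="row"),
    columns=columns,
)

buf = StringIO()
df.to_csv(buf)  # two header rows, then a row holding the index name
buf.seek(0)

# Rows 0 and 1 form the column MultiIndex; column 0 is the index.
back = pd.read_csv(buf, header=[0, 1], index_col=0)
print(back)
```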

read_csv is also able to interpret a more common format of multi-column indices.

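The following sketch shows one such layout with hypothetical data. It has only the two header rows, an empty top-left cell, and no separate row for the index name:

```python
from io import StringIO

import pandas as pd

# Two header rows, empty top-left cell, no index-name row.
data = (
    ",a,a,a,b,c,c\n"
    ",q,r,s,t,u,v\n"
    "one,1,2,3,4,5,6\n"
    "two,7,8,9,10,11,12\n"
)

df = pd.read_csv(StringIO(data), header=[0, 1], index_col=0)
print(df)
```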

Note

If an index_col is not specified (e.g. you don’t have an index, or wrote it with df.to_csv(..., index=False)), then any names on the columns index will be lost.

Automatically “sniffing” the delimiter

read_csv is capable of inferring delimited (not necessarily comma-separated) files, as pandas uses the csv.Sniffer class of the csv module. For this, you have to specify sep=None.

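Here is a minimal sketch with hypothetical colon-delimited data. Sniffing is handled by the Python engine, so engine="python" is passed explicitly here. This also avoids a fallback warning.

```python
from io import StringIO

import pandas as pd

# Hypothetical colon-delimited data; no explicit separator is given.
data = "a:b:c\n1:2:3\n4:5:6\n"

# sep=None lets pandas detect the delimiter with csv.Sniffer.
df = pd.read_csv(StringIO(data), sep=None, engine="python")
print(df)
```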

Reading multiple files to create a single DataFrame

It’s best to use concat() to combine multiple files. See the cookbook for an example.
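The cookbook example is not reproduced here. As a minimal sketch, each input can be read separately and the pieces stacked with concat(). Two in-memory buffers with hypothetical contents stand in for files; with real files you would pass paths instead, for example from glob.glob.

```python
from io import StringIO

import pandas as pd

# Hypothetical inputs sharing the same columns; replace with file paths.
sources = [
    StringIO("a,b\n1,2\n3,4\n"),
    StringIO("a,b\n5,6\n"),
]

# Read each input, then stack the frames vertically. ignore_index=True
# renumbers the rows 0..n-1 instead of repeating each file's own labels.
combined = pd.concat((pd.read_csv(src) for src in sources), ignore_index=True)
print(combined)
```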

Iterating through files chunk by chunk

Suppose you wish to iterate through a (potentially very large) file lazily rather than reading the entire file into memory.


By specifying a chunksize to read_csv, the return value will be an iterable object of type TextFileReader.

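Here is a minimal sketch using a hypothetical ten-row input. With chunksize=4, the loop receives DataFrames of 4, 4 and 2 rows and keeps a running total without holding the whole input at once. The with block relies on the context-manager support available since pandas 1.2.

```python
from io import StringIO

import pandas as pd

# Hypothetical input: a header plus ten rows of (i, i squared).
data = "a,b\n" + "\n".join(f"{i},{i * i}" for i in range(10)) + "\n"

sizes = []
total = 0
# Each iteration yields a DataFrame with at most `chunksize` rows.
with pd.read_csv(StringIO(data), chunksize=4) as reader:
    for chunk in reader:
        sizes.append(len(chunk))
        total += int(chunk["b"].sum())

print(sizes, total)
```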

Changed in version 1.2: read_csv/json/sas return a context-manager when iterating through a file.

Specifying iterator=True will also return the TextFileReader object.

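The next sketch pulls rows on demand with get_chunk() from a hypothetical ten-row input. Each call returns the next n rows, and the row labels continue from where the previous chunk stopped:

```python
from io import StringIO

import pandas as pd

data = "a,b\n" + "\n".join(f"{i},{i * i}" for i in range(10)) + "\n"

# iterator=True returns a TextFileReader; get_chunk(n) reads the next
# n rows, so chunk sizes can differ from call to call.
with pd.read_csv(StringIO(data), iterator=True) as reader:
    first = reader.get_chunk(3)
    second = reader.get_chunk(5)

print(first, second, sep="\n\n")
```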

Specifying the parser engine

Pandas currently supports three engines: the C engine, the Python engine, and an experimental pyarrow engine (requires the pyarrow package). In general, the pyarrow engine is fastest on larger workloads and is equivalent in speed to the C engine on most other workloads. The Python engine tends to be slower than the pyarrow and C engines on most workloads. However, the pyarrow engine is much less robust than the C engine, which in turn lacks a few features compared to the Python engine.

Where possible, pandas uses the C parser (specified as engine='c'), but it may fall back to Python if C-unsupported options are specified.

Currently, options unsupported by the C and pyarrow engines include:

  • sep other than a single character (e.g. regex separators)

  • skipfooter

  • sep=None with delim_whitespace=False

Specifying any of the above options will produce a ParserWarning unless the Python engine is selected explicitly using engine='python'.
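The sketch below shows the fallback using a hypothetical regex separator. Because it is longer than one character, pandas treats it as a regular expression. With the default engine, pandas emits a ParserWarning and switches to Python. Requesting engine="python" up front gives the same frame without the warning. The helper read_with is illustrative only.

```python
import warnings
from io import StringIO

import pandas as pd
from pandas.errors import ParserWarning

data = "a ; b\n1 ; 2\n3 ; 4\n"
# Longer than one character, so interpreted as a regex (Python engine only).
SEP = r"\s*;\s*"


def read_with(engine=None):
    """Parse `data` and return the frame plus the warning categories seen."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        df = pd.read_csv(StringIO(data), sep=SEP, engine=engine)
    return df, [w.category for w in caught]


df_default, warned_default = read_with()        # C engine requested, falls back
df_python, warned_python = read_with("python")  # explicit, no fallback warning
print(df_python)
```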

Options that are unsupported by the pyarrow engine and not covered by the list above include:

  • float_precision

  • chunksize

  • comment

  • nrows

  • thousands

  • memory_map

  • dialect

  • In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print[data]
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv[StringIO[data], dtype=object]
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv[StringIO[data], dtype={"b": object, "c": np.float64, "d": "Int64"}]
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    2912

  • dialect

  • on_bad_lines

  • delim_whitespace

  • quoting

  • lineterminator

  • converters

  • decimal

  • iterator

  • dayfirst

  • infer_datetime_format

  • verbose

  • skipinitialspace

  • low_memory

Specifying these options with engine='pyarrow' will raise a ValueError.
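As a rough illustration of working around this restriction, the sketch below tries engine="pyarrow" first and retries with the C engine when pandas rejects the call. The helper name read_with_fallback is hypothetical, the sample data are made up, and the sketch assumes pandas 1.4 or later; thousands is used only because it appears in the list above.

```python
from io import StringIO

import pandas as pd


def read_with_fallback(text, **kwargs):
    """Hypothetical helper: try engine="pyarrow", fall back to engine="c".

    A ValueError (option not supported by pyarrow) or an ImportError
    (pyarrow not installed) triggers the retry with the C engine.
    """
    try:
        return pd.read_csv(StringIO(text), engine="pyarrow", **kwargs), "pyarrow"
    except (ValueError, ImportError):
        return pd.read_csv(StringIO(text), engine="c", **kwargs), "c"


data = "a;b\n1,000;2\n3,500;4\n"
# thousands is on the pyarrow-unsupported list, so the first attempt is rejected
df, used = read_with_fallback(data, sep=";", thousands=",")
print(used)
print(df)
```

Catching ValueError this broadly would also retry after a genuine parse error, so a stricter version might inspect the exception message before falling back.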

Reading/writing remote files

You can pass a URL to many of pandas' IO functions to read or write remote files; for example, read_csv accepts a URL in place of a local file path.
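A minimal sketch of the idea that runs without network access: read_csv is given a file:// URL built from a temporary file instead of a plain path. The file name and contents are illustrative assumptions; an http(s) URL is passed the same way.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "data.csv"
    csv_path.write_text("a,b\n1,2\n3,4\n")
    url = csv_path.as_uri()  # a file:// URL pointing at the temporary file
    df = pd.read_csv(url)  # pandas opens the URL rather than a filesystem path

print(url)
print(df)
```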

New in version 1.3.0.

A custom header can be sent alongside HTTP(s) requests by passing a dictionary of header key-value mappings to the storage_options keyword argument.
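The sketch below checks this behaviour end to end against a throwaway local HTTP server built on Python's http.server module, so no external site is contacted. The handler class, the port choice, and the CSV body are assumptions made for the example; only the storage_options argument comes from pandas, and the sketch assumes pandas 1.3 or later.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import pandas as pd

CSV_BODY = b"a,b\n1,2\n3,4\n"
seen_headers = {}  # filled in by the handler so we can inspect the request


class CSVHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: serves a fixed CSV and records the User-Agent."""

    def do_GET(self):
        seen_headers["User-Agent"] = self.headers.get("User-Agent")
        self.send_response(200)
        self.send_header("Content-Type", "text/csv")
        self.send_header("Content-Length", str(len(CSV_BODY)))
        self.end_headers()
        self.wfile.write(CSV_BODY)

    def log_message(self, *args):
        pass  # silence the default per-request logging


# Port 0 asks the OS for any free port; requests are served on a background thread.
server = HTTPServer(("127.0.0.1", 0), CSVHandler)
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()
try:
    url = f"http://127.0.0.1:{server.server_port}/data.csv"
    # For http(s) URLs, pandas sends the storage_options mapping as request headers.
    df = pd.read_csv(url, storage_options={"User-Agent": "pandas"})
finally:
    server.shutdown()
    server.server_close()

print(seen_headers["User-Agent"])
print(df)
```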

All URLs that are not local files or HTTP(s) are handled by fsspec, if installed, and its various filesystem implementations (including Amazon S3, Google Cloud, SSH, FTP, webHDFS…). Some of these implementations require additional packages to be installed; for example, S3 URLs require the s3fs library.
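To see the hand-off to fsspec without any cloud credentials, the sketch below uses fsspec's built-in in-memory filesystem (the memory:// protocol) as a stand-in for a remote store. It assumes fsspec is installed; the path demo/data.csv and the frame contents are arbitrary.

```python
import fsspec
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# "memory://" is not a local path or an http(s) URL, so pandas hands it to fsspec.
df.to_csv("memory://demo/data.csv", index=False)
round_trip = pd.read_csv("memory://demo/data.csv")

# The same file is visible through fsspec's own API.
fs = fsspec.filesystem("memory")
exists = fs.exists("/demo/data.csv")

print(exists)
print(round_trip)
```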

When dealing with remote storage systems, you might need extra configuration with environment variables or config files in special locations. For example, to access data in your S3 bucket, you will need to define credentials in one of the several ways listed in the S3Fs documentation. The same is true for several of the storage backends, and you should follow the links at fsimpl1 for implementations built into fsspec and fsimpl2 for those not included in the main fsspec distribution.

You can also pass parameters directly to the backend driver. For example, if you do not have S3 credentials, you can still access public data by specifying an anonymous connection, such as storage_options={"anon": True}.

New in version 1.2.0.


fsspec also allows complex URLs, for accessing data in compressed archives, local caching of files, and more. To cache the anonymous-access example above locally, you would prefix the URL with simplecache:: and nest the S3 options under an "s3" key, that is, storage_options={"s3": {"anon": True}}.

Here the "anon" parameter is meant for the "s3" part of the implementation, not for the caching implementation. Note that this caches to a temporary directory for the duration of the session only, but you can also specify a permanent store.
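The per-protocol routing of options can be sketched locally with fsspec itself, using memory:// as a stand-in for S3 and a temporary directory as the "permanent" cache location. This assumes fsspec is installed. It calls fsspec.open directly and hands the open file to read_csv, rather than going through pandas' storage_options. The cache_storage option belongs to fsspec's caching filesystems, and the paths are illustrative.

```python
import os
import tempfile

import fsspec
import pandas as pd

# Seed fsspec's in-memory filesystem; it stands in for a remote store such as S3.
with fsspec.open("memory://demo/cached.csv", "w") as f:
    f.write("a,b\n1,2\n3,4\n")

with tempfile.TemporaryDirectory() as cache_dir:
    # Keyword arguments named after a protocol go to that layer of the chained URL:
    # "simplecache" options configure the cache, while a "memory" entry (not needed
    # here) would configure the target filesystem, like "s3" in the text above.
    with fsspec.open(
        "simplecache::memory://demo/cached.csv",
        mode="r",
        simplecache={"cache_storage": cache_dir},
    ) as f:
        df = pd.read_csv(f)
    cached_files = os.listdir(cache_dir)  # the local copy lands in cache_dir

print(cached_files)
print(df)
```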

Writing out data

Writing to CSV format

The Series and DataFrame objects have an instance method to_csv which allows storing the contents of the object as a comma-separated-values file. The function takes a number of arguments. In practice only the first, the output target, is usually given; if it is left as None, the CSV text is returned as a string instead of being written.
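A short sketch of a few of these arguments, assuming pandas 1.5 or later (where the line-ending argument is spelled lineterminator); the frame and the file name are made up. It writes once with the defaults apart from a fixed line ending, once with several options changed, and once through a file object opened with newline=''.

```python
import io
import os
import tempfile

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [0.5, np.nan]})

# Defaults (comma separator, header row, index column, NaN written as ""),
# with only the line ending pinned so the output is the same on every OS.
buf = io.StringIO()
df.to_csv(buf, lineterminator="\n")
default_text = buf.getvalue()

# With no target given, to_csv returns the CSV text as a string.
custom_text = df.to_csv(sep=";", na_rep="NA", index=False, lineterminator="\n")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.csv")
    # A file object handed to to_csv should be opened with newline="".
    with open(path, "w", newline="") as fh:
        df.to_csv(fh, index=False)
    round_trip = pd.read_csv(path)

print(default_text)
print(custom_text)
```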

  • path_or_buf: A string path to the file to write or a file object. If a file object, it must be opened with newline=''

  • sep: Field delimiter for the output file (default ",")

  • na_rep: A string representation of a missing value (default '')

  • float_format: Format string for floating point numbers

  • columns: Columns to write (default None)

  • header: Whether to write out the column names (default True)

  • index: Whether to write row (index) names (default True)

  • index_label: Column label(s) for index column(s) if desired. If None (default), and header and index are True, then the index names are used. (A sequence should be given if the DataFrame uses MultiIndex.)

  • mode: Python write mode, default 'w'

  • encoding: A string representing the encoding to use if the contents are non-ASCII, for Python versions prior to 3

  • lineterminator: Character sequence denoting line end (default os.linesep)

  • quoting: Set quoting rules as in the csv module (default csv.QUOTE_MINIMAL). Note that if you have set a float_format, then floats are converted to strings and csv.QUOTE_NONNUMERIC will treat them as non-numeric

  • quotechar: Character used to quote fields (default '"')

  • doublequote: Control quoting of quotechar in fields (default True)

  • `escapechar`: Character used to escape `sep` and `quotechar` when appropriate (default None)

  • `chunksize`: Number of rows to write at a time

  • `date_format`: Format string for datetime objects
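The options above can be compared side by side by writing a small frame to strings, since `to_csv` returns the CSV text when `path_or_buf` is omitted. This is a minimal sketch: the `DataFrame`, its column names and its values are hypothetical, and because pandas hands the rows to Python's standard `csv` writer, whether an escaped field is additionally wrapped in quotes may differ slightly between Python versions.

```python
import csv

import pandas as pd

# Hypothetical sample data: one field contains double quotes, one contains
# the delimiter, plus a float column and a datetime column.
df = pd.DataFrame(
    {
        "name": ['say "hi"', "a,b"],
        "price": [1.5, 2.25],
        "when": pd.to_datetime(["2021-01-02 03:04:05", "2021-06-07 08:09:10"]),
    }
)

# With path_or_buf omitted, to_csv returns the CSV text as a str.
# Default quoting is csv.QUOTE_MINIMAL with doublequote=True: embedded quotes
# are doubled and the field containing "," is wrapped in quotes.
default_csv = df.to_csv(index=False)

# QUOTE_NONNUMERIC leaves the raw floats unquoted...
nonnumeric = df.to_csv(index=False, quoting=csv.QUOTE_NONNUMERIC)
# ...but once float_format turns the floats into strings, they are quoted too.
nonnumeric_fmt = df.to_csv(
    index=False, quoting=csv.QUOTE_NONNUMERIC, float_format="%.2f"
)

# Use a single quote as the quote character for fields that need quoting.
single_quoted = df.to_csv(index=False, quotechar="'")

# Escape embedded quote characters with a backslash instead of doubling them.
escaped = df.to_csv(index=False, doublequote=False, escapechar="\\")

# strftime-style format applied to the datetime column.
dated = df.to_csv(index=False, date_format="%Y/%m/%d")

# chunksize only changes how many rows are written per batch.
chunked = df.to_csv(index=False, chunksize=1)

print(default_csv)
print(nonnumeric_fmt)
```

Because `chunksize` only controls batching, `chunked` is expected to be identical to `default_csv`.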

Writing a formatted string#

The `DataFrame` object has an instance method `to_string` which allows control over the string representation of the object. All arguments are optional:

  • `buf` default None, for example a StringIO object

  • `columns` default None, which columns to write

  • `col_space` default None, minimum width of each column

  • `na_rep` default `NaN`, representation of NA value

  • `formatters` default None, a dictionary (by column) of functions each of which takes a single argument and returns a formatted string

  • `float_format` default None, a function which takes a single (float) argument and returns a formatted string; to be applied to floats in the `DataFrame`

  • `sparsify` default True, set to False for a `DataFrame` with a hierarchical index to print every MultiIndex key at each row

  • `index_names` default True, will print the names of the indices

  • `index` default True, will print the index (i.e., row labels)

  • `header` default True, will print the column labels

  • `justify` default `left`, will print column headers left- or right-justified
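A short sketch of several of the `to_string` arguments listed above follows. The frame, its labels and the formatting lambdas are hypothetical examples; `formatters` maps a column name to a one-argument function, while `float_format` is a one-argument function applied to the remaining float cells.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame with one missing float and string row labels.
df = pd.DataFrame(
    {"a": [1.0, np.nan, 3.5], "b": ["x", "y", "z"]},
    index=["r1", "r2", "r3"],
)

text = df.to_string(
    na_rep="-",                         # how NaN is rendered
    float_format=lambda v: f"{v:.1f}",  # applied to float cells
    formatters={"b": str.upper},        # per-column formatting function
    col_space=6,                        # minimum column width
    justify="right",                    # column header alignment
)
print(text)

# Drop the row labels, or drop the column labels.
no_index = df.to_string(index=False)
no_header = df.to_string(header=False)

# Hierarchical index: sparsify=False repeats the outer key on every row.
mi = pd.MultiIndex.from_tuples([("k1", 1), ("k1", 2)])
multi = pd.DataFrame({"v": [10, 20]}, index=mi)
sparse = multi.to_string()
dense = multi.to_string(sparsify=False)
print(dense)
```

With the default `sparsify=True`, the outer key `k1` is printed only once; with `sparsify=False` it appears on each row.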

The `Series` object also has a `to_string` method, but with only the `buf`, `na_rep`, `float_format` arguments. There is also a `length` argument which, if set to `True`, will additionally output the length of the Series.
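For a `Series`, setting `length=True` appends a `Length: N` footer to the output. A minimal sketch with a hypothetical `Series` containing one missing value:

```python
import numpy as np
import pandas as pd

# Hypothetical Series with a missing value.
s = pd.Series([1.5, np.nan, 3.0], name="price")

plain = s.to_string()  # no length footer by default
detailed = s.to_string(
    na_rep="missing",                   # how NaN is rendered
    float_format=lambda v: f"{v:.2f}",  # applied to each float value
    length=True,                        # append a "Length: N" footer
)
print(detailed)
```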

JSON#

Read and write `JSON` format files and strings.
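As a quick illustration of a JSON round trip, the sketch below serializes a hypothetical `DataFrame` with `to_json` (no `path_or_buf`, so a string is returned) and parses it back with `read_json`. The literal JSON is wrapped in `StringIO` because recent pandas versions deprecate passing a raw JSON string to `read_json`; the `orient="split"` call is included only to show a second layout.

```python
import json
from io import StringIO

import pandas as pd

# Hypothetical frame used only for illustration.
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]}, index=["r1", "r2"])

# No path_or_buf, so to_json returns the JSON text as a str.
as_columns = df.to_json()              # DataFrame default orient is "columns"
as_split = df.to_json(orient="split")  # index, columns and data stored separately
split_payload = json.loads(as_split)

# Parse the string back; wrap it in StringIO so read_json sees a file-like object.
back = pd.read_json(StringIO(as_columns))

print(as_columns)
print(split_payload["data"])
print(back)
```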

Writing JSON#

A `Series` or `DataFrame` can be converted to a valid JSON string. Use `to_json` with optional parameters:

  • `path_or_buf`: the pathname or buffer to write the output. This can be `None`, in which case a JSON string is returned

  • `orient`:

    `Series`:
    • default is `index`

    • allowed values are {`split`, `records`, `index`}

    `DataFrame`:
    • default is `columns`

    • allowed values are {`split`, `records`, `index`, `columns`, `values`, `table`}

    The format of the JSON string:

    split: dict like {index -> [index], columns -> [columns], data -> [values]}

    records: list like [{column -> value}, … , {column -> value}]

    index: dict like {index -> {column -> value}}

    columns: dict like {column -> {index -> value}}

    values: just the values array

    table: adhering to the JSON Table Schema

  • date_format : string, type of date conversion, ‘epoch’ for timestamp, ‘iso’ for ISO8601.

  • double_precision : The number of decimal places to use when encoding floating point values, default 10.

  • force_ascii : force encoded string to be ASCII, default True.

  • date_unit : The time unit to encode to; governs timestamp and ISO8601 precision. One of ‘s’, ‘ms’, ‘us’ or ‘ns’ for seconds, milliseconds, microseconds and nanoseconds respectively. Default ‘ms’.

  • default_handler : The handler to call if an object cannot otherwise be converted to a suitable format for JSON. It takes a single argument, the object to convert, and returns a serializable object.

  • lines : If orient is records, write each record on a separate line as JSON.
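Here is a minimal sketch of several of these parameters. It assumes pandas 2.x, and the DataFrame `df` and its values are illustrative. No path is passed, so `to_json` returns the JSON as a string.

```python
import json

import pandas as pd

# Illustrative data: one non-ASCII string and one float with many digits.
df = pd.DataFrame({"name": ["café", "tea"], "price": [1 / 3, 2.5]})

# No path_or_buf is given, so each call returns a str.
rounded = df.to_json(double_precision=3)  # floats encoded with 3 decimals
escaped = df.to_json()  # force_ascii=True by default: "é" is escaped
readable = df.to_json(force_ascii=False)  # "é" is kept as a literal character

# lines=True is used together with orient="records": one JSON object per line.
ndjson = df.to_json(orient="records", lines=True)
records = [json.loads(line) for line in ndjson.splitlines() if line]

print(rounded)
print(ndjson)
```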

Note

NaN, NaT and None values will be converted to null, and datetime objects will be converted based on the date_format and date_unit parameters.
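You can check the missing-value conversion with a small, hypothetical DataFrame. This sketch assumes pandas 2.x. NaN, NaT and None should all come out as JSON null, and the timestamp is rendered according to date_format (ISO here).

```python
import json

import numpy as np
import pandas as pd

# Illustrative frame with one missing value per column type.
df = pd.DataFrame(
    {
        "x": [1.5, np.nan],  # float column with NaN
        "when": [pd.Timestamp("2013-01-01"), pd.NaT],  # datetime column with NaT
        "label": ["a", None],  # object column with None
    }
)

out = json.loads(df.to_json(date_format="iso"))
print(out)
```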


Orient options

There are a number of different options for the format of the resulting JSON file / string, selected with the orient parameter.
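To compare the orient options, it helps to have a small DataFrame and Series. This sketch assumes pandas 2.x. The names `dfjo` and `sjo` and their values are illustrative, not taken from the text.

```python
import pandas as pd

# 3x3 DataFrame with string index labels x, y, z and columns A, B, C.
dfjo = pd.DataFrame(
    dict(A=range(1, 4), B=range(4, 7), C=range(7, 10)),
    columns=list("ABC"),
    index=list("xyz"),
)

# Series on the same index, with a name.
sjo = pd.Series(dict(x=15, y=16, z=17), name="D")

print(dfjo)
print(sjo)
```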


Column oriented (the default for DataFrame) serializes the data as nested JSON objects, with column labels acting as the primary index.
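A minimal sketch of the column orientation, assuming pandas 2.x. It uses a hypothetical 3x3 frame `dfjo`. The outer keys of the result are column labels, and the inner keys are index labels.

```python
import json

import pandas as pd

# Illustrative frame: columns A, B, C; index labels x, y, z.
dfjo = pd.DataFrame(
    dict(A=range(1, 4), B=range(4, 7), C=range(7, 10)),
    columns=list("ABC"),
    index=list("xyz"),
)

out = dfjo.to_json(orient="columns")
print(out)
# {"A":{"x":1,"y":2,"z":3},"B":{"x":4,"y":5,"z":6},"C":{"x":7,"y":8,"z":9}}
```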


Index oriented (the default for Series) is similar to column oriented, but the index labels are now primary.
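A minimal sketch of the index orientation for both a DataFrame and a Series, assuming pandas 2.x. The data in `dfjo` and `sjo` is illustrative. For a Series, each index label maps directly to a value.

```python
import json

import pandas as pd

dfjo = pd.DataFrame(
    dict(A=range(1, 4), B=range(4, 7), C=range(7, 10)),
    columns=list("ABC"),
    index=list("xyz"),
)
sjo = pd.Series(dict(x=15, y=16, z=17), name="D")

# Outer keys are now index labels; inner keys are column labels.
frame_out = dfjo.to_json(orient="index")
# For a Series: index label -> value.
series_out = sjo.to_json(orient="index")

print(frame_out)   # {"x":{"A":1,"B":4,"C":7},"y":{"A":2,"B":5,"C":8},"z":{"A":3,"B":6,"C":9}}
print(series_out)  # {"x":15,"y":16,"z":17}
```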


Record oriented serializes the data to a JSON array of column -> value records; index labels are not included. This is useful for passing DataFrame data to plotting libraries, for example the JavaScript library d3.js.
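A minimal sketch of the records orientation, assuming pandas 2.x and illustrative data. Each row becomes one JSON object, and the index labels x, y, z are dropped. A Series becomes a plain array of its values.

```python
import json

import pandas as pd

dfjo = pd.DataFrame(
    dict(A=range(1, 4), B=range(4, 7), C=range(7, 10)),
    columns=list("ABC"),
    index=list("xyz"),
)
sjo = pd.Series(dict(x=15, y=16, z=17), name="D")

frame_out = dfjo.to_json(orient="records")
series_out = sjo.to_json(orient="records")

print(frame_out)   # [{"A":1,"B":4,"C":7},{"A":2,"B":5,"C":8},{"A":3,"B":6,"C":9}]
print(series_out)  # [15,16,17]
```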


Value oriented is a bare-bones option that serializes to nested JSON arrays of values only; column and index labels are not included.
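A minimal sketch of the values orientation, assuming pandas 2.x and an illustrative frame. The output is just a list of rows, so the labels must be tracked separately if they are needed later.

```python
import json

import pandas as pd

dfjo = pd.DataFrame(
    dict(A=range(1, 4), B=range(4, 7), C=range(7, 10)),
    columns=list("ABC"),
    index=list("xyz"),
)

# Only the 2-D array of values is written; labels are discarded.
out = dfjo.to_json(orient="values")
print(out)  # [[1,4,7],[2,5,8],[3,6,9]]
```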


Split oriented serializes to a JSON object containing separate entries for values, index and columns. The name is also included for a Series.
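A minimal sketch of the split orientation, assuming pandas 2.x and illustrative data. Labels and values are stored as separate JSON arrays, and the Series output carries its `name`.

```python
import json

import pandas as pd

dfjo = pd.DataFrame(
    dict(A=range(1, 4), B=range(4, 7), C=range(7, 10)),
    columns=list("ABC"),
    index=list("xyz"),
)
sjo = pd.Series(dict(x=15, y=16, z=17), name="D")

frame_out = json.loads(dfjo.to_json(orient="split"))
series_out = json.loads(sjo.to_json(orient="split"))

print(frame_out)   # columns, index and data as separate entries
print(series_out)  # name, index and data
```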


Table oriented serializes to the JSON Table Schema, allowing for the preservation of metadata including, but not limited to, dtypes and index names.
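A minimal sketch of the table orientation, assuming pandas 2.x and an illustrative frame. The exact schema contents, such as the `pandas_version` field, may vary between releases. The schema records a type per field, which lets `read_json(..., orient="table")` restore integer dtypes.

```python
import json
from io import StringIO

import pandas as pd

dfjo = pd.DataFrame(
    dict(A=range(1, 4), B=range(4, 7), C=range(7, 10)),
    columns=list("ABC"),
    index=list("xyz"),
)

table_json = dfjo.to_json(orient="table")
parsed = json.loads(table_json)

# One schema field per column plus one for the (unnamed) index.
field_types = {f["name"]: f["type"] for f in parsed["schema"]["fields"]}
print(field_types)

# Reading with the same orient uses the schema to rebuild dtypes.
back = pd.read_json(StringIO(table_json), orient="table")
print(back.dtypes)
```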

Note

Any orient option that encodes to a JSON object will not preserve the ordering of index and column labels during round-trip serialization. If you wish to preserve label ordering, use the split option, as it uses ordered containers.
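A minimal round-trip sketch with split, assuming pandas 2.x. The labels are hypothetical and deliberately non-alphabetical, so any reordering would be visible. Columns and index come back in their original order because split stores them as JSON arrays.

```python
from io import StringIO

import pandas as pd

# Non-alphabetical labels make reordering easy to spot.
df = pd.DataFrame({"b": [1, 2], "a": [3, 4]}, index=["z", "y"])

text = df.to_json(orient="split")
back = pd.read_json(StringIO(text), orient="split")
print(back)
```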

Date handling

Writing in ISO date format
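A minimal sketch of ISO output, assuming pandas 2.x and an illustrative frame `dfd`. With the default `date_unit="ms"`, timestamps carry millisecond precision. Whether a trailing "Z" is appended to timezone-naive values may depend on the pandas version, so the check below only looks at the prefix.

```python
import json

import pandas as pd

# Illustrative frame: a float column plus a datetime column (scalar broadcast).
dfd = pd.DataFrame({"A": [0.5, 1.5], "date": pd.Timestamp("2013-01-01")})

iso = dfd.to_json(date_format="iso")
print(iso)
```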


Writing in ISO date format, with microseconds
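A minimal sketch of microsecond precision, assuming pandas 2.x and illustrative data. Passing `date_unit="us"` alongside `date_format="iso"` widens the fractional seconds to six digits.

```python
import json

import pandas as pd

dfd = pd.DataFrame({"A": [0.5, 1.5], "date": pd.Timestamp("2013-01-01")})

iso_us = dfd.to_json(date_format="iso", date_unit="us")
print(iso_us)
```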


Epoch timestamps, in seconds
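A minimal sketch of epoch output in seconds, assuming pandas 2.x and illustrative data. Recent pandas releases may emit a FutureWarning around the epoch format. 2013-01-01T00:00:00 is 15706 days after 1970-01-01, which is 15706 × 86400 = 1356998400 seconds.

```python
import json

import pandas as pd

dfd = pd.DataFrame({"A": [0.5, 1.5], "date": pd.Timestamp("2013-01-01")})

# Integer seconds since 1970-01-01 instead of the default milliseconds.
epoch_s = dfd.to_json(date_format="epoch", date_unit="s")
print(epoch_s)
```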


Writing to a file, with a date index and a date column
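A minimal sketch of writing to a file, assuming pandas 2.x. The frame `dfj2` and the file name `test.json` are illustrative, and the file goes to a temporary directory. ISO dates are used here, so both the date index labels (the inner keys) and the date column are written as strings.

```python
import json
import os
import tempfile

import pandas as pd

# Date index plus date, integer and boolean columns.
dfj2 = pd.DataFrame({"A": [0.1, 0.2, 0.3]}, index=pd.date_range("2013-01-01", periods=3))
dfj2["date"] = pd.Timestamp("2013-01-01")
dfj2["ints"] = list(range(3))
dfj2["bools"] = True

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.json")
    dfj2.to_json(path, date_format="iso")  # a path means the JSON is written to disk
    with open(path) as fh:
        written = json.load(fh)

print(written)
```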


Fallback behavior

If the JSON serializer cannot handle the container contents directly, it will fall back in the following manner:

  • if the dtype is unsupported (e.g. np.complex128), then the default_handler, if provided, will be called for each value; otherwise an exception is raised

  • if an object is unsupported, it will attempt the following:

    • check if the object has defined a toDict method and call it. A toDict method should return a dict, which will then be JSON serialized

    • invoke the default_handler, if one was provided

    • convert the object to a dict by traversing its contents. However, this will often fail with an OverflowError or give unexpected results

In general, the best approach for unsupported objects or dtypes is to provide a default_handler.


can be dealt with by specifying a simple default_handler:

>>> pd.DataFrame([1.0, 2.0, complex(1.0, 2.0)]).to_json(default_handler=str)
'{"0":{"0":"(1+0j)","1":"(2+0j)","2":"(1+2j)"}}'
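A slightly larger sketch: default_handler can be any callable that returns something JSON-serializable. The Point class and point_to_dict function are hypothetical, invented only for this illustration. The example assumes, following the fallback rules above, that to_json hands each object it cannot serialize natively to the handler and then encodes whatever the handler returns.

```python
import json

import pandas as pd


class Point:
    """Hypothetical user-defined type that to_json has no built-in rule for."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


def point_to_dict(obj):
    """Hypothetical fallback: turn a Point into a plain dict, stringify anything else."""
    if isinstance(obj, Point):
        return {"x": obj.x, "y": obj.y}
    return str(obj)


# An object column holding values the JSON serializer cannot handle on its own.
df = pd.DataFrame({"p": [Point(1, 2), Point(3, 4)]})

# Each Point is passed to point_to_dict and the returned dict is serialized instead.
payload = df.to_json(default_handler=point_to_dict)
print(json.loads(payload))
```

Returning a plain dict (rather than a string, as with default_handler=str) keeps the nested structure visible to whoever reads the JSON later.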

Reading JSON

Reading a JSON string into a pandas object can take a number of parameters. The parser will try to parse a DataFrame if typ is not supplied or is None. To explicitly force Series parsing, pass typ='series'.
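A minimal sketch of that default and of forcing a Series, assuming pandas 2.x (where literal JSON text should be wrapped in StringIO); the two payloads are made-up sample data:

```python
from io import StringIO

import pandas as pd

# Column-oriented JSON (the default orient for DataFrames).
frame_json = '{"a": {"0": 1, "1": 2}, "b": {"0": 3, "1": 4}}'
# Index-oriented JSON (the default orient for Series).
series_json = '{"0": 10, "1": 20}'

# typ is not supplied, so read_json builds a DataFrame.
df = pd.read_json(StringIO(frame_json))

# typ="series" forces a Series instead.
s = pd.read_json(StringIO(series_json), typ="series")

print(type(df).__name__, df.shape)
print(type(s).__name__, s.tolist())
```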

  • filepath_or_buffer : a VALID JSON string or file handle / StringIO. The string could be a URL. Valid URL schemes include http, ftp, S3, and file. For file URLs, a host is expected. For instance, a local file could be file://localhost/path/to/table.json. (Passing literal JSON text directly is deprecated since pandas 2.1; wrap it in StringIO instead.)

  • typ : type of object to recover ('series' or 'frame'), default 'frame'

  • orient :

    Series
    • default is 'index'
    • allowed values are {'split', 'records', 'index'}

    DataFrame
    • default is 'columns'
    • allowed values are {'split', 'records', 'index', 'columns', 'values', 'table'}

    The format of the JSON string:
    • split: dict like {index -> [index], columns -> [columns], data -> [values]}
    • records: list like [{column -> value}, … , {column -> value}]
    • index: dict like {index -> {column -> value}}
    • columns: dict like {column -> {index -> value}}
    • values: just the values array
    • table: adhering to the JSON Table Schema

  • dtype : if True, infer dtypes; if a dict of column to dtype, then use those; if False, then don't infer dtypes at all. Default is True; applies only to the data.

  • convert_axes : boolean, try to convert the axes to the proper dtypes, default is True

  • convert_dates : a list of columns to parse for dates; if True, then try to parse date-like columns. Default is True.

  • keep_default_dates : boolean, default True. If parsing dates, then parse the default date-like columns.

  • numpy : direct decoding to NumPy arrays, default is False. Supports numeric data only, although labels may be non-numeric. Also note that the JSON ordering MUST be the same for each term if numpy=True. (This parameter was deprecated in pandas 1.0 and removed in pandas 2.0.)

  • precise_float : boolean, default False. Set to enable the higher-precision (strtod) function when decoding strings to double values. The default (False) uses the fast but less precise built-in functionality.

  • date_unit : string, the timestamp unit to detect if converting dates. Default None. By default the timestamp precision will be detected; if this is not desired, pass one of 's', 'ms', 'us' or 'ns' to force timestamp precision to seconds, milliseconds, microseconds or nanoseconds respectively.

  • lines : read the file as one JSON object per line.

  • encoding : the encoding to use to decode py3 bytes.

  • chunksize : when used in combination with lines=True, return a JsonReader which reads in chunksize lines per iteration.

The parser will raise one of ValueError, TypeError or AssertionError if the JSON is not parseable.
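Here is a minimal sketch of guarding against unparseable input. It assumes a recent pandas release. The helper try_read_json and the truncated sample string are hypothetical. The text is wrapped in StringIO because pandas 2.1 and later deprecate passing literal JSON strings to read_json.

```python
from io import StringIO

import pandas as pd


def try_read_json(text):
    """Hypothetical helper: return a DataFrame, or None if the JSON cannot be parsed."""
    try:
        return pd.read_json(StringIO(text))
    except (ValueError, TypeError, AssertionError):
        # read_json signals unparseable input with one of these exception types.
        return None


# Well-formed JSON produced by pandas itself.
good = try_read_json(pd.DataFrame({"A": [1, 2, 3]}).to_json())

# Hypothetical truncated document: the closing brackets are missing.
bad = try_read_json('{"A": {"0": 1, "1": 2')

print(good)
print(bad)
```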

If a non-default orient was used when encoding to JSON, be sure to pass the same option here so that decoding produces sensible results; see Orient Options for an overview.
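As an illustration with hypothetical data, this frame is encoded with orient="split". The same orient is then passed back to read_json so the "columns", "index" and "data" keys are interpreted correctly.

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3.5, 4.5]}, index=["x", "y"])

# Encode with a non-default orient ...
payload = df.to_json(orient="split")

# ... and decode with the same orient.
roundtrip = pd.read_json(StringIO(payload), orient="split")
print(roundtrip)
```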

Data conversion

The default of convert_axes=True, dtype=True and convert_dates=True will try to parse the axes, and all of the data, into appropriate types, including dates. If you need to override specific dtypes, pass a dict to dtype. convert_axes should only be set to False if you need to preserve string-like numbers (e.g. '1', '2') in an axis.

Note

Large integer values may be converted to dates if convert_dates=True and the data and/or column labels appear 'date-like'. The exact threshold depends on the date_unit specified. 'Date-like' means that the column label meets one of the following criteria:

  • it ends with '_at'

  • it ends with '_time'

  • it begins with 'timestamp'

  • it is 'modified'

  • it is 'date'
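Here is a sketch of the date-like rule, using hypothetical epoch-second values. The same integers are stored under a date-like label (modified) and under an ordinary one (count). Under the default settings, only the first column should be parsed as datetimes; passing convert_dates=False keeps both as integers. The datetime resolution may differ between pandas versions, so the sketch only checks that the dtype is datetime-like.

```python
from io import StringIO

import pandas as pd

# Hypothetical epoch-second values under a date-like label ("modified")
# and under an ordinary label ("count").
df = pd.DataFrame(
    {
        "modified": [1_600_000_000, 1_600_086_400],
        "count": [1_600_000_000, 1_600_086_400],
    }
)
payload = df.to_json()

# Defaults: convert_dates=True, keep_default_dates=True.
converted = pd.read_json(StringIO(payload))

# Opt out of date parsing entirely.
untouched = pd.read_json(StringIO(payload), convert_dates=False)

print(converted.dtypes)
print(untouched.dtypes)
```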

Warning

When reading JSON data, automatic coercion into dtypes has some quirks:

  • an index can be reconstructed in a different order from serialization; that is, the returned order is not guaranteed to be the same as before serialization

  • a column that was float data will be converted to integer if it can be done safely, e.g. a column of 1.0 values

  • bool columns will be converted to integer on reconstruction

Thus there are times when you may want to request particular dtypes via the dtype keyword argument.

Reading from a JSON string:
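Here is a minimal sketch of reading from a JSON string with hypothetical content. The string is wrapped in StringIO, since pandas 2.1 and later deprecate passing literal JSON to read_json.

```python
from io import StringIO

import pandas as pd

# A small hypothetical document in the default "columns" orient.
json_str = '{"A": {"0": 1, "1": 2}, "B": {"0": 0.5, "1": 1.5}}'

df = pd.read_json(StringIO(json_str))
print(df)
```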

Reading from a file:
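This sketch writes a hypothetical frame to a temporary file named test.json and reads it back by path.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [0.5, 1.5, 2.5]})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "test.json")  # hypothetical file name
    df.to_json(path)                # write the frame to disk
    from_file = pd.read_json(path)  # read it back by path

print(from_file)
```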

Don't convert any data (but still convert axes and dates):
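Here is a sketch using hypothetical data. With dtype=object every data column comes back as object, while the axis labels are still converted (the string keys "0" and "1" become integers).

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [1.5, 2.5]})
payload = df.to_json()

# dtype=object: no numeric inference on the data columns.
raw = pd.read_json(StringIO(payload), dtype=object)
print(raw.dtypes)
```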

Specify dtypes for conversion:
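This sketch uses hypothetical columns A and bools and forces their dtypes with a dict passed to dtype, instead of relying on inference.

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0], "bools": [True, False]})
payload = df.to_json()

# Force specific dtypes per column.
typed = pd.read_json(StringIO(payload), dtype={"A": "float32", "bools": "int8"})
print(typed.dtypes)
```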

Preserve string indices:
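This sketch uses hypothetical numeric-looking string labels. By default convert_axes turns them into integers, while convert_axes=False keeps them as strings.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Hypothetical labels that look numeric but are strings.
si = pd.DataFrame(np.zeros((2, 2)), columns=["0", "1"], index=["0", "1"])
payload = si.to_json()

default = pd.read_json(StringIO(payload))                        # labels parsed
preserved = pd.read_json(StringIO(payload), convert_axes=False)  # labels kept

print(default.index.tolist(), preserved.index.tolist())
```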

Dates written in nanoseconds need to be read back in nanoseconds:
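This sketch uses hypothetical nanosecond timestamps. Writing with date_unit="ns" and reading back with the same unit should preserve the nanosecond component. Reading with a coarser unit, or relying on automatic detection, may behave differently across pandas versions, so only the matching-unit round trip is shown.

```python
from io import StringIO

import pandas as pd

dfj2 = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2013-01-01 00:00:00.000000001", "2013-01-02 00:00:00.000000002"]
        ),
        "value": [1, 2],
    }
)

# Epoch integers in nanoseconds ...
payload = dfj2.to_json(date_unit="ns")

# ... read back with the same unit.
back = pd.read_json(StringIO(payload), date_unit="ns")
print(back)
```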

The Numpy parameter

Note

This param has been deprecated as of version 1.0.0 and will raise a FutureWarning.

This supports numeric data only. Index and column labels may be non-numeric, e.g. strings, dates, etc.

If numpy=True is passed to read_json, an attempt will be made to sniff an appropriate dtype during deserialization and to subsequently decode directly to NumPy arrays, bypassing the need for intermediate Python objects.

This can provide speedups if you are deserialising a large amount of numeric data.

The speedup is less noticeable for smaller datasets.

Warning

Direct NumPy decoding makes a number of assumptions and may fail or produce unexpected output if these assumptions are not satisfied:

  • data is numeric

  • data is uniform. The dtype is sniffed from the first value decoded. A ValueError may be raised, or incorrect output may be produced, if this condition is not satisfied

  • labels are ordered. Labels are only read from the first container; it is assumed that each subsequent row/column has been encoded in the same order. This should be satisfied if the data was encoded using to_json, but may not be the case if the JSON is from another source

Normalization

pandas provides a utility function, json_normalize, to take a dict or list of dicts and normalize this semi-structured data into a flat table.
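Here is a sketch of pandas.json_normalize on hypothetical records. Nested dicts become dotted column names, and record_path together with meta unnests a list of records while carrying a parent field along.

```python
import pandas as pd

# Hypothetical semi-structured records.
people = [
    {"id": 1, "name": {"first": "Coleen", "last": "Volk"}},
    {"id": 2, "name": {"first": "Mark", "last": "Regner"}},
]
flat = pd.json_normalize(people)  # nested keys become "name.first", "name.last"
print(flat)

# Hypothetical nested lists: unnest "counties", carrying "state" along.
states = [
    {
        "state": "Florida",
        "counties": [
            {"name": "Dade", "population": 12345},
            {"name": "Broward", "population": 40000},
        ],
    },
    {"state": "Ohio", "counties": [{"name": "Summit", "population": 1234}]},
]
counties = pd.json_normalize(states, record_path="counties", meta=["state"])
print(counties)
```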

The max_level parameter provides more control over the level at which normalization ends. With max_level=1, normalization stops at the first nesting level of the provided dict.
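This sketch uses a hypothetical nested record and compares max_level=1 with full flattening.

```python
import pandas as pd

# Hypothetical record with three levels of nesting.
data = [
    {
        "CreatedBy": {"Name": "User001"},
        "Lookup": {
            "TextField": "Some text",
            "UserField": {"Id": "ID001", "Name": "Name001"},
        },
        "Image": {"a": "b"},
    }
]

one_level = pd.json_normalize(data, max_level=1)  # stop after the first level
fully = pd.json_normalize(data)                   # flatten every level

print(one_level.columns.tolist())
print(fully.columns.tolist())
```

With max_level=1, the innermost dict stays intact inside the Lookup.UserField column instead of being split into Id and Name columns.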

Line delimited json

pandas is able to read and write line-delimited JSON files, which are common in data processing pipelines using Hadoop or Spark.

For line-delimited JSON files, pandas can also return an iterator which reads in chunksize lines at a time. This can be useful for large files or to read from a stream.
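This sketch uses a hypothetical three-line document. It reads the document in one go with lines=True, then in chunks of two with chunksize=2, and finally writes it back out with to_json(orient="records", lines=True). Using the reader as a context manager assumes pandas 1.2 or later.

```python
from io import StringIO

import pandas as pd

# Hypothetical line-delimited document: one JSON object per line.
jsonl = '{"a": 1, "b": 2}\n{"a": 3, "b": 4}\n{"a": 5, "b": 6}\n'

# Read everything at once.
df = pd.read_json(StringIO(jsonl), lines=True)

# Or iterate, two lines per chunk.
with pd.read_json(StringIO(jsonl), lines=True, chunksize=2) as reader:
    chunks = list(reader)

# Write the frame back out as line-delimited JSON.
out = df.to_json(orient="records", lines=True)

print([len(chunk) for chunk in chunks])
print(out)
```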

Table schema

Table Schema is a spec for describing tabular datasets as a JSON object. The JSON includes information on the field names, types, and other attributes. You can use the orient table to build a JSON string with two fields, schema and data.
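This sketch uses a hypothetical frame. It decodes the output of to_json(orient="table") with the standard json module to inspect the two top-level fields. Each record in data includes the index value.

```python
import json

import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [0.5, 1.5, 2.5]},
    index=pd.Index(["x", "y", "z"], name="idx"),
)

payload = df.to_json(orient="table")
doc = json.loads(payload)

print(sorted(doc))     # the top-level field names
print(doc["data"][0])  # first record, index value included
```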

The schema field contains the fields key, which itself contains a list of column name to type pairs, including the Index or MultiIndex (see below for a list of types). The schema field also contains a primaryKey field if the (Multi)index is unique.

The second field, data, contains the serialized data with the records orient. The index is included, and any datetimes are ISO 8601 formatted, as required by the Table Schema spec.
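
The two-field layout can be inspected with a minimal sketch like the one below. It assumes a recent pandas release. The frame, its column names and the variable names (payload, field_names, field_types, first_row) are illustrative only. The sketch parses the output of DataFrame.to_json(orient="table") with the standard json module so that the schema and data fields can be examined directly.

```python
import json

import pandas as pd

# Illustrative frame with a named, unique index.
df = pd.DataFrame(
    {
        "A": [1, 2, 3],
        "B": ["a", "b", "c"],
        "C": pd.date_range("2016-01-01", freq="D", periods=3),
    },
    index=pd.Index(range(3), name="idx"),
)

# orient="table" produces a JSON object with "schema" and "data" keys.
payload = json.loads(df.to_json(orient="table"))

schema = payload["schema"]
# One field per index level and per column, index first.
field_names = [f["name"] for f in schema["fields"]]
field_types = {f["name"]: f["type"] for f in schema["fields"]}

# "data" uses the records layout: one dict per row, index included.
first_row = payload["data"][0]

print(field_names)           # ['idx', 'A', 'B', 'C']
print(schema["primaryKey"])  # ['idx']
print(first_row["C"])        # ISO 8601 string for 2016-01-01
```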

The full list of supported types is described in the Table Schema spec. This table shows the mapping from pandas types to Table Schema types:

pandas type        Table Schema type
int64              integer
float64            number
bool               boolean
datetime64[ns]     datetime
timedelta64[ns]    duration
categorical        any
object             str
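
The sketch below checks the mapping on a given installation. It assumes that pandas.io.json.build_table_schema, the helper pandas exposes for generating these schemas, is importable. The column names are hypothetical. The object row is left out because recent pandas versions may infer a dedicated string dtype for text columns. Passing version=False drops the pandas_version key, which is not needed here.

```python
import pandas as pd
from pandas.io.json import build_table_schema

# One column per row of the mapping table (names are illustrative).
df = pd.DataFrame(
    {
        "i": pd.Series([1, 2], dtype="int64"),
        "f": pd.Series([1.0, 2.0], dtype="float64"),
        "b": [True, False],
        "dt": pd.to_datetime(["2016-01-01", "2016-01-02"]),
        "td": pd.to_timedelta([1, 2], unit="D"),
        "cat": pd.Categorical(["x", "y"]),
    }
)

schema = build_table_schema(df, version=False)
# Map each field name (the unnamed index becomes "index") to its Table Schema type.
types = {f["name"]: f["type"] for f in schema["fields"]}
for name, typ in types.items():
    print(f"{name:>5}: {typ}")
```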

A few notes on the generated table schema:

  • The schema object contains a pandas_version field. This contains the version of pandas' dialect of the schema, and will be incremented with each revision.

  • All dates are converted to UTC when serializing, even timezone-naive values, which are treated as UTC with an offset of 0.

  • Datetimes with a timezone (before serializing) include an additional field, tz, with the time zone name (e.g. 'US/Central').

  • Periods are converted to timestamps before serialization, and so have the same behavior of being converted to UTC. In addition, periods will contain an additional field, freq, with the period's frequency, e.g. 'A-DEC'.

  • Categoricals use the any type and an enum constraint listing the set of possible values. Additionally, an ordered field is included.

  • A primaryKey field, containing an array of labels, is included if the index is unique.

  • The primaryKey behavior is the same with MultiIndexes, but in this case the primaryKey is an array.

  • The default naming roughly follows these rules:

    • For Series, the object.name is used. If that's None, then the name is 'values'.

    • For DataFrames, the stringified version of the column name is used.

    • For Index (not MultiIndex), index.name is used, with a fallback to 'index' if that is None.

    • For MultiIndex, mi.names is used. If any level has no name, then level_<i> is used.
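
The notes above can be checked with a sketch like the following. It assumes that pandas.io.json.build_table_schema is available in your pandas version, and all Series and frames are illustrative. The sketch uses a UTC-aware series rather than a named zone, so the tz value does not depend on which time zone library is installed. It also avoids asserting the exact freq string, because the spelling of period aliases has changed across pandas releases.

```python
import pandas as pd
from pandas.io.json import build_table_schema

# Naive datetimes: a plain "datetime" field. The unnamed index is called
# "index" and the unnamed Series is called "values".
naive = build_table_schema(pd.Series(pd.date_range("2016-01-01", periods=4)))

# Timezone-aware datetimes gain a "tz" key (UTC used here for portability).
aware = build_table_schema(
    pd.Series(pd.date_range("2016-01-01", periods=3, tz="UTC"))
)

# A PeriodIndex is typed as "datetime" and gains a "freq" key.
per = build_table_schema(
    pd.Series(1, index=pd.period_range("2016-01", freq="M", periods=4))
)

# Categoricals: "any" type, an "enum" constraint and an "ordered" flag.
cat = build_table_schema(pd.Series(pd.Categorical(["a", "b", "a"])))

# Duplicate index labels: no "primaryKey" is emitted.
dupe = build_table_schema(pd.Series([1, 2], index=[1, 1]))

# Unnamed MultiIndex levels become level_0, level_1; primaryKey lists them.
mi_frame = pd.DataFrame(
    {"a": [1, 2, 3, 4]},
    index=pd.MultiIndex.from_product([("a", "b"), (0, 1)]),
)
mi = build_table_schema(mi_frame)

print(naive["fields"])
print(aware["fields"][1])
print(per["fields"][0])
print(cat["fields"][1])
print("primaryKey" in dupe)
print(list(mi["primaryKey"]))
```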

read_json also accepts orient='table' as an argument. This allows for the preservation of metadata such as dtypes and index names in a round-trippable manner.

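Here is a minimal round-trip sketch with an illustrative frame. It assumes a pandas version that accepts a file-like object in read_json. io.StringIO is used because newer releases deprecate passing a literal JSON string. A text column is left out so that the check does not depend on how a particular pandas version types strings.

```python
import io

import pandas as pd

df = pd.DataFrame(
    {
        "foo": [1, 2, 3, 4],
        "baz": pd.date_range("2018-01-01", freq="D", periods=4),
        "qux": pd.Categorical(["a", "b", "c", "c"]),
    },
    index=pd.Index(range(4), name="idx"),
)

json_str = df.to_json(orient="table")
# Wrap the string in StringIO: recent pandas versions deprecate literal JSON input.
new_df = pd.read_json(io.StringIO(json_str), orient="table")

print(new_df.dtypes)
print(new_df.index.name)  # idx
```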

Please note that the literal string 'index' as the name of an Index is not round-trippable, nor are any names beginning with 'level_' within a MultiIndex. These are used by default in DataFrame.to_json() to indicate missing values, and the subsequent read cannot distinguish the intent.

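The sketch below uses illustrative frames to show the name 'index' being dropped on the way back. A MultiIndex level named level_0 behaves the same way. pandas warns about these names when writing; the sketch silences those warnings only to keep the output short.

```python
import io
import warnings

import pandas as pd

df = pd.DataFrame({"foo": [1, 2, 3, 4]})
df.index.name = "index"

mi_df = pd.DataFrame(
    {"foo": [1, 2, 3, 4]},
    index=pd.MultiIndex.from_product([["a", "b"], [0, 1]], names=["level_0", "grp"]),
)

with warnings.catch_warnings():
    # pandas emits a UserWarning that these names are not round-trippable.
    warnings.simplefilter("ignore")
    json_str = df.to_json(orient="table")
    mi_json = mi_df.to_json(orient="table")

new_df = pd.read_json(io.StringIO(json_str), orient="table")
mi_new = pd.read_json(io.StringIO(mi_json), orient="table")

print(new_df.index.name)        # None: 'index' is read back as "no name"
print(list(mi_new.index.names))  # [None, 'grp']
```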

When using orient='table' along with a user-defined ExtensionArray, the generated schema will contain an additional extDtype key in the respective fields element. This extra key is not standard, but it does enable JSON round trips for extension types (e.g. read_json(df.to_json(orient="table"), orient="table")).

The extDtype key carries the name of the extension. If you have properly registered the ExtensionDtype, pandas will use that name to look it up in the registry and convert the serialized data back into your custom dtype.
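
Here is a hedged sketch of the extDtype round trip. It assumes a pandas version whose to_json emits extDtype. Instead of a user-defined ExtensionArray, it uses pandas' built-in nullable Int64 dtype, an extension dtype registered in the same registry, so no custom registration code is needed.

```python
import io
import json

import pandas as pd

# "Int64" is pandas' built-in nullable integer extension dtype; it stands in
# here for a user-defined ExtensionDtype registered the same way.
df = pd.DataFrame({"a": pd.array([1, None, 3], dtype="Int64")})

json_str = df.to_json(orient="table")
# fields[0] describes the index, fields[1] the "a" column.
field = json.loads(json_str)["schema"]["fields"][1]
print(field)  # expected to include 'extDtype': 'Int64'

new_df = pd.read_json(io.StringIO(json_str), orient="table")
print(new_df["a"].dtype)  # Int64 if the registry lookup succeeded
```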

HTML

Reading HTML content

Warning

We highly encourage you to read the HTML Table Parsing gotchas below regarding the issues surrounding the BeautifulSoup4/html5lib/lxml parsers.

The top-level read_html() function can accept an HTML string/file/URL and will parse HTML tables into a list of pandas DataFrames. Let's look at a few examples.

Note

read_html returns a list of DataFrame objects, even if there is only a single table contained in the HTML content.
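A minimal sketch of the list-returning behaviour, assuming lxml (or BeautifulSoup4 plus html5lib) is installed. The table contents are invented. The markup is wrapped in StringIO because recent pandas versions deprecate passing a literal HTML string.

```python
from io import StringIO

import pandas as pd

# A small, self-contained HTML document with a single <table> (made-up data).
html = """
<table>
  <thead><tr><th>item</th><th>price</th></tr></thead>
  <tbody>
    <tr><td>apple</td><td>1.5</td></tr>
    <tr><td>pear</td><td>2.25</td></tr>
  </tbody>
</table>
"""

tables = pd.read_html(StringIO(html))
print(type(tables), len(tables))  # still a list, even with a single table

df = tables[0]
print(df)
```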

Read a URL with no options.

Note

The data from the above URL changes every Monday, so the resulting data above may be slightly different.

Read in the content of the file from the above URL and pass it to read_html as a string.

You can even pass in an instance of StringIO if you so desire.
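A sketch of both input styles with a local file instead of a URL, under the same parser assumption. The file name cities.html and its contents are hypothetical.

```python
import tempfile
from io import StringIO
from pathlib import Path

import pandas as pd

html = """<table>
<tr><th>city</th><th>population</th></tr>
<tr><td>Springfield</td><td>30000</td></tr>
</table>"""

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "cities.html"  # hypothetical file name
    path.write_text(html, encoding="utf-8")

    # 1) Pass the path directly.
    from_path = pd.read_html(path)[0]

    # 2) Read the contents yourself and hand them over in a StringIO buffer.
    contents = path.read_text(encoding="utf-8")
    from_buffer = pd.read_html(StringIO(contents))[0]

print(from_path.equals(from_buffer))
```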

Note

The following examples are not run by the IPython evaluator, because having so many network-accessing functions slows down the documentation build. If you spot an error or an example that doesn't run, please do not hesitate to report it on the pandas GitHub issues page.

Read a URL and match a table that contains specific text.
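A local stand-in for the URL example, assuming the same parser setup; the two tables are invented. match is treated as a regular expression, and only tables containing matching text are returned.

```python
from io import StringIO

import pandas as pd

html = """
<table>
  <tr><th>fruit</th><th>price</th></tr>
  <tr><td>apple</td><td>1.5</td></tr>
</table>
<table>
  <tr><th>vegetable</th><th>price</th></tr>
  <tr><td>carrot</td><td>0.8</td></tr>
</table>
"""

# Without match, every table in the document is returned.
all_tables = pd.read_html(StringIO(html))
print(len(all_tables))

# Only tables containing text that matches the pattern are kept.
veg = pd.read_html(StringIO(html), match="carrot")
print(veg[0])
```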

Specify a header row (by default, <th> or <td> elements located within a <thead> are used to form the column index; if multiple rows are contained within <thead>, then a MultiIndex is created).
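A sketch of default and explicit header handling on small invented tables, with the same parser assumption as above.

```python
from io import StringIO

import pandas as pd

# No <thead> and no <th> cells: by default every row is treated as data.
plain = """
<table>
  <tr><td>name</td><td>score</td></tr>
  <tr><td>ann</td><td>90</td></tr>
  <tr><td>bob</td><td>85</td></tr>
</table>
"""
default_cols = pd.read_html(StringIO(plain))[0].columns.tolist()
print(default_cols)  # integer labels

# header=0 promotes the first row to column labels.
df = pd.read_html(StringIO(plain), header=0)[0]
print(df.columns.tolist())

# Two rows inside <thead> produce a MultiIndex on the columns.
two_rows = """
<table>
  <thead>
    <tr><th>sales</th><th>sales</th></tr>
    <tr><th>q1</th><th>q2</th></tr>
  </thead>
  <tbody><tr><td>1</td><td>2</td></tr></tbody>
</table>
"""
wide = pd.read_html(StringIO(two_rows))[0]
print(wide.columns)
```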

Specify an index column.
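A minimal index_col sketch on an invented table (same parser assumption).

```python
from io import StringIO

import pandas as pd

html = """
<table>
  <tr><th>code</th><th>item</th><th>price</th></tr>
  <tr><td>A1</td><td>apple</td><td>1.5</td></tr>
  <tr><td>B2</td><td>pear</td><td>2.25</td></tr>
</table>
"""

# index_col=0 turns the first column ("code") into the row index.
df = pd.read_html(StringIO(html), index_col=0)[0]
print(df.loc["B2", "price"])
```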

Specify a number of rows to skip.

Specify a number of rows to skip using a list (range works as well).
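A sketch of the integer, list and range forms of skiprows on an invented table whose first row is a title. In this sketch, positions count raw table rows, and header=0 then refers to the first row that was not skipped.

```python
from io import StringIO

import pandas as pd

# The first row is an unwanted title; the second row holds the real labels.
html = """
<table>
  <tr><td>Quarterly report (draft)</td><td></td></tr>
  <tr><td>region</td><td>sales</td></tr>
  <tr><td>north</td><td>10</td></tr>
  <tr><td>south</td><td>20</td></tr>
  <tr><td>east</td><td>30</td></tr>
</table>
"""

# An integer skips that many leading rows.
df = pd.read_html(StringIO(html), skiprows=1, header=0)[0]

# A list skips the rows at those 0-based positions (the title and "south").
subset = pd.read_html(StringIO(html), skiprows=[0, 3], header=0)[0]

# A range behaves like the equivalent list of positions.
same = pd.read_html(StringIO(html), skiprows=range(1), header=0)[0]

print(df)
print(subset)
```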

Specify an HTML attribute.
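A sketch that uses attrs to select one of two invented tables, first by id and then by class (same parser assumption).

```python
from io import StringIO

import pandas as pd

html = """
<table id="prices">
  <tr><th>item</th><th>price</th></tr>
  <tr><td>apple</td><td>1.5</td></tr>
</table>
<table class="sortable">
  <tr><th>item</th><th>stock</th></tr>
  <tr><td>apple</td><td>12</td></tr>
</table>
"""

# attrs is a dict of HTML attributes the <table> element must carry.
by_id = pd.read_html(StringIO(html), attrs={"id": "prices"})
by_class = pd.read_html(StringIO(html), attrs={"class": "sortable"})
print(by_id[0])
print(by_class[0])
```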

Specify values that should be converted to NaN.

Specify whether to keep the default set of NaN values.
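A sketch of na_values and keep_default_na together on invented data. "No Acquirer" is just an example sentinel string, while "NA" is one of pandas' default missing-value markers.

```python
from io import StringIO

import pandas as pd

html = """
<table>
  <tr><th>bank</th><th>acquirer</th></tr>
  <tr><td>First Bank</td><td>No Acquirer</td></tr>
  <tr><td>Second Bank</td><td>NA</td></tr>
  <tr><td>Third Bank</td><td>Other Bank</td></tr>
</table>
"""

# Extra sentinel on top of the defaults: both "No Acquirer" and "NA" become NaN.
df = pd.read_html(StringIO(html), na_values=["No Acquirer"])[0]

# keep_default_na=False drops the built-in list, so "NA" stays a plain string.
strict = pd.read_html(
    StringIO(html), na_values=["No Acquirer"], keep_default_na=False
)[0]
print(df)
print(strict)
```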

Specify converters for columns. This is useful for numerical text data that has leading zeros. By default, columns that are numerical are cast to numeric types and the leading zeros are lost. To avoid this, we can convert these columns to strings.
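A sketch of the leading-zero problem and the converters fix, using invented ZIP codes (same parser assumption).

```python
from io import StringIO

import pandas as pd

html = """
<table>
  <tr><th>zip</th><th>city</th></tr>
  <tr><td>02134</td><td>Boston</td></tr>
  <tr><td>07030</td><td>Hoboken</td></tr>
</table>
"""

# Default parsing infers a numeric column and drops the leading zeros.
default = pd.read_html(StringIO(html))[0]

# converters={"zip": str} keeps the raw cell text for that column.
as_text = pd.read_html(StringIO(html), converters={"zip": str})[0]
print(default["zip"].tolist(), as_text["zip"].tolist())
```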

Use some combination of the above.

Read in pandas to_html output (with some loss of floating point precision).
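A sketch of the precision loss, assuming default display options (6 decimal places for floats in to_html) and the same parser setup. The 17-significant-digit float_format is an illustrative choice, not something the text prescribes.

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({"x": [0.1 + 0.2, 1 / 3], "y": [1.0, 2.5]})

# By default, to_html rounds floats to the display precision.
html = df.to_html()
back = pd.read_html(StringIO(html), index_col=0)[0]
print(back.loc[1, "x"], df.loc[1, "x"])  # the HTML copy keeps only 6 decimals

# Writing more significant digits keeps the round trip much closer.
precise_html = df.to_html(float_format="{0:.17g}".format)
precise = pd.read_html(StringIO(precise_html), index_col=0)[0]
print(abs(precise.loc[1, "x"] - df.loc[1, "x"]))
```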

The lxml backend will raise an error on a failed parse if that is the only parser you provide. If you only have a single parser, you can provide just a string, but it is considered good practice to pass a list with one string if, for example, the function expects a sequence of strings. You may, for instance, pass flavor=['lxml'].

Or you could pass flavor='lxml' without a list.

However, if you have bs4 and html5lib installed and pass None or ['lxml', 'bs4'], then the parse will most likely succeed. Note that as soon as a parse succeeds, the function will return.
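A sketch of the flavor forms discussed above, assuming lxml is installed. In the fallback list, bs4 is only tried if lxml fails, so it is not needed for this tiny invented table.

```python
from io import StringIO

import pandas as pd

html = "<table><tr><th>a</th></tr><tr><td>1</td></tr></table>"

# A single parser given as a one-element list (the recommended form) ...
dfs_list = pd.read_html(StringIO(html), flavor=["lxml"])

# ... or as a plain string.
dfs_str = pd.read_html(StringIO(html), flavor="lxml")

# Try lxml first and fall back to BeautifulSoup4/html5lib only on failure.
dfs_fallback = pd.read_html(StringIO(html), flavor=["lxml", "bs4"])
print(dfs_list[0])
```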

Links can be extracted from cells along with the text using extract_links="all".

New in version 1.5.0.
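A minimal extract_links sketch, assuming pandas 1.5 or later and an installed parser. Both header and body cells become (text, href) tuples, with None where a cell has no link. The table is invented.

```python
from io import StringIO

import pandas as pd

html_table = """
<table>
  <tr><th>GitHub</th></tr>
  <tr><td><a href="https://github.com/pandas-dev/pandas">pandas</a></td></tr>
</table>
"""

df = pd.read_html(StringIO(html_table), extract_links="all")[0]
print(df.columns[0])  # the header cell is a (text, link) tuple as well

# Select the column by its tuple label and pull out the link part.
links = df[("GitHub", None)].str[1]
print(links.tolist())
```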

Writing to HTML files

DataFrame objects have an instance method to_html which renders the contents of the DataFrame as an HTML table. The function arguments are as in the method to_string described above.
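A minimal sketch of to_html, both returning a string and writing to a file. The data and the file name table.html are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# Without a target, to_html returns the markup as a string.
html = df.to_html()
print(html)

# Given a path, the same markup is written to that file instead.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "table.html"
    df.to_html(path)
    written = path.read_text(encoding="utf-8")
```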

Note

Not all of the possible options for DataFrame.to_html are shown here, for brevity's sake. See to_html() for the full set of options.

Note

In an HTML-rendering supported environment like a Jupyter Notebook, display(HTML(...)) will render the raw HTML into the environment.

The columns argument will limit the columns shown.
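A small sketch of the columns argument on invented data.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Only the listed columns are rendered.
html = df.to_html(columns=["a", "c"])
print(html)
```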

float_format takes a Python callable to control the precision of floating point values.
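A sketch of float_format with a one-argument callable; the two-decimal format is an arbitrary choice for illustration.

```python
import pandas as pd

df = pd.DataFrame({"value": [3.14159265, 2.71828183]})

# The callable receives each float and returns the text to render.
html = df.to_html(float_format=lambda x: f"{x:.2f}")
print(html)
```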

bold_rows will make the row labels bold by default, but you can turn that off.
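A sketch of bold_rows. With the default, row labels are emitted as <th> cells, which browsers render in bold. With bold_rows=False they become ordinary <td> cells.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2]}, index=["r1", "r2"])

bold = df.to_html()                  # row labels as <th>
plain = df.to_html(bold_rows=False)  # row labels as <td>
print(plain)
```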

The classes argument provides the ability to give the resulting HTML table CSS classes. Note that these classes are appended to the existing 'dataframe' class.
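A sketch of the classes argument. The class names table and table-striped are just example values; the check only inspects the opening <table> tag.

```python
import pandas as pd

df = pd.DataFrame({"a": [1]})

html = df.to_html(classes=["table", "table-striped"])
table_tag = html.splitlines()[0]
print(table_tag)  # the extra classes sit next to the default "dataframe" class
```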

The render_links argument provides the ability to add hyperlinks to cells that contain URLs.
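A sketch of render_links on an invented row. Cells whose text looks like a URL are wrapped in an <a href="..."> element.

```python
import pandas as pd

df = pd.DataFrame({"name": ["pandas"], "url": ["https://pandas.pydata.org"]})

linked = df.to_html(render_links=True)
unlinked = df.to_html()
print(linked)
```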

Finally, the escape argument allows you to control whether the "<", ">" and "&" characters are escaped in the resulting HTML (by default it is True). So to get the HTML without escaped characters, pass escape=False.

Escaped

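A sketch of the default behaviour, assuming `DataFrame.to_html` is called with no extra arguments. The frame `df` is a hypothetical example whose column `a` holds the three special characters:

```python
import pandas as pd

# Cell values containing characters that are special in HTML.
df = pd.DataFrame({"a": list("&<>"), "b": [0.5, 1.5, 2.5]})

# With default settings, "&", "<" and ">" are written as HTML entities.
escaped_html = df.to_html()
print(escaped_html)
```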

Not escaped

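The same hypothetical frame rendered without escaping. This assumes the escaping argument is the `escape` parameter of `DataFrame.to_html`:

```python
import pandas as pd

df = pd.DataFrame({"a": list("&<>"), "b": [0.5, 1.5, 2.5]})

# escape=False writes cell contents verbatim, so "<" and ">" become raw markup.
raw_html = df.to_html(escape=False)
print(raw_html)
```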

Note

Some browsers may not show a difference in the rendering of the previous two HTML tables.

HTML Table Parsing Gotchas

There are some versioning issues surrounding the libraries used by the top-level pandas I/O function for parsing HTML tables.

Issues with lxml

  • Benefits

    • lxml is very fast.

    • lxml requires Cython to install correctly.

  • Drawbacks

    • lxml does not make any guarantees about the results of its parse unless it is given strictly valid markup.

    • In light of the above, we have chosen to allow you, the user, to use the lxml backend, but this backend will use html5lib if lxml fails to parse.

    • It is therefore highly recommended that you install both BeautifulSoup4 and html5lib, so that you will still get a valid result (provided everything else is valid) even if lxml fails.

Issues with BeautifulSoup4 using lxml as a backend

  • The above issues hold here as well, since BeautifulSoup4 is essentially just a wrapper around a parser backend.

Issues with BeautifulSoup4 using html5lib as a backend

  • Benefits

    • html5lib is far more lenient than lxml and consequently deals with real-life markup in a much saner way, rather than just, e.g., dropping an element without notifying you.

    • html5lib generates valid HTML5 markup from invalid markup automatically. This is extremely important for parsing HTML tables, since it guarantees a valid document. However, that does NOT mean that it is “correct”, since the process of fixing markup does not have a single definition.

    • html5lib is pure Python and requires no additional build steps beyond its own installation.

  • Drawbacks

    • The biggest drawback to using html5lib is that it is slow as molasses. However, many tables on the web are not big enough for the parsing algorithm's runtime to matter. The bottleneck is more likely to be reading the raw text from the URL over the web, i.e., IO (input-output). For very large tables, this might not be true.
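Given the fallback behaviour described above, this sketch picks which parser backends to request. It assumes the `flavor` parameter of `pandas.read_html`, which accepts `"lxml"`, `"bs4"` (alias `"html5lib"`), or a list of them. `available_read_html_flavors` is a hypothetical helper that only checks which libraries are importable:

```python
import importlib.util


def available_read_html_flavors() -> list[str]:
    """Return the read_html flavors whose backing libraries can be imported.

    Hypothetical helper: "lxml" needs lxml; "bs4" needs BeautifulSoup4 plus html5lib.
    """
    flavors: list[str] = []
    if importlib.util.find_spec("lxml") is not None:
        flavors.append("lxml")
    if all(importlib.util.find_spec(mod) is not None for mod in ("bs4", "html5lib")):
        flavors.append("bs4")
    return flavors


flavors = available_read_html_flavors()
print(flavors)
# Example use (html_text is a hypothetical HTML string):
# tables = pd.read_html(StringIO(html_text), flavor=flavors or None)
```

When you pass a list, pandas tries each flavor in order. Passing `None` keeps the default order: lxml first, then BeautifulSoup4 with html5lib.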

LaTeX

New in version 1.3.0

Currently there are no methods to read from LaTeX, only output methods.

Writing to LaTeX files

Note

DataFrame and Styler objects currently both have a method for writing LaTeX. We recommend using the Styler.to_latex() method over DataFrame.to_latex(). The former offers greater flexibility with conditional styling, and the latter may be deprecated in the future.

Review the documentation for Styler.to_latex, which gives examples of conditional styling and explains the operation of its keyword arguments.

For simple applications, the following pattern is sufficient.


To format values before output, chain the Styler.format method.


XML

Reading XML

New in version 1.3.0

The top-level XML reading function can accept an XML string, file, or URL and will parse nodes and attributes into a pandas DataFrame.

Note

Since there is no standard XML structure, and designs can vary in many ways, the reading function works best with flatter, shallow documents. If an XML document is deeply nested, use the XSLT stylesheet feature described below to transform it into a flatter version.

Let’s look at a few examples.

Read an XML string

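A minimal sketch of reading an XML string with `pandas.read_xml`. The bookstore document is hypothetical sample data. The sketch passes `parser="etree"` (standard library) so it runs without lxml. It also wraps the text in `io.StringIO`, assuming a recent pandas version that deprecates passing literal XML strings:

```python
from io import StringIO

import pandas as pd

# Hypothetical sample document: each <book> element becomes one row.
xml = """<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="web">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>"""

# Attributes of <book> and the text of its child elements become columns.
df = pd.read_xml(StringIO(xml), parser="etree")
print(df)
```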

Read a URL with no options

Read in the content of the “books.xml” file and pass it to the reading function as a string

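A sketch of the file-based variant. It writes a hypothetical document to a temporary `books.xml`, reads the file's text back, and hands that text to `pandas.read_xml`. The text is wrapped in `io.StringIO`, assuming a pandas version where passing a literal XML string is deprecated:

```python
import os
import tempfile
from io import StringIO

import pandas as pd

xml = """<bookstore>
  <book category="cooking">
    <title>Everyday Italian</title>
    <year>2005</year>
  </book>
  <book category="web">
    <title>Learning XML</title>
    <year>2003</year>
  </book>
</bookstore>"""

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "books.xml")
    with open(file_path, "w", encoding="utf-8") as f:
        f.write(xml)

    # Read the whole file as text, then pass that text on to read_xml.
    with open(file_path, encoding="utf-8") as f:
        text = f.read()

df = pd.read_xml(StringIO(text), parser="etree")
print(df)
```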

Read in the content of “books.xml” as an in-memory text or binary buffer and pass it to the reading function

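A sketch that passes in-memory buffers. It assumes the two buffer types meant here are `io.StringIO` (text) and `io.BytesIO` (bytes). The document is hypothetical, and `parser="etree"` avoids the lxml dependency:

```python
from io import BytesIO, StringIO

import pandas as pd

xml = """<bookstore>
  <book category="cooking">
    <title>Everyday Italian</title>
    <price>30.00</price>
  </book>
  <book category="web">
    <title>Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>"""

# Text buffer: read like an open text file.
df_text = pd.read_xml(StringIO(xml), parser="etree")

# Binary buffer: bytes are decoded with read_xml's encoding (UTF-8 by default).
df_bytes = pd.read_xml(BytesIO(xml.encode("utf-8")), parser="etree")

print(df_text.equals(df_bytes))
```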

You can even read XML from AWS S3 buckets, such as the NIH NCBI PMC Article Datasets, which provide Biomedical and Life Science Journals.

With lxml as the default parser, you access the full-featured XML library that extends Python’s ElementTree API. One powerful tool is the ability to query nodes selectively or conditionally with more expressive XPath.

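A sketch of conditional node selection. So that it runs without lxml, it uses the standard-library `etree` parser, which limits it to ElementTree's XPath subset; the `[year='2005']` child-text predicate is part of that subset. With lxml installed, a full XPath 1.0 expression such as `//book[year=2005]` should work instead. The document is hypothetical:

```python
from io import StringIO

import pandas as pd

xml = """<bookstore>
  <book category="cooking"><title>Everyday Italian</title><year>2005</year></book>
  <book category="children"><title>Harry Potter</title><year>2005</year></book>
  <book category="web"><title>Learning XML</title><year>2003</year></book>
</bookstore>"""

# ElementTree-style predicate: keep <book> elements whose <year> child text is "2005".
df = pd.read_xml(StringIO(xml), xpath=".//book[year='2005']", parser="etree")
print(df)
```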

Specify only elements or only attributes to parse

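A sketch of restricting what is parsed, assuming the `elems_only` and `attrs_only` flags of `read_xml`. The document is hypothetical:

```python
from io import StringIO

import pandas as pd

xml = """<bookstore>
  <book category="cooking"><title>Everyday Italian</title><year>2005</year></book>
  <book category="web"><title>Learning XML</title><year>2003</year></book>
</bookstore>"""

# Only the child elements of each <book> become columns (the category attribute is skipped).
elems_df = pd.read_xml(StringIO(xml), elems_only=True, parser="etree")

# Only the attributes of each <book> become columns.
attrs_df = pd.read_xml(StringIO(xml), attrs_only=True, parser="etree")

print(elems_df.columns.tolist(), attrs_df.columns.tolist())
```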

XML documents can have namespaces with prefixes and default namespaces without prefixes, both of which are declared with a special attribute. In order to parse by node under a namespace context, the XPath expression must reference a prefix.

For example, consider an XML document that declares a namespace with a prefix and a URI. In order to parse its prefixed nodes, a mapping from the prefix to the URI must be passed.

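A sketch of parsing prefixed nodes, assuming the `xpath` and `namespaces` parameters of `read_xml`. The `doc` prefix and the `https://example.com` URI are hypothetical. The sketch uses `parser="etree"`, whose limited XPath still understands `prefix:tag` steps when a `namespaces` mapping is given:

```python
from io import StringIO

import pandas as pd

# Hypothetical document whose nodes live in the "doc" prefixed namespace.
xml = """<doc:data xmlns:doc="https://example.com">
  <doc:row>
    <doc:shape>square</doc:shape>
    <doc:sides>4</doc:sides>
  </doc:row>
  <doc:row>
    <doc:shape>triangle</doc:shape>
    <doc:sides>3</doc:sides>
  </doc:row>
</doc:data>"""

# The XPath uses the prefix; `namespaces` maps that prefix to the document's URI.
df = pd.read_xml(
    StringIO(xml),
    xpath=".//doc:row",
    namespaces={"doc": "https://example.com"},
    parser="etree",
)
print(df)
```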

Similarly, an XML document can have a default namespace without a prefix. If you do not assign a temporary prefix, no nodes are returned and an error is raised. Assigning any temporary name to the correct URI allows parsing by nodes.

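For a default namespace, this sketch binds an arbitrary temporary prefix (here the hypothetical `pandas`) to the document's URI. It again assumes the `xpath` and `namespaces` parameters of `read_xml`, and the document is illustrative:

```python
from io import StringIO

import pandas as pd

# Hypothetical document with a default (unprefixed) namespace.
xml = """<data xmlns="https://example.com">
  <row><shape>square</shape><sides>4</sides></row>
  <row><shape>triangle</shape><sides>3</sides></row>
</data>"""

# Bind any temporary prefix to the default namespace URI, then use it in the XPath.
df = pd.read_xml(
    StringIO(xml),
    xpath=".//pandas:row",
    namespaces={"pandas": "https://example.com"},
    parser="etree",
)
print(df)
```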

However, if the XPath expression does not reference node names, as with the default expression, then no namespace mapping is needed.
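A sketch of the no-mapping case. It assumes the default `xpath` of `read_xml` is `"./*"`, which selects the root's children without naming them. The document is hypothetical:

```python
from io import StringIO

import pandas as pd

xml = """<data xmlns="https://example.com">
  <row><shape>square</shape><sides>4</sides></row>
  <row><shape>triangle</shape><sides>3</sides></row>
</data>"""

# The default xpath selects every child of the root without naming it,
# so the namespaced <row> elements are found without a `namespaces` mapping.
df = pd.read_xml(StringIO(xml), parser="etree")
print(df)
```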

With lxml as the parser, you can flatten nested XML documents with an XSLT script, which can also be a string, file, or URL. As background, XSLT is a special-purpose language written in a special XML file. Using an XSLT processor, it can transform original XML documents into other XML, HTML, or even text (CSV, JSON, etc.).

For example, consider a somewhat nested structure of Chicago “L” Rides data, where the station and rides elements encapsulate data in their own sections. An XSLT stylesheet can transform the original nested document into a flatter output that is easier to parse into a DataFrame.

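A sketch of XSLT flattening, assuming the `stylesheet` parameter of `read_xml`, which requires lxml. The rides document, station names, and values are made up for illustration. The call is skipped when lxml is not installed:

```python
import importlib.util
from io import StringIO

import pandas as pd

# Hypothetical nested input: station details in attributes, ride counts in a sub-element.
xml = """<response>
  <row>
    <station id="1" name="Station A"/>
    <month>2020-09-01</month>
    <rides>
      <avg_weekday_rides>100.5</avg_weekday_rides>
      <avg_saturday_rides>80.0</avg_saturday_rides>
    </rides>
  </row>
  <row>
    <station id="2" name="Station B"/>
    <month>2020-09-01</month>
    <rides>
      <avg_weekday_rides>200.0</avg_weekday_rides>
      <avg_saturday_rides>150.5</avg_saturday_rides>
    </rides>
  </row>
</response>"""

# XSLT 1.0 stylesheet that lifts attributes and nested values to one level per <row>.
xsl = """<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="/response">
    <xsl:copy>
      <xsl:apply-templates select="row"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="row">
    <xsl:copy>
      <station_id><xsl:value-of select="station/@id"/></station_id>
      <station_name><xsl:value-of select="station/@name"/></station_name>
      <xsl:copy-of select="month|rides/*"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>"""

# The stylesheet option needs lxml (the default parser); skip the call without it.
if importlib.util.find_spec("lxml") is not None:
    df = pd.read_xml(StringIO(xml), stylesheet=StringIO(xsl))
    print(df)
else:
    df = None
```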

Very large XML files can range from hundreds of megabytes to gigabytes. For such files, the XML reading function supports parsing with lxml’s iterparse and etree’s iterparse. These are memory-efficient methods that iterate through an XML tree and extract specific elements and attributes without holding the entire tree in memory.

New in version 1.5.0

To use this feature, you must pass a physical XML file path into the reading function and enable the feature through its dedicated argument. Files must be stored on local disk; they should not be compressed or point to online sources. That argument should be a dictionary:

  • The key is the repeating node in the document; each occurrence becomes a row.

  • The value is a list of any elements or attributes that are descendants (i.e., children, grandchildren) of the repeating node.

Since XPath is not used in this method, descendants do not need to share the same relationship with one another. One example of such a file is Wikipedia’s very large (12 GB+) latest article data dump.

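A sketch of the memory-efficient mode, assuming the `iterparse` parameter of `read_xml` (pandas 1.5+). The feature needs a real local path, so a tiny temporary file stands in for a large dump. The document is hypothetical:

```python
import os
import tempfile

import pandas as pd

# Hypothetical small stand-in for a very large file.
xml = """<bookstore>
  <book category="cooking">
    <title>Everyday Italian</title>
    <year>2005</year>
  </book>
  <book category="web">
    <title>Learning XML</title>
    <year>2003</year>
  </book>
</bookstore>"""

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "books.xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write(xml)

    # Key: the repeating node that becomes a row; value: descendants to keep as columns.
    df = pd.read_xml(path, iterparse={"book": ["title", "year"]}, parser="etree")

print(df)
```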

Writing XML

New in version 1.3.0

DataFrame objects have an instance method that renders the contents of the DataFrame as an XML document.

Note

This method does not support special properties of XML including DTD, CData, XSD schemas, processing instructions, comments, and others. Only namespaces at the root level are supported. However, stylesheet allows design changes after the initial output.

Let's take a look at a few examples.

Write an XML without options
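A minimal sketch of the default call. It uses a small hypothetical frame named geom_df. It passes parser="etree" so that only the standard library is needed, because the default parser is lxml. When no path_or_buffer is given, to_xml returns the document as a string.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
geom_df = pd.DataFrame(
    {
        "shape": ["square", "circle", "triangle"],
        "degrees": [360, 360, 180],
        "sides": [4.0, None, 3.0],
    }
)

# No path_or_buffer: the XML document is returned as a str.
# parser="etree" avoids the optional lxml dependency.
xml = geom_df.to_xml(parser="etree")
print(xml)
```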


Write an XML with new root and row name
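A sketch of renaming the default `<data>` root and `<row>` elements through the root_name and row_name parameters. The frame is hypothetical, and etree is again chosen so that lxml is not required.

```python
import pandas as pd

# Hypothetical sample data
geom_df = pd.DataFrame({"shape": ["square", "triangle"], "sides": [4, 3]})

# root_name / row_name replace the default <data> and <row> tags
xml = geom_df.to_xml(root_name="geometry", row_name="objects", parser="etree")
print(xml)
```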


Write an attribute-centric XML
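A sketch of attribute-centric output using the attr_cols parameter, on a hypothetical frame. When every column is listed in attr_cols, each row becomes a single element whose values are attributes rather than child elements.

```python
import pandas as pd

# Hypothetical sample data
geom_df = pd.DataFrame({"shape": ["square", "triangle"], "sides": [4, 3]})

# Every column is written as an attribute on its <row> element
xml = geom_df.to_xml(attr_cols=geom_df.columns.tolist(), parser="etree")
print(xml)
```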


Write a mix of elements and attributes
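A sketch that combines attr_cols and elem_cols. The shape column becomes an attribute and the numeric columns become child elements. The data is hypothetical, and index=False drops the index from the output.

```python
import pandas as pd

# Hypothetical sample data
geom_df = pd.DataFrame(
    {"shape": ["square", "triangle"], "degrees": [360, 180], "sides": [4, 3]}
)

# shape -> attribute on <row>; degrees and sides -> child elements
xml = geom_df.to_xml(
    index=False,
    attr_cols=["shape"],
    elem_cols=["degrees", "sides"],
    parser="etree",
)
print(xml)
```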


Any DataFrames with hierarchical columns will be flattened for XML element names, with levels delimited by underscores.
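A sketch of that flattening on a hypothetical frame with two-level column labels. Each (level 0, level 1) pair is joined with an underscore to form one element name.

```python
import pandas as pd

# Hypothetical frame with MultiIndex (hierarchical) columns
columns = pd.MultiIndex.from_tuples([("sales", "q1"), ("sales", "q2")])
df = pd.DataFrame([[10, 20], [30, 40]], columns=columns)

# ("sales", "q1") is written as <sales_q1>, and so on
xml = df.to_xml(index=False, parser="etree")
print(xml)
```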


Write an XML with default namespace
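A sketch of declaring a default namespace. In the namespaces dict, the empty-string key maps to the default namespace URI, which is declared on the root element. The URI https://example.com and the data are placeholders.

```python
import pandas as pd

# Hypothetical sample data
geom_df = pd.DataFrame({"shape": ["square", "triangle"], "sides": [4, 3]})

# The "" key declares xmlns="..." on the root element
xml = geom_df.to_xml(namespaces={"": "https://example.com"}, parser="etree")
print(xml)
```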


Write an XML with namespace prefix
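A sketch of a prefixed namespace. The prefix argument must be a key of namespaces, and every element name is then written with that prefix. The doc prefix and the URI are placeholders.

```python
import pandas as pd

# Hypothetical sample data
geom_df = pd.DataFrame({"shape": ["square", "triangle"], "sides": [4, 3]})

# prefix="doc" qualifies every element, e.g. <doc:row>
xml = geom_df.to_xml(
    namespaces={"doc": "https://example.com"},
    prefix="doc",
    parser="etree",
)
print(xml)
```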


Write an XML without declaration or pretty print
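A sketch that turns off both the XML declaration and indentation. This yields a compact single-line document, which can be handy when embedding the output in another payload. The data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data
geom_df = pd.DataFrame({"shape": ["square", "triangle"], "sides": [4, 3]})

# No <?xml ...?> header and no newlines/indentation
xml = geom_df.to_xml(xml_declaration=False, pretty_print=False, parser="etree")
print(xml)
```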


Write an XML and transform with stylesheet
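A sketch of post-processing the output with XSLT. Unlike the examples that use parser="etree", this one needs lxml installed, because stylesheet requires the lxml parser (the default). The XSLT script is a hypothetical example that turns each `<row>` into an attribute-only `<shape>` element. It is passed as a file-like object via io.StringIO.

```python
import io

import pandas as pd

# Hypothetical sample data
geom_df = pd.DataFrame({"shape": ["square", "triangle"], "sides": [4, 3]})

# Hypothetical XSLT: <data><row>...</row></data> -> <shapes><shape .../></shapes>
xsl = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="/data">
    <shapes>
      <xsl:apply-templates select="row"/>
    </shapes>
  </xsl:template>
  <xsl:template match="row">
    <shape name="{shape}" sides="{sides}"/>
  </xsl:template>
</xsl:stylesheet>"""

# stylesheet is applied after the initial XML is built (lxml only)
xml = geom_df.to_xml(stylesheet=io.StringIO(xsl))
print(xml)
```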


XML Final Notes

  • All XML documents adhere to W3C specifications. Both etree and lxml parsers will fail to parse any markup document that is not well-formed or does not follow XML syntax rules. Be aware that HTML is not an XML document unless it follows XHTML specs. However, other popular markup types including KML, XAML, RSS, MusicML, and MathML are compliant XML schemas.

  • For the above reason, if your application builds XML prior to pandas operations, use appropriate DOM libraries like etree and lxml to build the necessary document, not string concatenation or regex adjustments. Always remember that XML is a special text file with markup rules.

  • With very large XML files (several hundred MBs to GBs), XPath and XSLT can become memory-intensive operations. Be sure to have enough available RAM for reading and writing large XML files (roughly 5 times the size of the text).

  • Because XSLT is a programming language, use it with caution: such scripts can pose a security risk in your environment and can run large or infinite recursive operations. Always test scripts on small fragments before a full run.

  • The etree parser supports all functionality of both read_xml and to_xml except for complex XPath and any XSLT. Though limited in features, etree is still a reliable and capable parser and tree builder. Its performance may trail lxml to a certain degree for larger files, but the difference is relatively unnoticeable on small to medium size files.
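The advice to build XML with a DOM library rather than string concatenation can be sketched with the standard library's xml.etree.ElementTree. The tree builder escapes special characters such as "&" and "<" automatically. The result can then be read back with read_xml. The department names and counts are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd

# Build the document as a tree instead of gluing strings together
root = ET.Element("data")
for dept, staff in [("R&D", 12), ("Sales <EU>", 7)]:  # hypothetical rows
    row = ET.SubElement(root, "row")
    ET.SubElement(row, "dept").text = dept  # "&" and "<" are escaped for us
    ET.SubElement(row, "staff").text = str(staff)

xml_text = ET.tostring(root, encoding="unicode")

# Wrap the literal XML in StringIO before handing it to read_xml
df = pd.read_xml(io.StringIO(xml_text), parser="etree")
print(df)
```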

Excel files

The read_excel() method can read Excel 2007+ (.xlsx) files using the openpyxl Python module. Excel 2003 (.xls) files can be read using xlrd. Binary Excel (.xlsb) files can be read using pyxlsb. The to_excel() instance method is used for saving a DataFrame to Excel. Generally the semantics are similar to working with csv data. See the cookbook for some advanced strategies.
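A minimal round-trip sketch with to_excel and read_excel, assuming openpyxl is installed. The file name example.xlsx and the data are hypothetical. A temporary directory keeps the example self-cleaning.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})  # hypothetical data

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.xlsx")
    # The .xlsx extension lets pandas pick an .xlsx writer engine
    df.to_excel(path, sheet_name="Sheet1", index=False)
    # .xlsx files are read with openpyxl
    round_trip = pd.read_excel(path, sheet_name="Sheet1")

print(round_trip)
```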

Warning

The xlwt package for writing old-style .xls excel files is no longer maintained. The xlrd package is now only for reading old-style .xls files.

Before pandas 1.3.0, the default argument engine=None to read_excel() would result in using the xlrd engine in many cases, including new Excel 2007+ (.xlsx) files. pandas will now default to using the openpyxl engine.

It is strongly encouraged to install openpyxl to read Excel 2007+ (.xlsx) files. Please do not report issues when using xlrd to read .xlsx files. This is no longer supported; switch to using openpyxl instead.

Attempting to use the xlwt engine will raise a FutureWarning unless the option io.excel.xls.writer is set to "xlwt". While this option is now deprecated and will also raise a FutureWarning, it can be set globally and the warning suppressed. Users are recommended to write .xlsx files using the openpyxl engine instead.
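Following that recommendation, a sketch that names the openpyxl engine explicitly for both writing and reading an .xlsx file, instead of relying on the defaults. It assumes openpyxl is installed. The path and data are hypothetical.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"item": ["pen", "cup"], "qty": [3, 5]})  # hypothetical data

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "modern.xlsx")
    # Write a modern .xlsx file (not legacy .xls) with openpyxl
    df.to_excel(path, engine="openpyxl", index=False)
    # Read it back with the same engine, named explicitly
    back = pd.read_excel(path, engine="openpyxl")

print(back)
```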

Reading Excel files

In the most basic use-case, read_excel takes a path to an Excel file, and the sheet_name indicating which sheet to parse.
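A sketch of selecting a sheet by name, by zero-based position, or all at once with sheet_name=None (which returns a dict keyed by sheet name). It writes a hypothetical two-sheet workbook first so the example is self-contained, and assumes openpyxl is installed.

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "workbook.xlsx")
    # Hypothetical workbook with two sheets
    with pd.ExcelWriter(path) as writer:
        pd.DataFrame({"a": [1, 2]}).to_excel(writer, sheet_name="Sheet1", index=False)
        pd.DataFrame({"b": [3, 4]}).to_excel(writer, sheet_name="Sheet2", index=False)

    by_name = pd.read_excel(path, sheet_name="Sheet2")   # select by sheet name
    by_position = pd.read_excel(path, sheet_name=0)      # first sheet
    all_sheets = pd.read_excel(path, sheet_name=None)    # dict of every sheet

print(by_name, by_position, list(all_sheets), sep="\n")
```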


ExcelFile class

To facilitate working with multiple sheets from the same file, the ExcelFile class can be used to wrap the file and can be passed into read_excel. There will be a performance benefit for reading multiple sheets, as the file is read into memory only once.
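A sketch of wrapping a workbook once in ExcelFile and passing that object to read_excel for each sheet. The workbook and its sheet names are hypothetical, and openpyxl is assumed to be installed. close() releases the underlying file handle.

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "workbook.xlsx")
    # Hypothetical workbook with two sheets
    with pd.ExcelWriter(path) as writer:
        pd.DataFrame({"a": [1, 2]}).to_excel(writer, sheet_name="Sheet1", index=False)
        pd.DataFrame({"b": [3, 4]}).to_excel(writer, sheet_name="Sheet2", index=False)

    # Open the file once, then read individual sheets from the wrapper
    xlsx = pd.ExcelFile(path)
    df1 = pd.read_excel(xlsx, sheet_name="Sheet1")
    df2 = pd.read_excel(xlsx, sheet_name="Sheet2")
    xlsx.close()

print(df1, df2, sep="\n")
```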


The ExcelFile class can also be used as a context manager.
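A sketch of the context-manager form, which closes the file automatically when the block exits. Here, passing a list to sheet_name returns a dict of DataFrames keyed by sheet name. The workbook is hypothetical, and openpyxl is assumed to be installed.

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "workbook.xlsx")
    # Hypothetical workbook with two sheets
    with pd.ExcelWriter(path) as writer:
        pd.DataFrame({"a": [1, 2]}).to_excel(writer, sheet_name="Sheet1", index=False)
        pd.DataFrame({"b": [3, 4]}).to_excel(writer, sheet_name="Sheet2", index=False)

    # The file handle is released when the with-block ends
    with pd.ExcelFile(path) as xls:
        frames = pd.read_excel(xls, sheet_name=["Sheet1", "Sheet2"])

print(frames["Sheet1"], frames["Sheet2"], sep="\n")
```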

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print[data]
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv[StringIO[data], dtype=object]

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv[StringIO[data], dtype={"b": object, "c": np.float64, "d": "Int64"}]

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
372
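
A sketch of the context-manager form under the same assumptions: `openpyxl` is installed and the two-sheet workbook is a hypothetical in-memory one. The workbook handle is closed when the `with` block exits.

```python
from io import BytesIO

import pandas as pd

# Hypothetical two-sheet workbook held in memory (needs openpyxl).
buf = BytesIO()
with pd.ExcelWriter(buf, engine="openpyxl") as writer:
    pd.DataFrame({"x": [1, 2]}).to_excel(writer, sheet_name="Sheet1", index=False)
    pd.DataFrame({"y": [3, 4]}).to_excel(writer, sheet_name="Sheet2", index=False)
buf.seek(0)

# The with-block closes the underlying workbook handle on exit.
with pd.ExcelFile(buf) as xls:
    df1 = pd.read_excel(xls, "Sheet1")
    df2 = pd.read_excel(xls, "Sheet2")
```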

The `sheet_names` property returns a list of the sheet names in the file.
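
For example, assuming `openpyxl` is available and using a hypothetical in-memory workbook, `sheet_names` returns the names in workbook order:

```python
from io import BytesIO

import pandas as pd

# Hypothetical workbook with two sheets (needs openpyxl).
buf = BytesIO()
with pd.ExcelWriter(buf, engine="openpyxl") as writer:
    pd.DataFrame({"x": [1]}).to_excel(writer, sheet_name="Sheet1", index=False)
    pd.DataFrame({"y": [2]}).to_excel(writer, sheet_name="Sheet2", index=False)
buf.seek(0)

with pd.ExcelFile(buf) as xls:
    names = xls.sheet_names  # list of sheet names, in workbook order
```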

The primary use-case for an `ExcelFile` is parsing multiple sheets with different parameters.
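
A sketch of that use-case, assuming `openpyxl` is installed. The workbook, its sheet names and its column names are hypothetical. `Sheet1` uses the text `NA` for missing values, and `Sheet2` stores row labels in its first column, so each sheet is parsed with its own options.

```python
from io import BytesIO

import pandas as pd

# Hypothetical workbook whose two sheets have different layouts (needs openpyxl).
buf = BytesIO()
with pd.ExcelWriter(buf, engine="openpyxl") as writer:
    # Sheet1: plain table where the text "NA" marks a missing value.
    pd.DataFrame({"value": [1, "NA", 3]}).to_excel(
        writer, sheet_name="Sheet1", index=False
    )
    # Sheet2: the first column holds row labels.
    pd.DataFrame({"score": [10, 20]}, index=["alice", "bob"]).to_excel(
        writer, sheet_name="Sheet2"
    )
buf.seek(0)

data = {}
with pd.ExcelFile(buf) as xls:
    # Each sheet gets the parsing options that match its own layout.
    data["Sheet1"] = pd.read_excel(xls, "Sheet1", index_col=None, na_values=["NA"])
    data["Sheet2"] = pd.read_excel(xls, "Sheet2", index_col=0)
```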

Note that if the same parsing parameters are used for all sheets, you can pass a list of sheet names to `read_excel` directly, with no loss in performance.
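
A sketch comparing the two approaches on a hypothetical in-memory workbook whose sheets share one layout (`openpyxl` assumed installed). Both approaches produce a dictionary of DataFrames keyed by sheet name.

```python
from io import BytesIO

import pandas as pd

# Hypothetical workbook whose sheets share one layout (needs openpyxl).
buf = BytesIO()
with pd.ExcelWriter(buf, engine="openpyxl") as writer:
    for name in ["Sheet1", "Sheet2"]:
        pd.DataFrame({"value": [1, "NA"]}).to_excel(writer, sheet_name=name, index=False)
raw = buf.getvalue()

# Using the ExcelFile class: one read_excel call per sheet.
via_excelfile = {}
with pd.ExcelFile(BytesIO(raw)) as xls:
    for name in ["Sheet1", "Sheet2"]:
        via_excelfile[name] = pd.read_excel(xls, name, index_col=None, na_values=["NA"])

# Equivalent: pass the list of sheet names straight to read_excel.
via_list = pd.read_excel(
    BytesIO(raw), ["Sheet1", "Sheet2"], index_col=None, na_values=["NA"]
)
```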

`ExcelFile` can also be called with an `xlrd.book.Book` object as a parameter. This allows the user to control how the Excel file is read. For example, sheets can be loaded on demand by calling `xlrd.open_workbook()` with `on_demand=True`.

Specifying sheets

Note

The second argument is `sheet_name`, not to be confused with `ExcelFile.sheet_names`.

Note

The `sheet_names` attribute of an `ExcelFile` provides access to a list of sheets.

  • The argument `sheet_name` allows specifying the sheet or sheets to read.

  • The default value for `sheet_name` is 0, which reads the first sheet.

  • Pass a string to refer to the name of a particular sheet in the workbook.

  • Pass an integer to refer to the index of a sheet. Indices follow Python convention, beginning at 0.

  • Pass a list of either strings or integers to return a dictionary of specified sheets.

  • Pass `None` to return a dictionary of all available sheets.

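
A minimal sketch of selecting a sheet by name, assuming `openpyxl` is installed. The in-memory workbook and its sheet names are hypothetical. The call returns a single DataFrame.

```python
from io import BytesIO

import pandas as pd

# Hypothetical two-sheet workbook (needs openpyxl).
buf = BytesIO()
with pd.ExcelWriter(buf, engine="openpyxl") as writer:
    pd.DataFrame({"x": [1, 2]}).to_excel(writer, sheet_name="Sheet1", index=False)
    pd.DataFrame({"y": [3, 4]}).to_excel(writer, sheet_name="Sheet2", index=False)
buf.seek(0)

# A string selects one sheet by name; the result is a DataFrame.
df = pd.read_excel(buf, "Sheet2", index_col=None, na_values=["NA"])
```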

Using the sheet index:

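
Under the same assumptions (`openpyxl` installed, hypothetical in-memory workbook), an integer selects the sheet by position, counting from 0:

```python
from io import BytesIO

import pandas as pd

# Hypothetical two-sheet workbook (needs openpyxl).
buf = BytesIO()
with pd.ExcelWriter(buf, engine="openpyxl") as writer:
    pd.DataFrame({"x": [1, 2]}).to_excel(writer, sheet_name="Sheet1", index=False)
    pd.DataFrame({"y": [3, 4]}).to_excel(writer, sheet_name="Sheet2", index=False)
buf.seek(0)

# Position 1 is the second sheet ("Sheet2").
df = pd.read_excel(buf, 1)
```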

Using all default values:

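
If `sheet_name` is omitted, the first sheet is read. This is a sketch using the same hypothetical in-memory workbook, with `openpyxl` assumed to be installed:

```python
from io import BytesIO

import pandas as pd

# Hypothetical two-sheet workbook (needs openpyxl).
buf = BytesIO()
with pd.ExcelWriter(buf, engine="openpyxl") as writer:
    pd.DataFrame({"x": [1, 2]}).to_excel(writer, sheet_name="Sheet1", index=False)
    pd.DataFrame({"y": [3, 4]}).to_excel(writer, sheet_name="Sheet2", index=False)
buf.seek(0)

# Defaults: sheet_name=0, so only the first sheet is returned.
df = pd.read_excel(buf)
```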

Using `None` to get all sheets:

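
A sketch with a hypothetical in-memory workbook (`openpyxl` assumed installed). `sheet_name=None` returns a dictionary that maps every sheet name to its DataFrame.

```python
from io import BytesIO

import pandas as pd

# Hypothetical two-sheet workbook (needs openpyxl).
buf = BytesIO()
with pd.ExcelWriter(buf, engine="openpyxl") as writer:
    pd.DataFrame({"x": [1, 2]}).to_excel(writer, sheet_name="Sheet1", index=False)
    pd.DataFrame({"y": [3, 4]}).to_excel(writer, sheet_name="Sheet2", index=False)
buf.seek(0)

# None reads every sheet into a dict of DataFrames.
sheets = pd.read_excel(buf, sheet_name=None)
```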

Using a list to get multiple sheets:

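
A sketch that mixes a name and a position in one list, again assuming `openpyxl` and a hypothetical two-sheet workbook. The dictionary keys are the list entries exactly as passed, so the second sheet is keyed by the integer `1`.

```python
from io import BytesIO

import pandas as pd

# Hypothetical two-sheet workbook (needs openpyxl).
buf = BytesIO()
with pd.ExcelWriter(buf, engine="openpyxl") as writer:
    pd.DataFrame({"x": [1, 2]}).to_excel(writer, sheet_name="Sheet1", index=False)
    pd.DataFrame({"y": [3, 4]}).to_excel(writer, sheet_name="Sheet2", index=False)
buf.seek(0)

# Returns the 1st sheet (by name) and the 2nd sheet (by position).
sheets = pd.read_excel(buf, sheet_name=["Sheet1", 1])
```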

`read_excel` can read more than one sheet if `sheet_name` is set to a list of sheet names, a list of sheet positions, or `None` to read all sheets. Sheets can be specified by sheet index (an integer) or by sheet name (a string).

Reading a MultiIndex

`read_excel` can read a `MultiIndex` index when a list of columns is passed to `index_col`, and a `MultiIndex` column when a list of rows is passed to `header`. If either the `index` or the `columns` have serialized level names, those names are read in as well, provided that the rows/columns that make up the levels are specified.

For example, to read in a `MultiIndex` index without names:
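
A minimal round-trip sketch, assuming `openpyxl` is installed. The frame is written to an in-memory buffer and read back with the first two columns as the index levels.

```python
from io import BytesIO

import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]},
    index=pd.MultiIndex.from_product([["a", "b"], ["c", "d"]]),
)

# Round-trip through an in-memory .xlsx file (needs openpyxl).
buf = BytesIO()
df.to_excel(buf, engine="openpyxl")
buf.seek(0)

# The first two sheet columns become the two index levels.
result = pd.read_excel(buf, index_col=[0, 1])
```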

If the index has level names, they are parsed as well, using the same parameters.
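
The same round trip with named levels. This is a sketch with `openpyxl` assumed installed. No extra arguments are needed for the names to come back.

```python
from io import BytesIO

import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]},
    index=pd.MultiIndex.from_product([["a", "b"], ["c", "d"]]),
)
df.index = df.index.set_names(["lvl1", "lvl2"])

# Round-trip through an in-memory .xlsx file (needs openpyxl).
buf = BytesIO()
df.to_excel(buf, engine="openpyxl")
buf.seek(0)

# Same index_col as before; the header cells supply the level names.
result = pd.read_excel(buf, index_col=[0, 1])
```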

If the source file has both a `MultiIndex` index and `MultiIndex` columns, pass a list for each to `index_col` and `header`:
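
A sketch with a `MultiIndex` on both axes, assuming `openpyxl` is installed and using an in-memory buffer. The level names `lvl1`, `lvl2`, `c1` and `c2` are illustrative.

```python
from io import BytesIO

import pandas as pd

df = pd.DataFrame(
    [[1, 5], [2, 6], [3, 7], [4, 8]],
    index=pd.MultiIndex.from_product([["a", "b"], ["c", "d"]], names=["lvl1", "lvl2"]),
    columns=pd.MultiIndex.from_product([["a"], ["b", "d"]], names=["c1", "c2"]),
)

# Round-trip through an in-memory .xlsx file (needs openpyxl).
buf = BytesIO()
df.to_excel(buf, engine="openpyxl")
buf.seek(0)

# Two index columns and two header rows.
result = pd.read_excel(buf, index_col=[0, 1], header=[0, 1])
```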

Missing values in columns specified in `index_col` are forward filled. This allows round-tripping with `to_excel` for `merged_cells=True`. To avoid forward filling the missing values, use `set_index` after reading the data instead of `index_col`.
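
A sketch contrasting the two behaviours, assuming `openpyxl` is installed. The hypothetical sheet has a blank cell in its `group` column. `index_col` fills that cell from the row above, while calling `set_index` after reading keeps it as `NaN`.

```python
from io import BytesIO

import pandas as pd

# Hypothetical sheet whose "group" column has a gap in its second row.
frame = pd.DataFrame(
    {"group": ["a", None, "b"], "item": ["x", "y", "z"], "value": [1, 2, 3]}
)
buf = BytesIO()
frame.to_excel(buf, index=False, engine="openpyxl")
raw = buf.getvalue()

# index_col forward-fills the gap: the second row inherits "a".
filled = pd.read_excel(BytesIO(raw), index_col=[0, 1])

# Reading plain columns and then calling set_index keeps the gap as NaN.
kept = pd.read_excel(BytesIO(raw)).set_index(["group", "item"])
```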

Parsing specific columns

Users often insert columns in Excel for temporary computations, and you may not want to read those columns in. `read_excel` accepts a `usecols` keyword that lets you specify a subset of columns to parse.

Changed in version 1.0.0.

Passing in an integer for `usecols` will no longer work. Instead, pass a list of ints from 0 to `usecols` inclusive.
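
A sketch of the migration, assuming `openpyxl` is installed and a hypothetical sheet with columns `a` to `d`. An old `usecols=N` becomes `list(range(N + 1))`.

```python
from io import BytesIO

import pandas as pd

# Hypothetical sheet with columns a..d (needs openpyxl).
buf = BytesIO()
pd.DataFrame({"a": [1], "b": [2], "c": [3], "d": [4]}).to_excel(
    buf, index=False, engine="openpyxl"
)
buf.seek(0)

# Old style (no longer accepted): usecols=2 meant "columns 0 through 2".
last_col = 2
df = pd.read_excel(buf, usecols=list(range(last_col + 1)))
```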

You can specify a comma-delimited set of Excel columns and ranges as a string:
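
For instance, take a hypothetical sheet with columns `a` through `e` in Excel columns A to E (`openpyxl` assumed). The string `"A,C:E"` keeps column A plus the C-to-E range:

```python
from io import BytesIO

import pandas as pd

# Hypothetical sheet: headers a..e occupy Excel columns A..E (needs openpyxl).
buf = BytesIO()
pd.DataFrame({"a": [1], "b": [2], "c": [3], "d": [4], "e": [5]}).to_excel(
    buf, index=False, engine="openpyxl"
)
buf.seek(0)

# Excel letters and ranges, comma-delimited, in a single string.
df = pd.read_excel(buf, usecols="A,C:E")
```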

If `usecols` is a list of integers, it is treated as the indices of the file columns to parse.
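
Here is a sketch on the same kind of hypothetical in-memory sheet (`openpyxl` assumed). Positions are 0-based:

```python
from io import BytesIO

import pandas as pd

# Hypothetical sheet with columns a..d (needs openpyxl).
buf = BytesIO()
pd.DataFrame({"a": [1], "b": [2], "c": [3], "d": [4]}).to_excel(
    buf, index=False, engine="openpyxl"
)
buf.seek(0)

# Keep the 1st, 3rd and 4th file columns.
df = pd.read_excel(buf, usecols=[0, 2, 3])
```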

Element order is ignored, so `usecols=[0, 1]` is the same as `[1, 0]`.

If usecols is a list of strings, it is assumed that each string corresponds to a column name provided either by the user in names or inferred from the document header row(s). Those strings define which columns will be parsed.
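A minimal sketch of selecting columns by name. It assumes pandas and openpyxl are installed. The workbook is built in memory, and the column names foo, bar and baz are hypothetical. openpyxl is passed explicitly so the sketch does not depend on XlsxWriter being present.

```python
import io

import pandas as pd

# Build a hypothetical three-column workbook in memory with the openpyxl engine.
src = pd.DataFrame({"foo": [1, 2], "bar": ["x", "y"], "baz": [0.5, 1.5]})
buf = io.BytesIO()
src.to_excel(buf, sheet_name="Sheet1", index=False, engine="openpyxl")

# Parse only the named columns; the order inside the list does not matter.
df = pd.read_excel(
    io.BytesIO(buf.getvalue()), sheet_name="Sheet1", usecols=["bar", "foo"]
)
print(df.columns.tolist())  # columns come back in file order
```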

Element order is ignored, so listing the same column names in a different order selects the same columns.

If usecols is callable, the callable function will be evaluated against the column names, returning names where the callable function evaluates to True.
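The callable form of usecols can be sketched the same way. This again assumes pandas and openpyxl are available. The column names and the str.isalpha() filter are illustrative choices only.

```python
import io

import pandas as pd

# Hypothetical workbook: "bar_2" contains an underscore and a digit.
src = pd.DataFrame({"foo": [1, 2], "bar_2": [3, 4], "baz": [5, 6]})
buf = io.BytesIO()
src.to_excel(buf, index=False, engine="openpyxl")

# The callable receives each header name; only columns where it returns True are kept.
df = pd.read_excel(io.BytesIO(buf.getvalue()), usecols=lambda name: name.isalpha())
print(df.columns.tolist())
```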

Parsing dates

Datetime-like values are normally converted to the appropriate dtype automatically when reading an Excel file. However, if you have a column of strings that look like dates (but are not actually formatted as dates in Excel), you can use the parse_dates keyword to parse those strings to datetimes.
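A small sketch of parse_dates on a text column. It assumes pandas and openpyxl are installed. The column name date_strings and its values are hypothetical. openpyxl stores the Python strings as plain text cells, which mimics dates that Excel does not recognise as dates.

```python
import io

import pandas as pd

# Hypothetical sheet whose "date_strings" cells hold plain text, not Excel dates.
src = pd.DataFrame({"date_strings": ["2021-01-15", "2021-02-20"], "value": [1, 2]})
buf = io.BytesIO()
src.to_excel(buf, index=False, engine="openpyxl")
data = buf.getvalue()

# Without parse_dates the column stays text; with it, pandas converts to datetimes.
raw = pd.read_excel(io.BytesIO(data))
parsed = pd.read_excel(io.BytesIO(data), parse_dates=["date_strings"])

print(raw["date_strings"].dtype, parsed["date_strings"].dtype)
```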

Cell converters

It is possible to transform the contents of Excel cells via the converters option. For instance, to convert a column to boolean, map the column name to bool in the converters dictionary.
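A minimal sketch of a boolean converter. It assumes pandas and openpyxl are installed. The column name MyBools and its 1/0 contents are hypothetical.

```python
import io

import pandas as pd

# Hypothetical "MyBools" column stored as 1/0 integers in the sheet.
src = pd.DataFrame({"MyBools": [1, 0, 1]})
buf = io.BytesIO()
src.to_excel(buf, index=False, engine="openpyxl")

# The converter is called once per cell of the named column.
df = pd.read_excel(io.BytesIO(buf.getvalue()), converters={"MyBools": bool})
print(df["MyBools"].tolist())
```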

This option handles missing values and treats exceptions in the converters as missing data. Transformations are applied cell by cell rather than to the column as a whole, so the array dtype is not guaranteed. For instance, a column of integers with missing values cannot be transformed to an array with integer dtype, because NaN is strictly a float. You can manually mask missing data to recover an integer dtype.
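The masking idea can be sketched like this. It assumes pandas and openpyxl are installed, and that empty cells reach the converter as empty strings (None is also handled, just in case). The column names and the sentinel value -1 are hypothetical choices. A sentinel is needed because any real integer could collide with genuine data.

```python
import io

import pandas as pd

# Hypothetical sheet: the second "MyInts" cell is left empty.
src = pd.DataFrame({"id": [1, 2, 3], "MyInts": [10, None, 30]})
buf = io.BytesIO()
src.to_excel(buf, index=False, engine="openpyxl")
data = buf.getvalue()


def cfun(x):
    # Assumption: empty cells arrive as "" (or None); map them to the sentinel -1.
    if x is None or x == "":
        return -1
    return int(x)


# Without a converter the empty cell becomes NaN, which forces a float column.
plain = pd.read_excel(io.BytesIO(data))
# With the masking converter every cell is an int, so an integer dtype survives.
masked = pd.read_excel(io.BytesIO(data), converters={"MyInts": cfun})
print(plain["MyInts"].dtype, masked["MyInts"].dtype)
```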

Dtype specifications

As an alternative to converters, the type for an entire column can be specified using the dtype keyword, which takes a dictionary mapping column names to types. To interpret data with no type inference, use the type str or object.
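A sketch contrasting inference with an explicit dtype mapping. It assumes pandas and openpyxl are installed. The column names MyInts and MyText and the zero-padded codes are hypothetical. They were picked because numeric-looking text loses its leading zeros when pandas infers types.

```python
import io

import pandas as pd

# Hypothetical sheet: "MyText" holds zero-padded codes stored as text cells.
src = pd.DataFrame({"MyInts": [1, 2], "MyText": ["001", "002"]})
buf = io.BytesIO()
src.to_excel(buf, index=False, engine="openpyxl")
data = buf.getvalue()

# Default inference turns numeric-looking text into numbers.
inferred = pd.read_excel(io.BytesIO(data))
# dtype pins each column's type; str keeps the text exactly as stored.
typed = pd.read_excel(io.BytesIO(data), dtype={"MyInts": "int64", "MyText": str})
print(inferred["MyText"].tolist(), typed["MyText"].tolist())
```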

Writing Excel files

Writing Excel files to disk

To write a DataFrame object to a sheet of an Excel file, you can use the to_excel instance method. The arguments are largely the same as to_csv described above: the first argument is the name of the Excel file, and the optional second argument is the name of the sheet to which the DataFrame should be written.
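A minimal sketch of writing a sheet to disk and reading it back. It assumes pandas and openpyxl are installed. The file lives in a temporary directory, and the name path_to_file.xlsx is only a placeholder.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "path_to_file.xlsx")  # hypothetical file name
    # First argument: the target file; sheet_name: the sheet the DataFrame goes to.
    df.to_excel(path, sheet_name="Sheet1", engine="openpyxl")
    # The index is written as the first column, so read it back with index_col=0.
    back = pd.read_excel(path, sheet_name="Sheet1", index_col=0)

print(back)
```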

Files with a .xls extension will be written using xlwt and those with a .xlsx extension will be written using xlsxwriter (if available) or openpyxl.

The DataFrame will be written in a way that tries to mimic the REPL output. The index_label will be placed in the second row instead of the first. You can place it in the first row by setting the merge_cells option in to_excel() to False.

In order to write separate DataFrames to separate sheets in a single Excel file, one can pass an ExcelWriter.
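A sketch of writing two DataFrames through one ExcelWriter. It assumes pandas and openpyxl are installed. The file name and sheet names are placeholders. Using ExcelWriter as a context manager saves and closes the workbook on exit.

```python
import os
import tempfile

import pandas as pd

df1 = pd.DataFrame({"x": [1, 2]})
df2 = pd.DataFrame({"y": ["a", "b", "c"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "path_to_file.xlsx")  # hypothetical file name
    # One ExcelWriter, several to_excel calls: each DataFrame lands on its own sheet.
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        df1.to_excel(writer, sheet_name="Sheet1", index=False)
        df2.to_excel(writer, sheet_name="Sheet2", index=False)
    # sheet_name=None reads every sheet into a dict keyed by sheet name.
    sheets = pd.read_excel(path, sheet_name=None)

print(list(sheets))
```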

Writing Excel files to memory

pandas supports writing Excel files to buffer-like objects using ExcelWriter. Because an Excel workbook is binary data, the buffer should be a bytes buffer such as BytesIO.
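A minimal in-memory sketch. It assumes pandas and openpyxl are installed. The context-manager form is used in place of an explicit save call, and the sheet name is a placeholder.

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

bio = io.BytesIO()
# Set the engine explicitly; leaving the with-block saves the workbook into bio.
with pd.ExcelWriter(bio, engine="openpyxl") as writer:
    df.to_excel(writer, sheet_name="Sheet1")

# The buffer now holds the complete .xlsx file (a zip archive) as bytes.
workbook = bio.getvalue()
print(len(workbook), workbook[:2])
```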

Note

The engine argument is optional but recommended. Setting the engine determines the version of workbook produced. Setting engine='xlwt' will produce an Excel 2003-format workbook (xls). Using either 'openpyxl' or 'xlsxwriter' will produce an Excel 2007-format workbook (xlsx). If omitted, an Excel 2007-format workbook is produced.

Excel writer engines

Deprecated since version 1.2.0: As the xlwt package is no longer maintained, the xlwt engine will be removed from a future version of pandas. This is the only engine in pandas that supports writing to .xls files.

pandas chooses an Excel writer via two methods:

  1. the engine keyword argument

  2. the filename extension (via the default specified in config options)

By default, pandas uses XlsxWriter for .xlsx, openpyxl for .xlsm, and xlwt for .xls files. If you have multiple engines installed, you can set the default engine through the config options io.excel.xlsx.writer and io.excel.xls.writer. pandas will fall back on openpyxl for .xlsx files if XlsxWriter is not available.

To specify which writer you want to use, you can pass an engine keyword argument to

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print[data]
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv[StringIO[data], dtype=object]

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv[StringIO[data], dtype={"b": object, "c": np.float64, "d": "Int64"}]

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
3095 and to
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print[data]
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv[StringIO[data], dtype=object]

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv[StringIO[data], dtype={"b": object, "c": np.float64, "d": "Int64"}]

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
3132. The built-in engines are

  • openpyxl: version 2.4 or higher is required

  • xlsxwriter

  • xlwt
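The engine can be chosen per call, per writer, or globally through a config option. The sketch below is a minimal illustration and makes a few assumptions:

  • pandas is installed, along with at least one of xlsxwriter or openpyxl.
  • xlwt is left out because it targets the legacy .xls format.
  • The temporary file name is hypothetical.

```python
import importlib.util
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3.5, 4.25]})

# Pick whichever .xlsx writer engine is installed.
# Assumption: at least one of them is available, otherwise next() raises StopIteration.
engine = next(
    name
    for name in ("xlsxwriter", "openpyxl")
    if importlib.util.find_spec(name) is not None
)

sizes = []
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "engines_demo.xlsx"  # hypothetical file name

    # 1. Set the engine in the DataFrame.to_excel() call.
    df.to_excel(path, sheet_name="Sheet1", engine=engine)
    sizes.append(path.stat().st_size)

    # 2. Set the engine in the ExcelWriter constructor.
    with pd.ExcelWriter(path, engine=engine) as writer:
        df.to_excel(writer, sheet_name="Sheet1")
        writer_engine = writer.engine
    sizes.append(path.stat().st_size)

    # 3. Set the default .xlsx writer through the pandas config option.
    with pd.option_context("io.excel.xlsx.writer", engine):
        df.to_excel(path, sheet_name="Sheet1")
    sizes.append(path.stat().st_size)

print(engine, writer_engine, sizes)
```

Option 3 only changes the default for the duration of the `with` block. Calling `pd.set_option` instead would make the choice persist for the session.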

Style and formatting#

The look and feel of Excel worksheets created from pandas can be modified using the following parameters on the DataFrame’s to_excel method.

  • float_format: Format string for floating point numbers (default None).

  • freeze_panes: A tuple of two integers representing the bottommost row and rightmost column to freeze. Each of these parameters is one-based, so (1, 1) will freeze the first row and first column (default None).
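The two parameters can be checked by writing a small frame and inspecting the workbook. This is a minimal sketch that assumes openpyxl is installed and is used both to write and to read the file; the file name is hypothetical.

A few notes on what the sketch relies on:

  • float_format rounds each float before it is stored, so the cell keeps a numeric value rather than a string.
  • freeze_panes=(1, 1) leaves cell B2 as the top-left scrollable cell.

```python
import tempfile
from pathlib import Path

import openpyxl  # assumption: openpyxl is installed
import pandas as pd

df = pd.DataFrame({"x": [3.14159, 2.71828]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "styled.xlsx"  # hypothetical file name
    df.to_excel(
        path,
        engine="openpyxl",
        float_format="%.2f",  # round floats to two decimals when written
        freeze_panes=(1, 1),  # freeze the header row and the index column
    )

    # Inspect the written workbook directly with openpyxl.
    sheet = openpyxl.load_workbook(path).active
    frozen_at = sheet.freeze_panes   # top-left cell of the scrollable area
    first_value = sheet["B2"].value  # first data cell (column A holds the index)

print(frozen_at, first_value)
```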

Using the Xlsxwriter engine provides many options for controlling the format of an Excel worksheet created with the to_excel method. Excellent examples can be found in the Xlsxwriter documentation: https://xlsxwriter.readthedocs.io/working_with_pandas.html

OpenDocument Spreadsheets#

New in version 0.25.

The read_excel() method can also read OpenDocument spreadsheets using the odfpy module. The semantics and features for reading OpenDocument spreadsheets match what can be done for Excel files using engine='odf'.

Note

Currently pandas only supports reading OpenDocument spreadsheets. Writing is not implemented.

Binary Excel (.xlsb) files#

New in version 1.0.0.

The read_excel() method can also read binary Excel files using the pyxlsb module. The semantics and features for reading binary Excel files mostly match what can be done for Excel files using engine='pyxlsb'. pyxlsb does not recognize datetime types in files and will return floats instead.
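Because date cells come back as floats, one possible workaround is to convert them after reading. This sketch rests on several assumptions:

  • The floats are Excel serial day numbers from the default 1900 date system.
  • That system counts from 1899-12-30 for dates after February 1900.
  • Workbooks using the 1904 date system would need a different origin.
  • The input values are hypothetical.

This is a general pandas technique, not something read_excel does for you.

```python
import pandas as pd

# Hypothetical values as pyxlsb might return them for date cells:
# Excel serial day numbers, where the fractional part is the time of day.
serials = pd.Series([44197.0, 44197.5, 44227.25])

# Assumption: 1900 date system, so day 0 corresponds to 1899-12-30
# for any date after February 1900.
dates = pd.to_datetime(serials, unit="D", origin="1899-12-30")
print(dates)
```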

Note

Currently pandas only supports reading binary Excel files. Writing is not implemented.

Clipboard#

A handy way to grab data is to use the read_clipboard() method, which takes the contents of the clipboard buffer and passes them to the read_csv method. For instance, you can copy some tabular text to the clipboard (CTRL-C on many operating systems).

And then import the data directly into a DataFrame by calling read_clipboard().
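Since read_clipboard() hands the clipboard text to read_csv, the parsing step can be reproduced without touching the system clipboard. This is handy on headless machines. The sketch below relies on two assumptions:

  • read_clipboard's default separator is the regex r"\s+".
  • The copied text shown is a hypothetical stand-in for real clipboard contents.

```python
from io import StringIO

import pandas as pd

# Hypothetical text, standing in for what might have been copied to the clipboard.
copied = "A B C\nx 1 4 p\ny 2 5 q\nz 3 6 r\n"

# Roughly what read_clipboard() does with its default sep=r"\s+".
clipdf = pd.read_csv(StringIO(copied), sep=r"\s+")

# The header row has one field fewer than the data rows, so read_csv
# uses the first column (x, y, z) as the index.
print(clipdf)
```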

The to_clipboard method can be used to write the contents of a DataFrame to the clipboard, after which you can paste the clipboard contents into other applications (CTRL-V on many operating systems). Writing a DataFrame to the clipboard and reading it back with read_clipboard() returns the same content that was written.
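The round trip can be emulated with an in-memory buffer in place of the clipboard. This is a sketch under one assumption: to_clipboard's default (excel=True) writes tab-separated CSV text. Here that text is produced with to_csv(sep="\t") and parsed back with read_csv.

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6], "C": ["p", "q", "r"]}, index=["x", "y", "z"]
)

# Assumption: to_clipboard() with excel=True writes tab-separated text like this.
# Here the text goes to an in-memory buffer instead of the system clipboard.
buffer = StringIO()
df.to_csv(buffer, sep="\t")
text = buffer.getvalue()

# Parse the text back, using the first column as the index.
roundtrip = pd.read_csv(StringIO(text), sep="\t", index_col=0)

# Raises AssertionError if anything differs (values, dtypes, index, columns).
pd.testing.assert_frame_equal(df, roundtrip)
print(roundtrip)
```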

Note

You may need to install xclip or xsel (with PyQt5, PyQt4 or qtpy) on Linux to use these methods.

Pickling#

All pandas objects are equipped with to_pickle methods which use Python’s pickle module to save data structures to disk using the pickle format.

The read_pickle function in the pandas namespace can be used to load any pickled pandas object (or any other pickled object) from file.
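The sketch below saves a DataFrame with to_pickle and loads it back with read_pickle. It also loads an object pickled directly with the standard-library pickle module, illustrating the "any other pickled object" case. The file names are hypothetical, and only locally created pickles are loaded.

```python
import pickle
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

with tempfile.TemporaryDirectory() as tmp:
    frame_path = Path(tmp) / "foo.pkl"  # hypothetical file names
    other_path = Path(tmp) / "plain.pkl"

    # Save a pandas object with to_pickle and load it back with read_pickle.
    df.to_pickle(frame_path)
    restored = pd.read_pickle(frame_path)

    # read_pickle can also load an object pickled with the standard library.
    with open(other_path, "wb") as fh:
        pickle.dump({"answer": 42}, fh)
    plain = pd.read_pickle(other_path)

pd.testing.assert_frame_equal(df, restored)
print(plain)
```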

Warning

Loading pickled data received from untrusted sources can be unsafe.

See: https://docs.python.org/3/library/pickle.html

Warning

read_pickle() is only guaranteed to be backwards compatible back to pandas version 0.20.3.

Compressed pickle files#

read_pickle(), DataFrame.to_pickle() and Series.to_pickle() can read and write compressed pickle files. The compression types gzip, bz2, xz and zstd are supported for reading and writing. The zip file format only supports reading and must contain only one data file to be read.

The compression type can be an explicit parameter or be inferred from the file extension. If 'infer', then use gzip, bz2, zip, xz or zstd if the filename ends in '.gz', '.bz2', '.zip', '.xz' or '.zst', respectively.
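A small sketch of both modes, assuming a pandas version where to_pickle and read_pickle accept a string compression argument. It covers:

  • Inference from the extension, using only the codecs in Python's standard library (gzip, bz2, xz).
  • An explicit compression argument.

The file names are hypothetical. Each file is checked against its format's leading magic bytes to confirm that compression really happened. zstd (which needs an extra package) and zip writing are left out.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"a": range(100), "b": [0.5] * 100})

# Leading bytes ("magic numbers") that identify each compression format.
MAGIC = {"gzip": b"\x1f\x8b", "bz2": b"BZh", "xz": b"\xfd7zXZ\x00"}

found = {}
with tempfile.TemporaryDirectory() as tmp:
    # compression="infer" (the default) picks the codec from the extension.
    for method, suffix in [("gzip", ".gz"), ("bz2", ".bz2"), ("xz", ".xz")]:
        path = Path(tmp) / f"frame.pkl{suffix}"  # hypothetical file names
        df.to_pickle(path)
        found[method] = path.read_bytes().startswith(MAGIC[method])
        assert pd.read_pickle(path).equals(df)

    # An explicit compression argument works regardless of the extension;
    # the same value should then be passed when reading the file back.
    path = Path(tmp) / "frame.bin"
    df.to_pickle(path, compression="gzip")
    explicit_ok = path.read_bytes().startswith(MAGIC["gzip"])
    restored = pd.read_pickle(path, compression="gzip")

print(found, explicit_ok)
```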

The compression parameter can also be a dict in order to pass options to the compression protocol. It must have a 'method' key set to the name of the compression protocol, which must be one of {'zip', 'gzip', 'bz2', 'xz', 'zstd'}. All other key-value pairs are passed to the underlying compression library.

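Here is a minimal sketch of the dict form. It assumes DataFrame.to_pickle and read_pickle as the writer and reader. The sample frame and file name are illustrative. bz2 is used because its module ships with Python, and the file's first bytes are checked against the bz2 signature b"BZh".

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Illustrative sample frame (not taken from the original text).
df = pd.DataFrame(
    {
        "A": np.random.default_rng(0).standard_normal(1000),
        "B": "foo",
        "C": pd.date_range("20130101", periods=1000, freq="D"),
    }
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.pkl"  # no compression suffix, so nothing is inferred
    # The 'method' key selects the protocol; no extra options are passed here.
    df.to_pickle(path, compression={"method": "bz2"})
    magic = path.read_bytes()[:3]  # bz2 streams start with b"BZh"
    roundtrip = pd.read_pickle(path, compression="bz2")

print(magic)               # b'BZh'
print(roundtrip.equals(df))  # True
```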

Using an explicit compression type

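A sketch of passing the compression type explicitly, with illustrative data and file name. The '.compress' suffix matches none of the known extensions, so nothing can be inferred. The same type is therefore given again when reading.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"A": range(5), "B": list("abcde")})  # illustrative data

with tempfile.TemporaryDirectory() as tmp:
    # Unrecognised suffix: the compression type must be stated explicitly.
    path = Path(tmp) / "data.pkl.compress"
    df.to_pickle(path, compression="gzip")
    magic = path.read_bytes()[:2]  # gzip streams start with 0x1f 0x8b
    roundtrip = pd.read_pickle(path, compression="gzip")

print(roundtrip.equals(df))  # True
```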

Inferring compression type from the extension

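A sketch of inference from the extension, using the '.xz' suffix from the list above. No compression argument is passed on either side, and the header bytes are compared against the xz signature. Data and file name are illustrative.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"A": range(5), "B": list("abcde")})  # illustrative data

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.pkl.xz"
    df.to_pickle(path)                # compression inferred from ".xz"
    magic = path.read_bytes()[:6]     # xz streams start with FD 37 7A 58 5A 00
    roundtrip = pd.read_pickle(path)  # inferred again when reading

print(roundtrip.equals(df))  # True
```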

The default is to ‘infer’.

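A sketch showing that omitting compression and passing compression='infer' behave the same way. Both files end in '.gz', so both should come out gzip-compressed. File names and data are illustrative.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"A": range(5), "B": list("abcde")})  # illustrative data

with tempfile.TemporaryDirectory() as tmp:
    implicit = Path(tmp) / "implicit.pkl.gz"
    explicit = Path(tmp) / "explicit.pkl.gz"
    df.to_pickle(implicit)                       # compression defaults to "infer"
    df.to_pickle(explicit, compression="infer")  # same behaviour, spelled out
    magics = {p.name: p.read_bytes()[:2] for p in (implicit, explicit)}
    roundtrip = pd.read_pickle(explicit, compression="infer")

print(magics)  # both entries should be b'\x1f\x8b' (gzip)
```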

Passing options to the compression protocol in order to speed up compression

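A sketch of forwarding an extra option, using gzip. Every key other than 'method' (here compresslevel) is handed to the gzip library, and a low level favours speed over compressed size. The actual speed-up depends on the data and is not measured here. Data and file name are illustrative.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"A": range(1000), "B": ["foo"] * 1000})  # illustrative data

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.pkl.gz"
    # compresslevel is forwarded to gzip; 1 is the fastest, least compact level.
    df.to_pickle(path, compression={"method": "gzip", "compresslevel": 1})
    roundtrip = pd.read_pickle(path)  # gzip inferred from ".gz"

print(roundtrip.equals(df))  # True
```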

msgpack

pandas support for msgpack has been removed in version 1.0.0. It is recommended to use pickle instead.
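For on-the-wire transmission with pickle, the standard library's pickle.dumps and pickle.loads are enough. The sketch below serializes an illustrative DataFrame to bytes and restores it. As with any pickle, only load payloads from trusted sources.

```python
import pickle

import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.5], "B": ["x", "y"]})  # illustrative data

# Serialize to bytes, e.g. to send over a socket or a message queue.
payload = pickle.dumps(df, protocol=pickle.HIGHEST_PROTOCOL)

# Unpickling can execute arbitrary code: only do this for trusted senders.
restored = pickle.loads(payload)
print(restored.equals(df))  # True
```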

Alternatively, you can also use the Arrow IPC serialization format for on-the-wire transmission of pandas objects. For documentation on pyarrow, see here.

HDF5 (PyTables)

HDFStore is a dict-like object which reads and writes pandas objects using the high-performance HDF5 format via the excellent PyTables library. See the cookbook for some advanced strategies.

Warning

pandas uses PyTables for reading and writing HDF5 files, which allows serializing object-dtype data with pickle. Loading pickled data received from untrusted sources can be unsafe.

See https://docs.python.org/3/library/pickle.html for more.

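A minimal sketch of opening a store. It assumes PyTables (the tables package) is installed. The file lives in a temporary directory so the example cleans up after itself, and the store is closed before that directory is removed.

```python
import tempfile
from pathlib import Path

import pandas as pd  # HDFStore additionally requires PyTables ("tables")

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "store.h5"
    store = pd.HDFStore(path)    # default mode "a" creates the file if needed
    print(store)                 # summary of the file path and stored keys
    opened_keys = store.keys()   # a brand-new store holds no pandas objects
    was_open = store.is_open
    store.close()                # release the file handle
```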

Objects can be written to the file just like adding key-value pairs to a dict.

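A sketch of dict-style writes, assuming PyTables is installed. The Series, DataFrame and key names are illustrative. Assigning to a key stores the object in the fixed format, just as put would.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    with pd.HDFStore(Path(tmp) / "store.h5") as store:
        s = pd.Series([1.0, 2.0, 3.0], index=list("abc"))
        df = pd.DataFrame({"A": [1, 2, 3], "B": [0.5, 1.5, 2.5]})
        store["s"] = s    # equivalent to store.put("s", s)
        store["df"] = df
        keys = sorted(store.keys())

print(keys)  # ['/df', '/s']
```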

In a current or later Python session, you can retrieve stored objects.

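A sketch of retrieval in a "later session", simulated by closing the store and reopening it read-only. It assumes PyTables is installed; data and key are illustrative. Item access, get and attribute access (for identifier-like keys) should all return the same frame.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [0.5, 1.5, 2.5]})  # illustrative data

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "store.h5"
    with pd.HDFStore(path, mode="w") as store:  # first session: write
        store["df"] = df
    with pd.HDFStore(path, mode="r") as store:  # later session: read only
        by_item = store["df"]     # dict-style access
        by_get = store.get("df")  # equivalent method call
        by_attr = store.df        # attribute access
```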

Deletion of the object specified by the key

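A sketch of the two deletion spellings, del store[key] and store.remove(key). It assumes PyTables is installed and uses illustrative keys.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    with pd.HDFStore(Path(tmp) / "store.h5", mode="w") as store:
        store["df"] = pd.DataFrame({"A": [1, 2, 3]})
        store["s"] = pd.Series([1.0, 2.0])
        del store["df"]    # dict-style deletion
        store.remove("s")  # equivalent method call
        remaining = store.keys()

print(remaining)  # []
```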

Closing a Store and using a context manager

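A sketch contrasting an explicit close with a context manager, assuming PyTables is installed. The with block closes the store on exit, including when an exception is raised inside it.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "store.h5"

    store = pd.HDFStore(path)
    store["df"] = pd.DataFrame({"A": [1, 2, 3]})
    store.close()  # explicit close
    closed_explicitly = not store.is_open

    with pd.HDFStore(path) as store:  # closed automatically on exit
        keys_inside = store.keys()
    closed_by_context = not store.is_open
```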

Read/write API

HDFStore supports a top-level API using read_hdf for reading and to_hdf for writing, similar to how read_csv and to_csv work.

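A sketch of the top-level pair, assuming PyTables is installed; file and key names are illustrative. Writing with append=True uses the table format, which is what allows read_hdf to filter rows with where.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "store_tl.h5"
    df = pd.DataFrame({"A": list(range(5)), "B": list(range(5))})
    df.to_hdf(path, key="table", append=True)  # table format, queryable
    subset = pd.read_hdf(path, key="table", where=["index>2"])

print(subset.index.tolist())  # [3, 4]
```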

HDFStore will by default not drop rows that are all missing. This behavior can be changed by setting dropna=True.

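A sketch of the effect of dropna, assuming PyTables is installed. Row 1 of the illustrative frame is entirely missing. It is kept by default and dropped when dropna=True is passed together with the table format.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "file.h5"
    df_with_missing = pd.DataFrame({"col1": [0, np.nan, 2], "col2": [1, np.nan, np.nan]})

    # Default: the all-missing row is written like any other.
    df_with_missing.to_hdf(path, key="df_with_missing", format="table", mode="w")
    kept = pd.read_hdf(path, key="df_with_missing")

    # dropna=True skips rows in which every value is missing.
    df_with_missing.to_hdf(path, key="df_with_missing", format="table", mode="w", dropna=True)
    dropped = pd.read_hdf(path, key="df_with_missing")
```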

Fixed format

The examples above show storing using put, which writes the HDF5 to PyTables in a fixed array format, called the fixed format. These types of stores are not appendable once written (though you can simply remove them and rewrite). Nor are they queryable; they must be retrieved in their entirety. They also do not support dataframes with non-unique column names. The fixed format stores offer very fast writing and slightly faster reading than table stores. This format is specified by default when using put or to_hdf, or by format='fixed' or format='f'.
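A sketch confirming which format a key was written in, assuming PyTables is installed. It relies on HDFStore.get_storer(key).is_table. Both the default put and the 'f' shorthand should yield fixed (non-table) storage.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})  # illustrative data

with tempfile.TemporaryDirectory() as tmp:
    with pd.HDFStore(Path(tmp) / "fixed.h5", mode="w") as store:
        store.put("default", df)               # put() defaults to the fixed format
        store.put("explicit", df, format="f")  # "f" is shorthand for "fixed"
        fixed_flags = {key: store.get_storer(key).is_table for key in ("default", "explicit")}

print(fixed_flags)  # {'default': False, 'explicit': False}
```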

Warning

A fixed format will raise a TypeError if you try to retrieve it using a where query.

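A sketch reproducing the error, assuming PyTables is installed and using illustrative random data. The frame is written with the default (fixed) format, then read back with a where filter. The exception is caught so its message can be inspected.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "test_fixed.h5"
    frame = pd.DataFrame(np.random.default_rng(0).standard_normal((10, 2)))
    frame.to_hdf(path, key="df")  # default: fixed format
    try:
        pd.read_hdf(path, key="df", where="index>5")
    except TypeError as exc:
        error_message = str(exc)
    else:
        error_message = None

print(error_message)
```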

Table format

HDFStore supports another PyTables format on disk, the table format. Conceptually, a table is shaped very much like a DataFrame, with rows and columns. A table may be appended to in the same or other sessions. In addition, delete and query type operations are supported. This format is specified by format='table' or format='t' to append or put or to_hdf.
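A sketch of building a table in two appends and then querying it, assuming PyTables is installed. The frame and split point are illustrative. append writes the table format by default, so the second call adds rows to the first.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"A": range(8), "B": [x * 0.5 for x in range(8)]})
df1, df2 = df.iloc[:4], df.iloc[4:]

with tempfile.TemporaryDirectory() as tmp:
    with pd.HDFStore(Path(tmp) / "table.h5", mode="w") as store:
        store.append("df", df1)  # creates the table
        store.append("df", df2)  # adds rows to the existing table
        combined = store.select("df")
        is_table = store.get_storer("df").is_table
        queried = store.select("df", where="index >= 6")  # tables are queryable
```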

This format can also be set as an option, pd.set_option('io.hdf.default_format', 'table'), to make put/append/to_hdf store in the table format by default.

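A sketch of the option in action, assuming PyTables is installed. pd.option_context is used instead of pd.set_option so the setting only applies inside the with block. A plain put with no format argument should then produce a table.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})  # illustrative data

with tempfile.TemporaryDirectory() as tmp:
    # Scoped equivalent of pd.set_option("io.hdf.default_format", "table").
    with pd.option_context("io.hdf.default_format", "table"):
        with pd.HDFStore(Path(tmp) / "store.h5", mode="w") as store:
            store.put("df", df)  # no format= given, so the option decides
            stored_as_table = store.get_storer("df").is_table
```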

Note

You can also create a table by passing format='table' or format='t' to a put operation.
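A sketch of put with the 't' shorthand, assuming PyTables is installed; data are illustrative. Because the result is a table, a later append to the same key is accepted.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    with pd.HDFStore(Path(tmp) / "store.h5", mode="w") as store:
        store.put("df", pd.DataFrame({"A": [1, 2, 3]}), format="t")  # "t" means "table"
        store.append("df", pd.DataFrame({"A": [4]}, index=[3]))     # allowed for tables
        n_rows = len(store.select("df"))
        is_table = store.get_storer("df").is_table
```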

Hierarchical keys

Keys to a store can be specified as a string. These can be in a hierarchical path-name like format (e.g. foo/bar/bah), which will generate a hierarchy of sub-stores (or Groups in PyTables parlance). Keys can be specified without the leading ‘/’ and are always absolute (e.g. ‘foo’ refers to ‘/foo’). Removal operations can remove everything in the sub-store and below, so be careful.

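Here is a minimal sketch of hierarchical keys. It assumes the optional PyTables package (tables) is installed, because HDFStore depends on it. The scratch file name store.h5 and the small frame are illustrative. The calls themselves (put, append, keys, remove) are standard HDFStore methods.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Illustrative data; requires the optional PyTables package ("tables").
df = pd.DataFrame(np.arange(6.0).reshape(3, 2), columns=["A", "B"])

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "store.h5")  # hypothetical scratch file
    with pd.HDFStore(path, mode="w") as store:
        store.put("foo/bar/bah", df)      # groups /foo and /foo/bar are created as needed
        store.append("food/orange", df)   # no leading "/" needed
        store.append("/food/apple", df)   # same hierarchy, written with the leading "/"
        keys_before = sorted(store.keys())  # keys are reported as absolute paths

        store.remove("food")  # deletes /food and everything below it
        keys_after = store.keys()

print(keys_before)  # ['/foo/bar/bah', '/food/apple', '/food/orange']
print(keys_after)   # ['/foo/bar/bah']
```

A single remove("food") call deletes both /food/apple and /food/orange.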

You can walk through the group hierarchy using the walk method, which will yield a tuple for each group key along with the relative keys of its contents.

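The loop below sketches walk on a small hierarchy. It again assumes PyTables is installed, and the scratch file and data are illustrative. Each yielded tuple is turned back into absolute group paths and keys.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Illustrative data; requires the optional PyTables package ("tables").
df = pd.DataFrame(np.arange(6.0).reshape(3, 2), columns=["A", "B"])

group_paths = []  # absolute paths of the groups reported by walk()
leaf_keys = []    # absolute keys of the stored pandas objects

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "store.h5")  # hypothetical scratch file
    with pd.HDFStore(path, mode="w") as store:
        store.put("df", df)
        store.put("foo/bar/bah", df)
        store.append("food/orange", df)
        store.append("food/apple", df)

        # Each tuple is (group path, subgroup names, keys relative to that group);
        # the root group's path is "", so joining with "/" gives e.g. "/df".
        for group, subgroups, subkeys in store.walk():
            for subgroup in subgroups:
                group_paths.append(f"{group}/{subgroup}")
            for subkey in subkeys:
                key = "/".join([group, subkey])
                leaf_keys.append(key)
                print("KEY:", key, store.get(key).shape)

print(sorted(group_paths))
```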

Warning

Unlike items stored under the root node, hierarchical keys cannot be retrieved with the dotted (attribute) access described above.

Instead, use explicit string-based keys:

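This short sketch contrasts the two access styles, assuming PyTables is installed. Attribute access works for a key stored directly under the root, but the dotted path through a group fails. The code catches both AttributeError and TypeError so that it does not depend on which of the two a given pandas version raises.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Illustrative data; requires the optional PyTables package ("tables").
df = pd.DataFrame(np.arange(6.0).reshape(3, 2), columns=["A", "B"])

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "store.h5")  # hypothetical scratch file
    with pd.HDFStore(path, mode="w") as store:
        store.put("df", df)
        store.put("foo/bar/bah", df)

        root_item = store.df  # attribute access works for a key directly under the root

        try:
            store.foo.bar.bah  # "foo" is a group, not a stored object
            dotted_ok = True
        except (AttributeError, TypeError):
            # catch both; the exact exception type is not relied upon here
            dotted_ok = False

        by_key = store["foo/bar/bah"]       # explicit string key
        by_get = store.get("/foo/bar/bah")  # equivalent, with the leading "/"

print(dotted_ok)  # False
```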

Storing types

Storing mixed types in a table

Storing mixed-dtype data is supported. Strings are stored as fixed-width values, using the maximum size of the appended column. Subsequent attempts at appending longer strings will raise a ValueError.

Passing min_itemsize={'values': size} as a parameter to append will set a larger minimum for the string columns. Floats, strings, ints, bools, and datetime64 values are currently supported. For string columns, passing nan_rep (e.g. nan_rep='nan') to append sets the on-disk representation of missing values, which converts to/from np.nan; it defaults to nan.

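The sketch below shows the fixed-width behavior and min_itemsize, assuming PyTables is installed. The table names tight and roomy, the frames, and the nan_rep value "missing" are illustrative. Column B is cast to object explicitly so that it is an ordinary object-dtype string column.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Illustrative frames; "B" is cast to object so it is stored as a string column.
# Requires the optional PyTables package ("tables").
short = pd.DataFrame(
    {"A": [1.5, 2.5], "B": ["foo", np.nan], "C": [1, 2]}
).astype({"B": object})
longer = pd.DataFrame(
    {"A": [3.5], "B": ["a much longer string"], "C": [3]}, index=[2]
).astype({"B": object})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "store.h5")  # hypothetical scratch file
    with pd.HDFStore(path, mode="w") as store:
        # Without min_itemsize, "B" is sized from the first append (3 characters).
        store.append("tight", short)
        try:
            store.append("tight", longer)
            tight_rejected = False
        except ValueError as err:
            tight_rejected = True
            print(err)

        # Reserving 50 characters leaves room for longer strings later;
        # nan_rep sets the on-disk marker for missing strings ("missing" is arbitrary).
        store.append("roomy", short, min_itemsize={"values": 50}, nan_rep="missing")
        store.append("roomy", longer, nan_rep="missing")
        roomy = store.select("roomy")

print(roomy)
```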

Storing MultiIndex DataFrames

Storing MultiIndex DataFrames as tables is very similar to storing/selecting from homogeneous index DataFrames.

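Here is a sketch of storing and querying a MultiIndex frame, assuming PyTables is installed. The level names first and second are illustrative. When a MultiIndex frame is appended, pandas adds its levels as data columns, so a where clause can refer to a level by name.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Illustrative MultiIndex frame; requires the optional PyTables package ("tables").
index = pd.MultiIndex.from_product(
    [["foo", "bar"], ["one", "two"]], names=["first", "second"]
)
df_mi = pd.DataFrame(np.arange(8.0).reshape(4, 2), index=index, columns=["A", "B"])

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "store.h5")  # hypothetical scratch file
    with pd.HDFStore(path, mode="w") as store:
        store.append("df_mi", df_mi)       # table format, same call as for a flat index
        roundtrip = store.select("df_mi")  # the MultiIndex is rebuilt on read
        bar_rows = store.select("df_mi", 'first == "bar"')  # query a level by name

print(bar_rows)
```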

Note

The index keyword is reserved and cannot be used as a level name.

Querying

Querying a table

The select and delete operations have an optional criterion that can be specified to select/delete only a subset of the data. This allows one to have a very large on-disk table and retrieve only a portion of the data.

A query is specified using the Term class under the hood, as a boolean expression.

  • index and columns are supported indexers of DataFrames.

  • If data_columns are specified, these can be used as additional indexers.

  • Level names in a MultiIndex can be used, with default names level_0, level_1, … if not provided.

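This minimal sketch shows select and delete with criteria, assuming PyTables is installed. It queries the index, the columns indexer, and two columns declared through data_columns. The frame, the column names, and the key "df" are illustrative.

```python
import os
import tempfile

import pandas as pd

# Illustrative frame; "string" is cast to object so it is stored as a string column.
# Requires the optional PyTables package ("tables").
df = pd.DataFrame(
    {
        "A": [1.0, 2.0, 3.0, 4.0],
        "B": [10, 20, 30, 40],
        "string": ["foo", "bar", "foo", "bar"],
    },
    index=pd.date_range("2013-01-01", periods=4),
).astype({"string": object})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "store.h5")  # hypothetical scratch file
    with pd.HDFStore(path, mode="w") as store:
        # data_columns makes "B" and "string" queryable alongside index and columns
        store.append("df", df, data_columns=["B", "string"])

        recent = store.select("df", "index >= '2013-01-03'")       # index indexer
        subset = store.select("df", "columns = ['A', 'B']")        # columns indexer
        bars = store.select("df", '(string == "bar") & (B > 15)')  # data columns

        # delete accepts the same kind of criterion and returns the rows removed
        n_removed = store.remove("df", where="B < 25")
        remaining = store.select("df")

print(bars)
```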
Valid comparison operators are:

=, ==, !=, >, >=, <, <=

Valid boolean expressions are combined with:

  • | : or

  • & : and

  • ( and ) : for grouping

These rules are similar to how boolean expressions are used in pandas for indexing.

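For comparison, here are the same grouping rules applied in memory, with no HDF5 file involved. The example builds a boolean mask with |, & and parentheses, then writes the equivalent DataFrame.query string. This is only an analogy to the on-disk query syntax, not the HDFStore code path.

```python
import pandas as pd

# In-memory data only; no HDF5 file is involved here.
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [10, 20, 30, 40]})

# Boolean indexing: & and | combine conditions, parentheses group them.
mask = ((df["A"] > 1) & (df["B"] < 40)) | (df["A"] == 1)
selected = df[mask]

# DataFrame.query takes a string with the same operators.
queried = df.query("(A > 1 & B < 40) | A == 1")

print(selected)
```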
Note

  • = is automatically expanded to the comparison operator ==.

  • ~ is the not operator, but can only be used in very limited circumstances.

  • If a list/tuple of expressions is passed, they will be combined via &.

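This sketch covers the first and third points above, assuming PyTables is installed and using illustrative data. A list of conditions selects the same rows as those conditions joined with &, and a single = behaves like ==.

```python
import os
import tempfile

import pandas as pd

# Illustrative data; requires the optional PyTables package ("tables").
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [10, 20, 30, 40]})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "store.h5")  # hypothetical scratch file
    with pd.HDFStore(path, mode="w") as store:
        store.append("df", df, data_columns=["A", "B"])

        # A list of conditions is combined with &.
        from_list = store.select("df", where=["A > 1", "B < 40"])
        from_and = store.select("df", "A > 1 & B < 40")

        # A single "=" is treated as "==".
        single_eq = store.select("df", "B = 20")
        double_eq = store.select("df", "B == 20")

print(from_list)
```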
The following are valid expressions:

  • 'index >= date'

  • "columns = ['A', 'D']"

  • "columns in ['A', 'D']"

  • 'columns = A'

  • 'columns == A'

  • "~(columns = ['A', 'B'])"

  • 'index > df.index[3] & string = "bar"'

  • '(index > df.index[3] & index <= df.index[6]) | string = "bar"'

  • "ts >= Timestamp('2012-02-01')"

  • "major_axis>=20130101"

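This sketch runs a few of these forms against a small table, assuming PyTables is installed. The frame, the string data column, and the local variables date, t_start and t_end are illustrative. The query strings refer to those variables by name, which relies on pandas resolving names from the calling scope.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Illustrative frame; "string" is cast to object so it is stored as a string column.
# Requires the optional PyTables package ("tables").
df = pd.DataFrame(
    {"A": np.arange(8.0), "string": ["foo", "bar"] * 4},
    index=pd.date_range("2012-01-30", periods=8),
).astype({"string": object})

# Local variables that the query strings below refer to by name.
date = pd.Timestamp("2012-02-03")
t_start = df.index[3]
t_end = df.index[6]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "store.h5")  # hypothetical scratch file
    with pd.HDFStore(path, mode="w") as store:
        store.append("df", df, data_columns=["string"])

        on_or_after = store.select("df", "index >= date")                  # variable
        from_feb = store.select("df", "index >= Timestamp('2012-02-01')")  # evaluated call
        grouped = store.select(
            "df", '(index > t_start & index <= t_end) | string = "bar"'
        )

print(grouped)
```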
The indexers are on the left-hand side of the sub-expression:

columns, major_axis, ts

The right-hand side of the sub-expression (after a comparison operator) can be:

  • functions that will be evaluated, e.g. Timestamp('2012-02-01')

  • strings, e.g. "bar"

  • date-like, e.g. 20130101, or "20130101"

  • lists, e. g.

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print[data]
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv[StringIO[data], dtype=object]
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv[StringIO[data], dtype={"b": object, "c": np.float64, "d": "Int64"}]
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    3289

  • variables that are defined in the local names space, e. g.

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print[data]
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv[StringIO[data], dtype=object]
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv[StringIO[data], dtype={"b": object, "c": np.float64, "d": "Int64"}]
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    3290

Note

Passing a string to a query by interpolating it into the query expression is not recommended. Simply assign the string of interest to a variable and use that variable in an expression. For example, do this

string = "HolyMoly'"
store.select("df", "index == string")

instead of this

string = "HolyMoly'"
store.select('df', f'index == {string}')

The latter will not work and will raise a SyntaxError. Note that there’s a single quote followed by a double quote in the string variable.

If you must interpolate, use the '%r' format specifier

store.select("df", "index == %r" % string)

which will quote string.

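The sketch below shows why raw interpolation breaks while '%r' does not. It checks both candidate expressions with Python's standard ast module. This is only an approximation: it assumes that the query string is ultimately parsed with Python's own expression grammar. The helper parses is hypothetical and not part of pandas.

```python
import ast

# The tricky value from the note: it ends in a single quote.
string = "HolyMoly'"

# Interpolating the raw value leaves an unterminated quote in the expression.
interpolated = f"index == {string}"
# %r inserts repr(string), which Python wraps in double quotes here.
quoted = "index == %r" % string


def parses(expr: str) -> bool:
    """Hypothetical helper: True if expr is a valid Python expression."""
    try:
        ast.parse(expr, mode="eval")
    except SyntaxError:
        return False
    return True


print(interpolated, "->", parses(interpolated))  # index == HolyMoly' -> False
print(quoted, "->", parses(quoted))  # index == "HolyMoly'" -> True
```

Because repr() picks quotes that do not clash with the value, the '%r' form stays well-formed even for awkward strings. Assigning the value to a variable avoids quoting altogether.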
Here are some examples:

dfq = pd.DataFrame(
    np.random.randn(10, 4),
    columns=list("ABCD"),
    index=pd.date_range("20130101", periods=10),
)
store.append("dfq", dfq, format="table", data_columns=True)

Use boolean expressions, with in-line function evaluation:

store.select("dfq", "index>pd.Timestamp('20130104') & columns=['A', 'B']")

Use inline column reference:

store.select("dfq", where="A>0 or C>0")

The columns keyword can be supplied to select a list of columns to be returned; this is equivalent to passing 'columns=list_of_columns_to_filter':

store.select("df", "columns=['A', 'B']")

The start and stop parameters can be specified to limit the total search space. These are in terms of the total number of rows in a table.

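As a rough mental model of start and stop, the plain-Python sketch below first restricts the search to table rows start through stop - 1. Only then does it apply the where condition, so the bounds count rows in the table rather than matching rows. The function select_rows and the sample column are hypothetical stand-ins for a stored table; this is not pandas code.

```python
def select_rows(rows, condition, start=None, stop=None):
    """Hypothetical model of a bounded table query.

    start/stop slice the rows of the whole table first; the condition is
    then evaluated only on rows inside that window.
    """
    window = rows[start:stop]
    return [row for row in window if condition(row)]


def is_even(n):
    return n % 2 == 0


table = list(range(1, 11))  # stand-in for a stored column holding 1..10

print(select_rows(table, is_even))  # [2, 4, 6, 8, 10]
print(select_rows(table, is_even, start=0, stop=5))  # [2, 4]
```

With stop=5 only the first five table rows are searched, so two matches come back rather than five.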
Note

select will raise a ValueError if the query expression has an unknown variable reference. Usually this means that you are trying to select on a column that is not a data_column.

select will raise a SyntaxError if the query expression is not valid.

Query timedelta64[ns]

You can store and query using the timedelta64[ns] type. Terms can be specified in the format <float>(<unit>), where float may be signed (and fractional), and unit can be D, s, ms, us or ns for the timedelta. Here’s an example:

from datetime import timedelta

dftd = pd.DataFrame(
    {
        "A": pd.Timestamp("20130101"),
        "B": [
            pd.Timestamp("20130101") + timedelta(days=i, seconds=10)
            for i in range(10)
        ],
    }
)
dftd["C"] = dftd["A"] - dftd["B"]
store.append("dftd", dftd, data_columns=True)
store.select("dftd", "C<'-3.5D'")

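To make the <float>(<unit>) term format concrete, here is a minimal sketch that converts a term such as '-3.5D' into integer nanoseconds, the resolution of timedelta64[ns]. The unit list and sign rules follow the paragraph on timedelta64[ns]. The helper term_to_ns, its regular expression, and the nanosecond table are hypothetical illustrations, not how pandas parses these terms internally.

```python
import re
from decimal import Decimal

# Nanoseconds per unit for the documented units (D, s, ms, us, ns).
NS_PER_UNIT = {
    "D": 86_400 * 10**9,
    "s": 10**9,
    "ms": 10**6,
    "us": 10**3,
    "ns": 1,
}

# A signed, optionally fractional number followed directly by a unit.
TERM = re.compile(r"([+-]?\d+(?:\.\d+)?)(D|s|ms|us|ns)")


def term_to_ns(term: str) -> int:
    """Hypothetical helper: convert e.g. '-3.5D' to integer nanoseconds."""
    match = TERM.fullmatch(term.strip())
    if match is None:
        raise ValueError(f"not a <float>(<unit>) term: {term!r}")
    number, unit = match.groups()
    # Decimal keeps fractional values exact before scaling to nanoseconds.
    return int(Decimal(number) * NS_PER_UNIT[unit])


print(term_to_ns("-3.5D"))  # -302400000000000
```

So a condition like C<'-3.5D' selects rows whose timedelta is more negative than three and a half days.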
Query MultiIndex

Selecting from a MultiIndex can be achieved by using the name of the level.

index = pd.MultiIndex(
    levels=[["foo", "bar", "baz", "qux"], ["one", "two", "three"]],
    codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3], [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
    names=["foo", "bar"],
)
df_mi = pd.DataFrame(np.random.randn(10, 3), index=index, columns=["A", "B", "C"])
store.append("df_mi", df_mi)
store.select("df_mi")

# the levels are automatically included as data columns
store.select("df_mi", "foo=bar")

If the MultiIndex level names are None, the levels are automatically made available via the level_n keyword, with n the level of the MultiIndex you want to select from.

index = pd.MultiIndex(
    levels=[["foo", "bar", "baz", "qux"], ["one", "two", "three"]],
    codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3], [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
)
df_mi_2 = pd.DataFrame(np.random.randn(10, 3), index=index, columns=["A", "B", "C"])
store.append("df_mi_2", df_mi_2)

# the levels are automatically included as data columns with keyword level_n
store.select("df_mi_2", "level_0=foo and level_1=two")

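The level_n naming rule for unnamed MultiIndex levels amounts to a simple per-level mapping. The sketch below assumes the rule is applied independently to each level. queryable_level_names is a hypothetical helper, not a pandas function.

```python
def queryable_level_names(names):
    """Hypothetical helper: the name to use in a where clause for each level.

    Named levels keep their name; an unnamed (None) level becomes level_n,
    where n is that level's position in the MultiIndex.
    """
    return [f"level_{n}" if name is None else name for n, name in enumerate(names)]


print(queryable_level_names(["foo", "bar"]))  # ['foo', 'bar']
print(queryable_level_names([None, None]))  # ['level_0', 'level_1']
```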
Indexing

You can create/modify an index for a table with create_table_index after data is already in the table (after an append/put operation). Creating a table index is highly encouraged. This will speed your queries a great deal when you use a select with the indexed dimension as the where.

Note

Indexes are automagically created on the indexables and any data columns you specify. This behavior can be turned off by passing index=False to append.

Oftentimes when appending large amounts of data to a store, it is useful to turn off index creation for each append, then recreate it at the end:

df_1 = pd.DataFrame(np.random.randn(10, 2), columns=list("AB"))
df_2 = pd.DataFrame(np.random.randn(10, 2), columns=list("AB"))

st = pd.HDFStore("appends.h5", mode="w")
st.append("df", df_1, data_columns=["B"], index=False)
st.append("df", df_2, data_columns=["B"], index=False)
st.get_storer("df").table

Then create the index when finished appending:

st.create_table_index("df", columns=["B"], optlevel=9, kind="full")
st.get_storer("df").table

st.close()

See here for how to create a completely-sorted-index (CSI) on an existing store.

Query via data columns

You can designate (and index) certain columns that you want to be able to perform queries on (other than the indexable columns, which you can always query). For instance, say you want to perform this common operation, on-disk, and return just the frame that matches this query. You can specify data_columns = True to force all columns to be data_columns.

df_dc = df.copy()
df_dc["string"] = "foo"
df_dc.loc[df_dc.index[4:6], "string"] = np.nan
df_dc.loc[df_dc.index[7:9], "string"] = "bar"
df_dc["string2"] = "cool"
df_dc.loc[df_dc.index[1:3], ["B", "C"]] = 1.0

# on-disk operations
store.append("df_dc", df_dc, data_columns=["B", "C", "string", "string2"])
store.select("df_dc", where="B > 0")

# getting creative
store.select("df_dc", "B > 0 & C > 0 & string == foo")

# this is the in-memory version of this type of selection
df_dc[(df_dc.B > 0) & (df_dc.C > 0) & (df_dc.string == "foo")]

There is some performance degradation by making lots of columns into data columns, so it is up to the user to designate these. In addition, you cannot change data columns (nor indexables) after the first append/put operation. (Of course, you can simply read in the data and create a new table.)

Iterator

You can pass iterator=True or chunksize=number_in_a_chunk to select and select_as_multiple to return an iterator on the results. The default is 50,000 rows returned in a chunk.

for df in store.select("df", chunksize=3):
    print(df)

Note

You can also use the iterator with read_hdf, which will open, then automatically close the store when finished iterating.
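
A comparable sketch for read_hdf uses the same assumptions: a temporary file, plus an illustrative key and data. No HDFStore is opened by hand. The file is written with format="table" because only tables can be read in chunks.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "store.h5")
# Chunked reading requires the table format (a fixed-format node raises an error)
pd.DataFrame({"A": range(10)}).to_hdf(path, key="df", format="table")

# read_hdf opens the file itself and closes it once the iterator is exhausted
chunk_lengths = []
total = 0
for chunk in pd.read_hdf(path, key="df", chunksize=4):
    chunk_lengths.append(len(chunk))
    total += int(chunk["A"].sum())

print(chunk_lengths, total)
```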

Note that the chunksize keyword applies to the source rows. So if you are doing a query, the chunksize will subdivide the total rows in the table and the query will be applied to each subdivision, returning an iterator on potentially unequal-sized chunks.

Here is a recipe for generating a query and using it to create equal-sized return chunks.
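
Here is a sketch of such a recipe. It assumes a twelve-row table whose number column is a data column, and all names are illustrative. select_as_coordinates returns the row positions that match the query. A small helper, chunks, splits those positions into equal-sized groups. It is defined here and is not part of pandas. Each group is then passed back to select as the where argument.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def chunks(seq, n):
    """Split seq into consecutive pieces of length n (the last may be shorter)."""
    return [seq[i : i + n] for i in range(0, len(seq), n)]


path = os.path.join(tempfile.mkdtemp(), "store.h5")
with pd.HDFStore(path, mode="w") as store:
    dfeq = pd.DataFrame({"number": np.arange(1, 13)})
    # "number" must be a data column to be usable in a where expression
    store.append("dfeq", dfeq, data_columns=["number"])

    # Row positions that satisfy the query (positions 4..11, i.e. numbers 5..12)
    coordinates = store.select_as_coordinates("dfeq", "number > 4")

    # Selecting slice by slice gives chunks of exactly 4 matching rows each
    pieces = [store.select("dfeq", where=c) for c in chunks(coordinates, 4)]

for piece in pieces:
    print(piece["number"].tolist())
```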

Advanced queries

Select a single column

To retrieve a single indexable or data column, use the method select_column. This will, for example, enable you to get the index very quickly. These return a Series of the result, indexed by the row number. These do not currently accept the where selector.
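
For illustration, assume a small table named df_dc whose string column was written as a data column. The names and values are made up. select_column can fetch either the index or that data column, and both come back as a Series indexed by row number.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "store.h5")
df_dc = pd.DataFrame(
    {"string": ["foo", "bar", "foo"], "value": [1.0, 2.0, 3.0]},
    index=pd.Index([10, 20, 30]),
)

with pd.HDFStore(path, mode="w") as store:
    # Only indexables and data columns can be read individually
    store.append("df_dc", df_dc, data_columns=["string"])

    idx = store.select_column("df_dc", "index")       # the stored index values
    strings = store.select_column("df_dc", "string")  # a data column

print(idx.tolist(), strings.tolist())
```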

Selecting coordinates

Sometimes you want to get the coordinates (a.k.a. the index locations) of your query. select_as_coordinates returns an Index of the resulting locations. These coordinates can also be passed to subsequent where operations.
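
A minimal sketch follows, assuming a ten-day daily index. The key df_coord and the data are illustrative. select_as_coordinates returns an Index of row positions, and that Index is then passed to select as the where argument.

```python
import os
import tempfile

import numpy as np
import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "store.h5")
df_coord = pd.DataFrame(
    {"x": np.arange(10.0)},
    index=pd.date_range("2000-01-01", periods=10),
)

with pd.HDFStore(path, mode="w") as store:
    store.append("df_coord", df_coord)

    # Row positions of dates after 2000-01-05
    coords = store.select_as_coordinates("df_coord", "index > 20000105")

    # Reuse the positions in a later select
    subset = store.select("df_coord", where=coords)

print(list(coords))
print(subset)
```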

Selecting using a where mask

Sometimes your query can involve creating a list of rows to select. Usually this mask would be a resulting index from an indexing operation. For example, you might select the rows of a DatetimeIndex whose month is 5.
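
The sketch below illustrates the approach with made-up data: 120 daily rows from April to July 2000, stored under the hypothetical key df_mask. The index column is pulled with select_column. A boolean test on its month builds the list of row numbers, and that list is used as the where mask.

```python
import os
import tempfile

import numpy as np
import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "store.h5")
df_mask = pd.DataFrame(
    {"x": np.arange(120.0)},
    index=pd.date_range("2000-04-01", periods=120),  # 2000-04-01 .. 2000-07-29
)

with pd.HDFStore(path, mode="w") as store:
    store.append("df_mask", df_mask)

    # Stored index as a Series of timestamps, indexed by row number
    c = store.select_column("df_mask", "index")

    # Row numbers whose timestamp falls in May
    where = c[pd.DatetimeIndex(c).month == 5].index

    may = store.select("df_mask", where=where)

print(len(may), may.index.min(), may.index.max())
```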

Storer object

If you want to inspect the stored object, retrieve it via get_storer. You could use this programmatically to, say, get the number of rows in an object.
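
For example, assume a seven-row table stored under the illustrative key df_dc. The storer's nrows attribute reports the row count.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "store.h5")

with pd.HDFStore(path, mode="w") as store:
    store.append("df_dc", pd.DataFrame({"A": range(7)}))

    storer = store.get_storer("df_dc")
    # Row count as reported by the underlying PyTables table
    nrows = storer.nrows

print(nrows)
```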

Multiple table queries

The methods append_to_multiple and select_as_multiple can perform appending/selecting from multiple tables at once. The idea is to have one table (call it the selector table) in which you index most or all of the columns, and on which you perform your queries. The other table(s) are data tables with an index matching the selector table’s index. You can then perform a very fast query on the selector table, yet get lots of data back. This method is similar to having a very wide table, but enables more efficient queries.

The append_to_multiple method splits a given single DataFrame into multiple tables according to d, a dictionary that maps the table names to a list of ‘columns’ you want in that table. If None is used in place of a list, that table will have the remaining unspecified columns of the given DataFrame. The argument selector defines which table is the selector table (which you can make queries from). The argument dropna will drop rows from the input DataFrame to ensure tables are synchronized. This means that if a row for one of the tables being written to is entirely np.nan, that row will be dropped from all tables.

If dropna is False, THE USER IS RESPONSIBLE FOR SYNCHRONIZING THE TABLES. Remember that entirely np.nan rows are not written to the HDFStore, so if you choose to pass dropna=False, some tables may have more rows than others, and therefore select_as_multiple may not work or it may return unexpected results.
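
Here is a minimal sketch of both methods. It assumes a four-row frame in which row 2 is entirely NaN in the selector's columns. The keys df1_mt and df2_mt and the data are illustrative. With dropna=True that row is dropped from both tables. The query is evaluated against the selector table, and the matching rows are read from every table.

```python
import os
import tempfile

import numpy as np
import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "store.h5")
df_mt = pd.DataFrame(
    {
        "A": [1.0, 2.0, np.nan, 4.0],
        "B": [10.0, 20.0, np.nan, 40.0],
        "C": [0.1, 0.2, 0.3, 0.4],
        "D": ["w", "x", "y", "z"],
    }
)

with pd.HDFStore(path, mode="w") as store:
    # df1_mt (the selector) gets A and B; None sends the remaining columns (C, D) to df2_mt.
    # Without an explicit data_columns, the selector's columns become data columns.
    store.append_to_multiple(
        {"df1_mt": ["A", "B"], "df2_mt": None},
        df_mt,
        selector="df1_mt",
        dropna=True,  # row 2 is all-NaN in df1_mt, so it is dropped from both tables
    )
    sizes = [store.get_storer(key).nrows for key in ("df1_mt", "df2_mt")]

    # Query the selector, then gather the matching rows from both tables
    result = store.select_as_multiple(
        ["df1_mt", "df2_mt"], where="A > 1", selector="df1_mt"
    )

print(sizes)
print(result)
```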

Delete from a table

You can delete from a table selectively by specifying a where. In deleting rows, it is important to understand that PyTables deletes rows by erasing the rows, then moving the following data. Thus, deleting can potentially be a very expensive operation depending on the orientation of your data. To get optimal performance, it’s worthwhile to have the dimension you are deleting be the first of the indexables.
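
Here is a minimal sketch of a selective delete. It assumes a six-row table indexed by date under the illustrative key df_del. remove with a where expression deletes the matching rows. For a table, it returns the number of rows removed.

```python
import os
import tempfile

import numpy as np
import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "store.h5")
df_del = pd.DataFrame(
    {"x": np.arange(6)}, index=pd.date_range("2000-01-01", periods=6)
)

with pd.HDFStore(path, mode="w") as store:
    store.append("df_del", df_del)  # removing with a where requires the table format

    # Delete the rows dated after 2000-01-03 (three of them)
    n_removed = store.remove("df_del", where="index > 20000103")
    remaining = store.select("df_del")

print(n_removed)
print(remaining)
```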

Data is ordered (on the disk) in terms of the indexables. Here’s a simple use case. You store panel-type data, with dates in the major_axis and ids in the minor_axis. The data is then interleaved like this:

  • date_1
    • id_1

    • id_2

    • .

    • id_n

  • date_2
    • id_1

    • .

    • id_n

It should be clear that a delete operation on the major_axis will be fairly quick, as one chunk is removed, then the following data moved. On the other hand, a delete operation on the minor_axis will be very expensive. In this case it would almost certainly be faster to rewrite the table using a where that selects all but the missing data.
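
Here is a sketch of that rewrite. It assumes long-format data with hypothetical date, ticker and value columns standing in for the panel axes described above. Rather than deleting the scattered ticker == "b" rows, the sketch selects the complement and writes it back over the node with put. This loads the kept rows into memory, so it only illustrates the idea.

```python
import os
import tempfile

import numpy as np
import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "store.h5")
# Hypothetical data: three dates x three tickers, interleaved by date
long_df = pd.DataFrame(
    {
        "date": pd.date_range("2000-01-01", periods=3).repeat(3),
        "ticker": ["a", "b", "c"] * 3,
        "value": np.arange(9.0),
    }
)

with pd.HDFStore(path, mode="w") as store:
    store.append("long_df", long_df, data_columns=["ticker"])

    # Select everything except the rows to be "deleted" ...
    kept = store.select("long_df", where="ticker != 'b'")
    # ... and write them back; put() replaces the existing node
    store.put("long_df", kept, format="table", data_columns=["ticker"])

    result = store.select("long_df")

print(result)
```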

Warning

Please note that HDF5 DOES NOT RECLAIM SPACE in the h5 files automatically. Thus, repeatedly deleting (or removing nodes) and adding again WILL TEND TO INCREASE THE FILE SIZE.

To repack and clean the file, use ptrepack.

Notes & caveats

Compression

PyTables allows the stored data to be compressed. This applies to all kinds of stores, not just tables. Two parameters are used to control compression: complevel and complib.

  • complevel specifies if and how hard data is to be compressed. complevel=0 and complevel=None disable compression, and 0<complevel<10 enables compression.

  • complib specifies which compression library to use. If nothing is specified, the default library zlib is used. A compression library usually optimizes for either good compression rates or speed, and the results will depend on the type of data. Which type of compression to choose depends on your specific needs and data. The list of supported compression libraries:

    • zlib: The default compression library. A classic in terms of compression, achieves good compression rates but is somewhat slow.

    • lzo: Fast compression and decompression.

    • bzip2: Good compression rates.

    • blosc: Fast compression and decompression.

      Support for alternative blosc compressors:

      • blosc:blosclz: This is the default compressor for blosc.

      • blosc:lz4: A compact, very popular and fast compressor.

      • blosc:lz4hc: A tweaked version of LZ4, produces better compression ratios at the expense of speed.

      • blosc:snappy: A popular compressor used in many places.

      • blosc:zlib: A classic; somewhat slower than the previous ones, but achieving better compression ratios.

      • blosc:zstd: An extremely well balanced codec; it provides the best compression ratios among the others above, and at reasonably fast speed.

    If complib is defined as something other than the listed libraries, a ValueError exception is issued.

Note

If the library specified with the complib option is missing on your platform, compression defaults to zlib without further ado.
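The two rules above can be modeled in plain Python. An unknown complib raises ValueError, while a known but unavailable library silently falls back to zlib. The helper `resolve_complib` is hypothetical and written for illustration only; it is not pandas's implementation. The `available` argument is an assumption that stands in for whatever PyTables actually detects on your platform.

```python
# Hypothetical model of how a complib value is checked; not pandas code.
# The names mirror the compression libraries listed in this section.
SUPPORTED_COMPLIBS = (
    "zlib", "lzo", "bzip2", "blosc",
    "blosc:blosclz", "blosc:lz4", "blosc:lz4hc",
    "blosc:snappy", "blosc:zlib", "blosc:zstd",
)


def resolve_complib(complib, available):
    """Return the library that would actually be used.

    complib   -- the requested compression library name
    available -- names of libraries assumed to be installed on this platform
    """
    if complib not in SUPPORTED_COMPLIBS:
        # Names outside the supported list are rejected outright.
        raise ValueError(f"unsupported complib: {complib!r}")
    if complib not in available:
        # A supported but missing library falls back to zlib without an error.
        return "zlib"
    return complib


print(resolve_complib("blosc:lz4", available={"zlib", "blosc:lz4"}))  # blosc:lz4
print(resolve_complib("lzo", available={"zlib"}))                     # zlib
```

Because the fallback is silent, it can be worth checking which libraries your PyTables build supports before relying on a particular compressor.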

Enable compression for all objects within the file:

store_compressed = pd.HDFStore("store_compressed.h5", complevel=9, complib="blosc:blosclz")

Or use on-the-fly compression (this only applies to tables) in stores where compression is not enabled:

store.append("df", df, complib="zlib", complevel=5)

ptrepack

PyTables offers better write performance when tables are compressed after they are written, as opposed to turning on compression at the very beginning. You can use the supplied PyTables utility ptrepack. In addition, ptrepack can change compression levels after the fact:

ptrepack --chunkshape=auto --propindexes --complevel=9 --complib=blosc in.h5 out.h5

Furthermore, ptrepack in.h5 out.h5 will repack the file so that you can reuse previously deleted space. Alternatively, you can simply remove the file and write it again, or use the copy method.
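If you drive ptrepack from a Python script, building the argument list explicitly avoids shell-quoting problems. The helpers below (`ptrepack_args`, `repack`) are a hypothetical sketch. They use only the flags shown in the command above and assume that the ptrepack executable shipped with PyTables is on your PATH when you actually run it.

```python
import shutil
import subprocess


def ptrepack_args(src, dst, complevel=9, complib="blosc"):
    """Build the argv list for the ptrepack invocation shown above."""
    return [
        "ptrepack",
        "--chunkshape=auto",
        "--propindexes",
        f"--complevel={complevel}",
        f"--complib={complib}",
        src,
        dst,
    ]


def repack(src, dst, **kwargs):
    """Run ptrepack if it is installed; return False when it is not found."""
    if shutil.which("ptrepack") is None:
        return False
    # check=True turns a non-zero exit status into CalledProcessError.
    subprocess.run(ptrepack_args(src, dst, **kwargs), check=True)
    return True


print(" ".join(ptrepack_args("in.h5", "out.h5")))
# ptrepack --chunkshape=auto --propindexes --complevel=9 --complib=blosc in.h5 out.h5
```

Passing a list to subprocess.run (rather than a shell string) means file names containing spaces need no extra quoting.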

Caveats

Warning

HDFStore is not thread-safe for writing. The underlying PyTables only supports concurrent reads (via threading or processes). If you need reading and writing at the same time, you need to serialize these operations in a single thread in a single process; otherwise you will corrupt your data. See GH2397 for more information.
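One way to follow this advice inside a single process is to let many threads produce data while every write goes through one dedicated writer thread. The sketch below uses only the standard library. The `sink` list is a hypothetical stand-in for the store; in real code the writer thread would be the only place that calls methods such as store.append. Reads that must interleave with writes would go through the same thread.

```python
import queue
import threading

_STOP = object()  # sentinel telling the writer thread to exit


def writer_loop(q, sink):
    """Sole consumer: performs every write, one at a time."""
    while True:
        item = q.get()
        if item is _STOP:
            break
        sink.append(item)  # stand-in for a real store write


def run(n_producers=4, items_each=100):
    q = queue.Queue()
    sink = []
    writer = threading.Thread(target=writer_loop, args=(q, sink))
    writer.start()

    def produce(pid):
        for i in range(items_each):
            q.put((pid, i))  # producers never touch sink directly

    producers = [threading.Thread(target=produce, args=(p,)) for p in range(n_producers)]
    for t in producers:
        t.start()
    for t in producers:
        t.join()
    q.put(_STOP)  # every producer has finished, so the sentinel is queued last
    writer.join()
    return sink


result = run()
print(len(result))  # 400
```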

  • If you use locks to manage write access between multiple processes, you may want to use fsync() before releasing write locks. For convenience, you can use store.flush(fsync=True) to do this for you.

  • Once a table is created, its columns (DataFrame) are fixed; only exactly the same columns can be appended.

  • Be aware that timezones (e.g., pytz.timezone('US/Eastern')) are not necessarily equal across timezone versions. If data is localized to a specific timezone in the HDFStore using one version of a timezone library, and that data is later updated with another version, the data will be converted to UTC because the two timezones are not considered equal. Either use the same version of the timezone library or use tz_convert with the updated timezone definition.
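The fsync advice in the first bullet can be illustrated with an ordinary file: flush Python's buffer, call os.fsync(), and only then release the lock. The threading.Lock here is an assumption that stands in for whatever inter-process lock you actually use, since file locks are platform-specific. For an open HDFStore, the shortcut named above is store.flush(fsync=True).

```python
import os
import tempfile
import threading

write_lock = threading.Lock()  # stand-in for a real inter-process lock


def locked_write(path, text):
    with write_lock:
        with open(path, "a", encoding="utf-8") as f:
            f.write(text)
            f.flush()             # push Python's buffer to the OS
            os.fsync(f.fileno())  # ask the OS to commit the data to the storage device
        # The lock is released only after the data has been synced.


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "log.txt")
    locked_write(target, "first\n")
    locked_write(target, "second\n")
    with open(target, encoding="utf-8") as f:
        lines = f.read().splitlines()

print(lines)  # ['first', 'second']
```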

Warning

PyTables will show a NaturalNameWarning if a column name cannot be used as an attribute selector. Natural identifiers contain only letters, numbers, and underscores, and may not begin with a number. Other identifiers cannot be used in a where clause and are generally a bad idea.
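Before writing, you can screen column names against the rule quoted above. The helpers `is_natural_name` and `to_natural_name` are a hypothetical sketch that checks only that rule, assuming ASCII letters and digits. PyTables applies its own checks, which may flag additional names, so treat this as an approximation.

```python
import re

# ASCII-only version of the rule stated above (an assumption about the exact character set).
_NATURAL_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_natural_name(name):
    """True if name has only letters, digits and underscores and does not start with a digit."""
    return _NATURAL_NAME.fullmatch(name) is not None


def to_natural_name(name):
    """Hypothetical renaming: replace disallowed characters and avoid a leading digit."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name)
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned


columns = ["price", "unit_cost", "2nd_quarter", "net margin", "_id"]
bad = [c for c in columns if not is_natural_name(c)]
print(bad)                              # ['2nd_quarter', 'net margin']
print([to_natural_name(c) for c in bad])  # ['_2nd_quarter', 'net_margin']
```

Renaming the columns (for example with DataFrame.rename) before the first write keeps them usable in where clauses.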

DataTypes

HDFStore will map an object dtype to the PyTables underlying dtype. This means the following types are known to work:

Type                                                Represents missing values
floating: float64, float32, float16                 np.nan
integer: int64, int32, int8, uint64, uint32, uint8
boolean
datetime64[ns]                                      NaT
timedelta64[ns]                                     NaT
categorical: see the section below
object: strings                                     np.nan

unicode columns are not supported, and WILL FAIL.

Categorical data

You can write data that contains category dtypes to an HDFStore. Queries work the same as if it were an object array. However, category-dtyped data is stored in a more efficient manner.

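The efficiency comes from storing each distinct value once and keeping only a small integer code per row. The plain-Python sketch below shows that idea, usually called dictionary encoding. The functions `encode` and `decode` are hypothetical and illustrate the concept only; they do not describe the on-disk layout HDFStore uses.

```python
def encode(values):
    """Return (categories, codes) such that categories[codes[i]] == values[i].

    Categories are kept in order of first appearance (a simplification).
    """
    categories = []
    index = {}
    codes = []
    for v in values:
        if v not in index:
            index[v] = len(categories)
            categories.append(v)
        codes.append(index[v])
    return categories, codes


def decode(categories, codes):
    """Rebuild the original values from the categories and codes."""
    return [categories[c] for c in codes]


colors = ["red", "green", "red", "red", "blue", "green"]
cats, codes = encode(colors)
print(cats)   # ['red', 'green', 'blue']
print(codes)  # [0, 1, 0, 0, 2, 1]
```

With many repeated strings, the per-row cost drops to one small integer, while each distinct string is stored only once.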

String columns

min_itemsize

The underlying implementation of HDFStore uses a fixed column width (itemsize) for string columns. A string column's itemsize is calculated as the maximum length of the data (for that column) passed to the HDFStore in the first append. If a subsequent append introduces a string longer than the column can hold, an exception will be raised (otherwise these columns could be silently truncated, leading to loss of information). In the future we may relax this and allow a user-specified truncation to occur.

Pass min_itemsize on the first table creation to specify a priori the minimum length of a particular string column. min_itemsize can be an integer, or a dict mapping a column name to an integer. You can pass values as a key to allow all indexables or data_columns to have this min_itemsize.

Passing a min_itemsize dict will cause all passed columns to be created as data_columns automatically.

Note

If you are not passing any data_columns, then the min_itemsize will be the maximum of the length of any string passed.

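The rules above can be modeled with a small class. The itemsize is fixed on the first append as the larger of min_itemsize and the longest string seen, and later appends that exceed it are rejected rather than truncated. FixedWidthStringColumn is a hypothetical illustration, not part of pandas. With a real store, you would pass min_itemsize to HDFStore.append (or to_hdf) on the first write.

```python
class FixedWidthStringColumn:
    """Toy model of a fixed-width string column (hypothetical, not pandas)."""

    def __init__(self, min_itemsize=None):
        self.min_itemsize = min_itemsize
        self.itemsize = None  # decided on the first append
        self.rows = []

    def append(self, values):
        longest = max((len(v) for v in values), default=0)
        if self.itemsize is None:
            # The first append fixes the width permanently.
            self.itemsize = max(longest, self.min_itemsize or 0)
        elif longest > self.itemsize:
            # Refuse instead of silently truncating.
            raise ValueError(f"string of length {longest} exceeds itemsize {self.itemsize}")
        self.rows.extend(values)


col = FixedWidthStringColumn()
col.append(["ab", "abc"])
print(col.itemsize)  # 3
try:
    col.append(["abcdef"])
except ValueError as exc:
    print(exc)  # string of length 6 exceeds itemsize 3

roomy = FixedWidthStringColumn(min_itemsize=10)
roomy.append(["ab"])
roomy.append(["abcdef"])  # fits, because the width was reserved up front
print(roomy.itemsize)  # 10
```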

nan_rep

String columns will serialize np.nan (a missing value) using the nan_rep string representation. This defaults to the string value nan, so you could inadvertently turn an actual "nan" string value into a missing value.

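The pitfall is easiest to see in a round trip. The functions `serialize` and `deserialize` below are a hypothetical plain-Python model of the behavior described above, with nan_rep defaulting to "nan" and None standing in for np.nan. A missing value is written as the string "nan", so on reading back, a genuine "nan" string cannot be told apart from a missing value. Choosing a nan_rep that cannot occur in your data avoids the collision.

```python
MISSING = None  # stand-in for np.nan in this pure-Python model


def serialize(values, nan_rep="nan"):
    """Write missing values as the nan_rep string."""
    return [nan_rep if v is MISSING else v for v in values]


def deserialize(stored, nan_rep="nan"):
    """Read back: every nan_rep string becomes a missing value."""
    return [MISSING if v == nan_rep else v for v in stored]


data = ["foo", MISSING, "nan"]  # the last entry is a real string
print(deserialize(serialize(data)))
# ['foo', None, None]  <- the literal "nan" was lost

print(deserialize(serialize(data, nan_rep="_nan_"), nan_rep="_nan_"))
# ['foo', None, 'nan']
```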

External compatibility

HDFStore writes table format objects in specific formats suitable for producing lossless round trips to pandas objects. For external compatibility, HDFStore can read native PyTables format tables.

It is possible to write an

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print[data]
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv[StringIO[data], dtype=object]

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv[StringIO[data], dtype={"b": object, "c": np.float64, "d": "Int64"}]

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
3205 object that can easily be imported into
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print[data]
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv[StringIO[data], dtype=object]

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv[StringIO[data], dtype={"b": object, "c": np.float64, "d": "Int64"}]

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
3419 using the
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print[data]
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv[StringIO[data], dtype=object]

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv[StringIO[data], dtype={"b": object, "c": np.float64, "d": "Int64"}]

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
3420 library [Package website]. Create a table format store like this

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print[data]
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv[StringIO[data], dtype=object]

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv[StringIO[data], dtype={"b": object, "c": np.float64, "d": "Int64"}]

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
152
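Here is a minimal sketch of creating such a store. It assumes PyTables is installed, since HDFStore depends on it. The file path, the key df_for_r and the column names are illustrative. format="table" writes the table format discussed above, and data_columns=True stores each column as an individual table column rather than grouping same-dtype columns into one block.

```python
import numpy as np
import pandas as pd


def write_store_for_r(path: str, n: int = 100, seed: int = 0) -> pd.DataFrame:
    """Write a small DataFrame to a table-format HDF5 store (requires PyTables).

    The key "df_for_r" and the column names are illustrative only.
    """
    rng = np.random.default_rng(seed)
    df_for_r = pd.DataFrame(
        {
            "first": rng.random(n),
            "second": rng.random(n),
            "label": rng.integers(0, 2, size=n),
        },
        index=range(n),
    )
    # Table format plus data columns: every column is stored on its own.
    df_for_r.to_hdf(path, key="df_for_r", format="table", data_columns=True, mode="w")
    return df_for_r
```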

In R this file can be read into a data.frame object using the rhdf5 library. An R function can read the corresponding column names and data values from the file and assemble them into a data.frame.

The DataFrame can then be imported into R.

Note

The R function lists the entire HDF5 file's contents and assembles the data.frame object from all matching nodes. Use it only as a starting point if you have stored multiple DataFrame objects in a single HDF5 file.

Performance

  • Table format stores come with a write performance penalty compared to fixed stores. The benefit is the ability to append/delete and query (potentially very large amounts of data). Write times are generally longer than with regular stores. Query times can be quite fast, especially on an indexed axis.

  • You can pass chunksize=<int> to append, specifying the write chunksize (default is 50000). This will significantly lower your memory usage on writing.

  • You can pass expectedrows=<int> to the first append to set the TOTAL number of rows that PyTables will expect. This will optimize read/write performance.

  • Duplicate rows can be written to tables, but are filtered out in selection (with the last items being selected; thus a table is unique on major, minor pairs).

  • A PerformanceWarning will be raised if you are attempting to store types that will be pickled by PyTables (rather than stored as endemic types). See here for more information and some solutions.
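This sketch combines the chunksize and expectedrows options above. It assumes PyTables is installed, and the helper name append_in_batches and the key "data" are hypothetical. expectedrows is passed only on the first append, as the text recommends. chunksize is set explicitly to the stated default of 50000 so it is easy to tune.

```python
import pandas as pd


def append_in_batches(path: str, batches: list, key: str = "data") -> int:
    """Append several DataFrames to one table-format node (requires PyTables).

    Hypothetical helper: returns the number of rows stored under `key`.
    """
    total_rows = sum(len(batch) for batch in batches)
    with pd.HDFStore(path, mode="w") as store:
        for i, batch in enumerate(batches):
            options = {"chunksize": 50_000}  # rows written per internal write
            if i == 0:
                # Tell PyTables the final table size on the first append only.
                options["expectedrows"] = total_rows
            store.append(key, batch, format="table", **options)
        return store.get_storer(key).nrows
```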

Feather

Feather provides binary columnar serialization for data frames. It is designed to make reading and writing data frames efficient, and to make sharing data across data analysis languages easy.

Feather is designed to faithfully serialize and de-serialize DataFrames, supporting all of the pandas dtypes, including extension dtypes such as categorical and datetime with tz.

Several caveats:

  • The format will NOT write an Index or MultiIndex for the DataFrame, and will raise an error if a non-default one is provided. You can call .reset_index() to store the index or .reset_index(drop=True) to ignore it.

  • Duplicate column names and non-string column names are not supported.

  • Actual Python objects in object dtype columns are not supported. These will raise a helpful error message on an attempt at serialization.
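You can check the index and column-name caveats before writing. The helper below is hypothetical and not part of pandas. When the index is not the default RangeIndex, it applies .reset_index() or .reset_index(drop=True). It also rejects duplicate or non-string column names. It does not inspect object columns for arbitrary Python objects.

```python
import pandas as pd


def prepare_for_feather(df: pd.DataFrame, keep_index: bool = True) -> pd.DataFrame:
    """Reshape a DataFrame so it satisfies the index and column-name caveats.

    Hypothetical helper: moves a non-default index into columns (or drops it)
    and rejects duplicate or non-string column names up front.
    """
    is_default_index = (
        isinstance(df.index, pd.RangeIndex)
        and df.index.start == 0
        and df.index.step == 1
        and df.index.name is None
    )
    if not is_default_index:
        # reset_index() keeps the index as column(s); drop=True discards it.
        df = df.reset_index(drop=not keep_index)
    if df.columns.duplicated().any():
        raise ValueError("feather does not support duplicate column names")
    if not all(isinstance(col, str) for col in df.columns):
        raise ValueError("feather requires string column names")
    return df
```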

See the Full Documentation.

Write to a feather file with DataFrame.to_feather, and read from a feather file with pd.read_feather.
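Here is a minimal round trip. It assumes pyarrow is installed, since DataFrame.to_feather and pd.read_feather need it. The example frame and the file name are illustrative. The frame includes a categorical column and a tz-aware datetime column, two of the dtypes the text says feather preserves.

```python
import pandas as pd


def example_frame() -> pd.DataFrame:
    """A small frame with a categorical and a tz-aware datetime column."""
    return pd.DataFrame(
        {
            "a": list("abc"),
            "b": [1, 2, 3],
            "c": pd.Categorical(list("abc")),
            "d": pd.date_range("2013-01-01", periods=3, tz="UTC"),
        }
    )


def feather_roundtrip(df: pd.DataFrame, path: str) -> pd.DataFrame:
    """Write df to a feather file and read it back (requires pyarrow)."""
    df.to_feather(path)
    return pd.read_feather(path)
```

Comparing the result with pd.testing.assert_frame_equal is a quick way to confirm that the dtypes survived on your installation.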

Parquet

Apache Parquet provides a partitioned binary columnar serialization for data frames. It is designed to make reading and writing data frames efficient, and to make sharing data across data analysis languages easy. Parquet can use a variety of compression techniques to shrink the file size as much as possible while still maintaining good read performance.

Parquet is designed to faithfully serialize and de-serialize DataFrames, supporting all of the pandas dtypes, including extension dtypes such as datetime with tz.

Several caveats:

  • Duplicate column names and non-string column names are not supported.

  • The pyarrow engine always writes the index to the output, but fastparquet only writes non-default indexes. This extra column can cause problems for non-pandas consumers that are not expecting it. You can force including or omitting indexes with the index argument, regardless of the underlying engine.

  • Index level names, if specified, must be strings.

  • In the pyarrow engine, categorical dtypes for non-string types can be serialized to parquet, but will de-serialize as their primitive dtype.

  • The pyarrow engine preserves the ordered flag of categorical dtypes with string types. fastparquet does not preserve the ordered flag.

  • Unsupported types include Interval and actual Python object types. These will raise a helpful error message on an attempt at serialization. The Period type is supported with pyarrow >= 0.16.0.

  • The pyarrow engine preserves extension data types such as the nullable integer and string data type. This requires pyarrow >= 0.16.0, and the extension type must implement the needed protocols; see the extension types documentation.
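You can check several of the parquet caveats without touching an engine. The hypothetical pre-flight function below reports duplicate or non-string column names, non-string index level names, and Interval columns. It is a sketch covering only those cases, not a substitute for the engine's own error messages.

```python
import pandas as pd


def parquet_issues(df: pd.DataFrame) -> list:
    """Return human-readable problems matching some of the parquet caveats.

    Hypothetical pre-flight check; no parquet engine is called.
    """
    issues = []
    if df.columns.duplicated().any():
        issues.append("duplicate column names")
    if not all(isinstance(col, str) for col in df.columns):
        issues.append("non-string column names")
    if any(name is not None and not isinstance(name, str) for name in df.index.names):
        issues.append("non-string index level names")
    # Iterate over df.dtypes so duplicate column labels do not break the loop.
    for col, dtype in df.dtypes.items():
        if isinstance(dtype, pd.IntervalDtype):
            issues.append(f"Interval column: {col!r}")
    return issues
```

When writing, the index argument mentioned above controls the index column regardless of engine, for example df.to_parquet("out.parquet", index=False).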

You can specify an engine to direct the serialization. This can be one of pyarrow, fastparquet, or auto. If the engine is NOT specified, pandas checks the pd.options.io.parquet.engine option. If that is also auto, pandas tries pyarrow first and falls back to fastparquet.

See the documentation for pyarrow and fastparquet.

Note

These engines are very similar and should read/write nearly identical parquet format files. pyarrow supports timedelta data and fastparquet supports timezone-aware datetimes, in each case from a minimum library version onward. These libraries differ by having different underlying dependencies (fastparquet uses numba, while pyarrow uses a C library).

Write to a parquet file

Read from a parquet file

Read only certain columns of a parquet file

Handling indexes

Serializing a DataFrame to parquet may include the implicit index as one or more columns in the output file. Thus, writing a DataFrame with the two columns a and b can create a parquet file with three columns if you use pyarrow for serialization: a, b, and __index_level_0__. If you're using fastparquet, the index may or may not be written to the file.

This unexpected extra column causes some databases like Amazon Redshift to reject the file, because that column doesn’t exist in the target table

If you want to omit a DataFrame's indexes when writing, pass index=False to to_parquet().

This creates a parquet file with just the two expected columns, a and b. If your DataFrame has a custom index, you won't get it back when you load this file into a DataFrame.

Passing index=True will always write the index, even if that's not the underlying engine's default behavior.

Partitioning Parquet files

Parquet supports partitioning of data based on the values of one or more columns.

The path argument specifies the parent directory to which data will be saved. The partition_cols are the column names by which the dataset will be partitioned. Columns are partitioned in the order they are given. The partition splits are determined by the unique values in the partition columns, so the resulting dataset contains one subdirectory per unique value, nested in the order of partition_cols.

ORC

New in version 1.0.0.

Similar to the parquet format, the ORC format is a binary columnar serialization for data frames. It is designed to make reading data frames efficient. pandas provides both the reader and the writer for the ORC format, read_orc() and to_orc(). This requires the pyarrow library.

Warning

  • It is highly recommended to install pyarrow using conda due to some issues with pyarrow.

  • to_orc() requires pyarrow>=7.0.0.

  • read_orc() and to_orc() are not supported on Windows yet; you can find valid environments under install optional dependencies.

  • For supported dtypes, please refer to supported ORC features in Arrow.

  • Currently, timezones in datetime columns are not preserved when a DataFrame is converted into ORC files.

Write to an orc file

Read from an orc file

Read only certain columns of an orc file

SQL queries

The pandas.io.sql module provides a collection of query wrappers to both facilitate data retrieval and reduce dependency on DB-specific APIs. Database abstraction is provided by SQLAlchemy if installed. In addition, you will need a driver library for your database. Examples of such drivers are psycopg2 for PostgreSQL or pymysql for MySQL. For SQLite, this is included in Python's standard library by default. You can find an overview of supported drivers for each SQL dialect in the SQLAlchemy docs.

If SQLAlchemy is not installed, a fallback is only provided for sqlite (and for mysql for backwards compatibility, but this is deprecated and will be removed in a future version). This mode requires a Python database adapter which respects the Python DB-API.

See also some cookbook examples for some advanced strategies.

The key functions are:

read_sql_table(table_name, con[, schema, ...])

Read SQL database table into a DataFrame.

read_sql_query(sql, con[, index_col, ...])

Read SQL query into a DataFrame.

read_sql(sql, con[, index_col, ...])

Read SQL query or database table into a DataFrame.

DataFrame.to_sql(name, con[, schema, ...])

Write records stored in a DataFrame to a SQL database.

Note

The function read_sql() is a convenience wrapper around read_sql_table() and read_sql_query() (and for backward compatibility) and will delegate to the specific function depending on the provided input (database table name or SQL query). Table names do not need to be quoted if they have special characters.
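A sketch of that delegation, assuming SQLAlchemy is installed; the table items and its columns are made up for illustration. With an SQLAlchemy engine, read_sql() routes a bare table name to read_sql_table() and an SQL statement to read_sql_query().

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory SQLite database, used here only for illustration.
engine = create_engine("sqlite:///:memory:")
pd.DataFrame({"id": [1, 2, 3], "name": ["x", "y", "z"]}).to_sql("items", engine, index=False)

# A table name: read_sql() delegates to read_sql_table().
by_table = pd.read_sql("items", engine)

# A query: read_sql() delegates to read_sql_query().
by_query = pd.read_sql("SELECT name FROM items WHERE id >= 2 ORDER BY id", engine)

print(by_table)
print(by_query)
```

With a plain sqlite3 connection (the DB-API fallback), read_sql() treats its argument as a query.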

In the following examples, we use the SQLite database engine. You can use a temporary SQLite database where data are stored in “memory”.

To connect with SQLAlchemy, you use the create_engine() function to create an engine object from a database URI. You only need to create the engine once per database you are connecting to. For more information on create_engine() and the URI formatting, see the SQLAlchemy documentation.
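A minimal create_engine sketch, assuming SQLAlchemy is installed. It builds an in-memory SQLite engine and passes it straight to pandas; the other URIs appear only as comments, with placeholder hosts, users and database names.

```python
import pandas as pd
from sqlalchemy import create_engine

# Temporary SQLite database held in memory; nothing is written to disk.
engine = create_engine("sqlite:///:memory:")

# Other URI shapes in the SQLAlchemy URL format (all values are placeholders):
#   "postgresql://scott:tiger@localhost:5432/mydatabase"
#   "mysql+pymysql://user:password@host/dbname"
#   "sqlite:///relative/path/to/foo.db"

# The engine can be passed directly wherever pandas expects a connection.
one = pd.read_sql_query("SELECT 1 AS one", engine)
print(one)
```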

If you want to manage your own connections, you can pass a connection instead of an engine. Opening the connection with a Python context manager closes it automatically once the block has completed. See the SQLAlchemy docs for an explanation of how the database connection is handled.
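A sketch of managing the connection yourself, assuming SQLAlchemy is installed and using an illustrative numbers table. engine.connect() returns a Connection that works as a context manager, so leaving the with-block closes it.

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///:memory:")
pd.DataFrame({"x": [1, 2, 3]}).to_sql("numbers", engine, index=False)

# The with-block closes the connection when it exits, even if an error occurs.
with engine.connect() as conn:
    data = pd.read_sql_table("numbers", conn)
    was_open = not conn.closed

print(was_open, conn.closed)
```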

Warning

When you open a connection to a database, you are also responsible for closing it. Side effects of leaving a connection open may include locking the database or other breaking behaviour.

Writing DataFrames

Assuming the following data is in a DataFrame called data, we can insert it into the database using to_sql().

id  Date        Col_1  Col_2  Col_3
26  2012-10-18  X      25.7   True
42  2012-10-19  Y      -12.4  False
63  2012-10-20  Z      5.73   True
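One way to build the table above as a DataFrame and write it out is sketched here with the standard-library sqlite3 module, so no SQLAlchemy install is needed. pandas treats a plain sqlite3 connection as its fallback mode, and an SQLAlchemy engine could be passed as con instead. The table name data follows the text; the rest is illustrative.

```python
import sqlite3

import pandas as pd

# The three rows from the table above; Date is parsed to datetime64.
data = pd.DataFrame(
    {
        "id": [26, 42, 63],
        "Date": pd.to_datetime(["2012-10-18", "2012-10-19", "2012-10-20"]),
        "Col_1": ["X", "Y", "Z"],
        "Col_2": [25.7, -12.4, 5.73],
        "Col_3": [True, False, True],
    }
)

# A plain sqlite3 connection puts pandas into its sqlite3 fallback mode.
con = sqlite3.connect(":memory:")
data.to_sql("data", con, index=False)

stored = pd.read_sql_query("SELECT id, Col_1, Col_2 FROM data", con)
print(stored)
```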

With some databases, writing large DataFrames can result in errors because packet size limits are exceeded. This can be avoided by setting the chunksize parameter when calling to_sql. For example, the following writes data to the database in batches of 1000 rows at a time.
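Here is a minimal sketch of chunked writing, assuming the sqlite3 fallback. The 2,500-row frame and the table name data_chunked are hypothetical. They were chosen so that more than one batch of 1000 rows is needed.

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")
# Hypothetical frame with 2,500 rows, so several batches are needed.
big = pd.DataFrame({"n": range(2500)})

# chunksize=1000 makes pandas insert the rows in batches of at most 1000.
big.to_sql("data_chunked", con, index=False, chunksize=1000)

count = con.execute("SELECT COUNT(*) FROM data_chunked").fetchone()[0]
print(count)  # 2500
```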

SQL data types

to_sql() will try to map your data to an appropriate SQL data type based on the dtype of the data. When you have columns of dtype object, pandas will try to infer the data type.

You can always override the default type by specifying the desired SQL type of any of the columns with the dtype argument. This argument takes a dictionary mapping column names to SQLAlchemy types (or strings for the sqlite3 fallback mode). For example, you can specify the SQLAlchemy String type instead of the default Text type for string columns.
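The same override is sketched here for the sqlite3 fallback mode mentioned above. In that mode the dtype mapping takes SQL type names as strings rather than SQLAlchemy types. With an engine you would pass sqlalchemy.types.String for Col_1 instead. The VARCHAR(10) length, the second column Col_4 and the table name data_dtype are illustrative assumptions. PRAGMA table_info is used only to inspect the declared column types.

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")
frame = pd.DataFrame({"Col_1": ["X", "Y", "Z"], "Col_4": ["a", "b", "c"]})

# In sqlite3 fallback mode the dtype mapping takes SQL type names as strings.
# Col_1 is forced to VARCHAR(10); Col_4 keeps pandas' default mapping.
frame.to_sql("data_dtype", con, index=False, dtype={"Col_1": "VARCHAR(10)"})

# PRAGMA table_info rows are (cid, name, type, notnull, default, pk).
types = {row[1]: row[2] for row in con.execute("PRAGMA table_info(data_dtype)")}
print(types)
```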

Note

Due to the limited support for timedeltas across database flavors, columns of type timedelta64 are written to the database as integer values (nanoseconds), and a warning is raised.
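This is a small check of the behaviour, assuming the sqlite3 fallback. The column name td and the table durations are hypothetical. The durations are created explicitly at nanosecond resolution so that the stored integers are nanosecond counts.

```python
import sqlite3
import warnings

import numpy as np
import pandas as pd

con = sqlite3.connect(":memory:")
# Durations of 1 s and 2 s, created explicitly at nanosecond resolution.
frame = pd.DataFrame(
    {"td": np.array([1_000_000_000, 2_000_000_000], dtype="timedelta64[ns]")}
)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    frame.to_sql("durations", con, index=False)

back = pd.read_sql_query("SELECT td FROM durations", con)
print(back["td"].tolist())  # plain integers, not Timedelta objects
print(sorted({w.category.__name__ for w in caught}))

# Converting back to timedelta is left to the reader.
restored = pd.to_timedelta(back["td"], unit="ns")
```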

Note

Columns of category dtype are converted to the dense representation you would get with np.asarray(categorical) (e.g. for string categories this gives an array of strings). Because of this, reading the database table back in does not generate a categorical.
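The round trip is sketched below, assuming the sqlite3 fallback. The column grade, its two categories and the table grades are hypothetical. Re-applying the CategoricalDtype after reading is one possible way to get the categorical back; it is not something pandas does automatically.

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")
levels = pd.CategoricalDtype(["low", "high"])
frame = pd.DataFrame({"grade": pd.Series(["low", "high", "low"], dtype=levels)})
frame.to_sql("grades", con, index=False)

back = pd.read_sql_query("SELECT grade FROM grades", con)
print(frame["grade"].dtype, "->", back["grade"].dtype)  # category -> plain strings

# The categories are not stored, so re-apply them explicitly after reading.
restored = back["grade"].astype(levels)
```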

Datetime data types

Using SQLAlchemy, to_sql() can write datetime data that is timezone naive or timezone aware. However, how the data is ultimately stored depends on the datetime data types supported by the database system being used.

The following table lists the supported datetime data types for some common databases. Other database dialects may have different data types for datetime data.

Database    SQL Datetime Types                     Timezone Support
SQLite      TEXT                                   No
MySQL       TIMESTAMP or DATETIME                  No
PostgreSQL  TIMESTAMP or TIMESTAMP WITH TIME ZONE  Yes

When timezone-aware data is written to a database that does not support timezones, it is stored as timezone-naive timestamps in local time with respect to the timezone.

read_sql_table() can also read datetime data that is timezone aware or naive. When reading TIMESTAMP WITH TIME ZONE types, pandas converts the data to UTC.

Insertion method

The parameter method controls the SQL insertion clause used. Possible values are:

  • None: Uses the standard SQL INSERT clause (one per row).

  • 'multi': Passes multiple values in a single INSERT clause. It uses a special SQL syntax that not all backends support. This usually gives better performance for analytic databases like Presto and Redshift, but worse performance for traditional SQL backends if the table contains many columns. For more information, check the SQLAlchemy documentation.

  • callable with signature (pd_table, conn, keys, data_iter): This can be used to implement a more performant insertion method based on specific backend dialect features.
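This sketch compares the default method=None with method="multi". It assumes a pandas version whose sqlite3 fallback accepts "multi"; with an SQLAlchemy engine both options are available. The table names rows_default and rows_multi are hypothetical. Both tables should end up with identical contents. Only the shape of the generated INSERT statements differs.

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")
frame = pd.DataFrame({"id": [26, 42, 63], "Col_1": ["X", "Y", "Z"]})

# method=None (default): one INSERT statement executed per row.
frame.to_sql("rows_default", con, index=False)
# method="multi": each chunk's rows go into one multi-row INSERT ... VALUES.
frame.to_sql("rows_multi", con, index=False, method="multi")

default_rows = pd.read_sql_query("SELECT * FROM rows_default", con)
multi_rows = pd.read_sql_query("SELECT * FROM rows_multi", con)
print(default_rows.equals(multi_rows))  # same contents either way
```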

Example of a callable using the PostgreSQL COPY clause.
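A COPY-based callable needs a running PostgreSQL server and driver, so the sketch below is a runnable stand-in rather than the COPY technique itself. It implements the same (pd_table, conn, keys, data_iter) contract for the stdlib sqlite3 fallback, where pandas passes a DB-API cursor as conn. The function name insert_with_executemany, the calls list and the table data_custom are hypothetical.

```python
import sqlite3

import pandas as pd

calls = []  # records how many rows each invocation received


def insert_with_executemany(pd_table, conn, keys, data_iter):
    """Hypothetical custom insertion method for the sqlite3 fallback.

    pandas calls it once per chunk with its table wrapper (pd_table), a DB-API
    cursor (conn), the column names (keys) and an iterator of row tuples.
    """
    rows = list(data_iter)
    calls.append(len(rows))
    columns = ", ".join(f'"{k}"' for k in keys)
    placeholders = ", ".join("?" for _ in keys)
    sql = f'INSERT INTO "{pd_table.name}" ({columns}) VALUES ({placeholders})'
    conn.executemany(sql, rows)
    return conn.rowcount  # pandas sums these into to_sql's return value


con = sqlite3.connect(":memory:")
frame = pd.DataFrame({"id": [26, 42, 63], "Col_1": ["X", "Y", "Z"]})
frame.to_sql(
    "data_custom", con, index=False, method=insert_with_executemany, chunksize=2
)

stored = pd.read_sql_query("SELECT * FROM data_custom", con)
print(calls)  # one entry per chunk of at most 2 rows
print(stored)
```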

Reading tables

read_sql_table() reads a database table given its name and, optionally, a subset of columns to read.

Note

To use read_sql_table(), you must have the SQLAlchemy optional dependency installed.
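This is a minimal sketch of reading a whole table by name, assuming SQLAlchemy 2.x is installed. The table data and its three columns are illustrative.

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///:memory:")
pd.DataFrame(
    {"id": [26, 42, 63], "Col_1": ["X", "Y", "Z"], "Col_2": [25.7, -12.4, 5.73]}
).to_sql("data", engine, index=False)

# The whole table is read by name; no SQL has to be written.
table = pd.read_sql_table("data", engine)
print(table)
```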

Note

Note that pandas infers column dtypes from query outputs, not by looking up data types in the physical database schema. For example, assume userid is an integer column in a table. Then, intuitively, select userid ... will return an integer-valued series, while select cast(userid as text) ... will return an object-valued (str) series. Accordingly, if the query output is empty, all resulting columns are returned as object-valued (since that is the most general type). If you foresee that your query will sometimes produce an empty result, you may want to typecast explicitly afterwards to ensure dtype integrity.
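The userid example can be reproduced with the sqlite3 fallback as sketched below. The table users and its two rows are hypothetical. The final astype call is one way to apply the explicit typecast suggested above.

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (userid INTEGER)")
con.executemany("INSERT INTO users VALUES (?)", [(1,), (2,)])

# Same column, different query outputs, different inferred dtypes.
as_int = pd.read_sql_query("SELECT userid FROM users", con)
as_text = pd.read_sql_query("SELECT CAST(userid AS TEXT) AS userid FROM users", con)
empty = pd.read_sql_query("SELECT userid FROM users WHERE userid < 0", con)
print(as_int["userid"].dtype, as_text["userid"].dtype, empty["userid"].dtype)

# An empty result carries no values to infer from, so cast it explicitly.
empty_typed = empty.astype({"userid": "int64"})
print(empty_typed["userid"].dtype)  # int64
```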

You can also specify the name of the column to use as the DataFrame index, and specify a subset of columns to be read.
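Here is a sketch of the index_col and columns arguments of read_sql_table, assuming SQLAlchemy 2.x. The table contents are illustrative.

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///:memory:")
pd.DataFrame(
    {"id": [26, 42, 63], "Col_1": ["X", "Y", "Z"], "Col_2": [25.7, -12.4, 5.73]}
).to_sql("data", engine, index=False)

# Use the "id" column as the DataFrame index.
by_id = pd.read_sql_table("data", engine, index_col="id")
# Read only a subset of the columns.
subset = pd.read_sql_table("data", engine, columns=["Col_1", "Col_2"])
print(by_id)
print(subset)
```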

You can also explicitly force columns to be parsed as dates.
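The sketch below uses parse_dates with read_sql_table, assuming SQLAlchemy 2.x. The dates are deliberately stored as plain text so the difference is visible. The table and values are illustrative.

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///:memory:")
# Dates stored as plain text, so nothing converts them automatically.
pd.DataFrame({"id": [26, 42], "Date": ["2012-10-18", "2012-10-19"]}).to_sql(
    "data", engine, index=False
)

raw = pd.read_sql_table("data", engine)
parsed = pd.read_sql_table("data", engine, parse_dates=["Date"])
print(raw["Date"].dtype, "->", parsed["Date"].dtype)
```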

If needed, you can explicitly specify a format string, or a dict of arguments to pass to pandas.to_datetime().
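This sketch shows both forms of parse_dates, a format string and a dict of keyword arguments forwarded to pandas.to_datetime(). It assumes SQLAlchemy 2.x, and the stored dates are illustrative.

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///:memory:")
pd.DataFrame({"id": [26, 42], "Date": ["2012-10-18", "2012-10-19"]}).to_sql(
    "data", engine, index=False
)

# A format string per column ...
with_fmt = pd.read_sql_table("data", engine, parse_dates={"Date": "%Y-%m-%d"})
# ... or a dict of keyword arguments passed on to pandas.to_datetime().
with_kwargs = pd.read_sql_table(
    "data", engine, parse_dates={"Date": {"format": "%Y-%m-%d"}}
)
print(with_fmt.dtypes)
print(with_kwargs.dtypes)
```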

You can check whether a table exists using has_table().
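Here is a minimal sketch, assuming the function is importable as pandas.io.sql.has_table and accepts a table name and a connection. A stdlib sqlite3 connection is used here, and the table names are hypothetical.

```python
import sqlite3

import pandas as pd
from pandas.io.sql import has_table

con = sqlite3.connect(":memory:")
pd.DataFrame({"id": [26, 42, 63]}).to_sql("data", con, index=False)

print(has_table("data", con))     # the table was just created
print(has_table("missing", con))  # no such table
```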

Schema support

Reading from and writing to different schemas is supported through the schema keyword in the read_sql_table() and to_sql() functions. Note, however, that this depends on the database flavor (sqlite does not have schemas).

Querying

You can query using raw SQL in the read_sql_query() function. In this case you must use the SQL variant appropriate for your database. When using SQLAlchemy, you can also pass SQLAlchemy Expression Language constructs, which are database-agnostic.
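The sketch below shows both kinds of query, assuming SQLAlchemy 2.x. It uses a raw SQL string in SQLite's dialect and a lightweight Expression Language construct built with sqlalchemy.table, column and select. The table data and the id > 30 filter are illustrative.

```python
import pandas as pd
from sqlalchemy import column, create_engine, select, table

engine = create_engine("sqlite:///:memory:")
pd.DataFrame({"id": [26, 42, 63], "Col_1": ["X", "Y", "Z"]}).to_sql(
    "data", engine, index=False
)

# Raw SQL string: written in the dialect of the target database (SQLite here).
raw = pd.read_sql_query("SELECT * FROM data", engine)

# Expression Language construct: SQLAlchemy renders it for the engine's dialect.
data_t = table("data", column("id"), column("Col_1"))
stmt = (
    select(data_t.c.id, data_t.c.Col_1)
    .where(data_t.c.id > 30)
    .order_by(data_t.c.id)
)
expr = pd.read_sql_query(stmt, engine)
print(raw)
print(expr)
```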

Of course, you can specify a more "complex" query.
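Here is a sketch of a filtered query with bound parameters, assuming the sqlite3 fallback. The "?" placeholder is sqlite3's parameter style; other drivers use different placeholders. The WHERE conditions and values are illustrative.

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")
pd.DataFrame(
    {"id": [26, 42, 63], "Col_1": ["X", "Y", "Z"], "Col_2": [25.7, -12.4, 5.73]}
).to_sql("data", con, index=False)

# "?" is sqlite3's placeholder style; values are passed separately via params.
query = "SELECT id, Col_1, Col_2 FROM data WHERE id = ? AND Col_2 < ?"
result = pd.read_sql_query(query, con, params=(42, 0))
print(result)
```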

The read_sql_query() function supports a chunksize argument. Specifying it returns an iterator over chunks of the query result.

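A small sketch of chunked reading, assuming a hypothetical five-row SQLite table. With chunksize=2, each iteration should receive a DataFrame of at most two rows.

```python
import sqlite3

import pandas as pd

# Hypothetical five-row table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE data (id INTEGER)")
con.executemany("INSERT INTO data VALUES (?)", [(i,) for i in range(5)])
con.commit()

chunk_sizes = []
# With chunksize set, read_sql_query returns an iterator of DataFrames.
for chunk in pd.read_sql_query("SELECT id FROM data ORDER BY id", con, chunksize=2):
    chunk_sizes.append(len(chunk))
print(chunk_sizes)
```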
You can also run a plain query without creating a DataFrame by using execute(). This is useful for queries that don't return values, such as INSERT. It is functionally equivalent to calling execute on the SQLAlchemy engine or DB connection object. Again, you must use the SQL syntax variant appropriate for your database.

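The sketch below illustrates the equivalent route mentioned above. The INSERT statements run directly on a sqlite3 connection to a hypothetical in-memory database, and pandas is used only afterwards to read the table back.

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE data (id INTEGER, name TEXT)")

# INSERT returns no rows, so there is nothing to put in a DataFrame:
# run it directly on the DB-API connection and commit.
con.execute("INSERT INTO data VALUES (?, ?)", (1, "a"))
con.execute("INSERT INTO data VALUES (?, ?)", (2, "b"))
con.commit()

# Read the result back only to confirm the rows arrived.
count = int(pd.read_sql_query("SELECT COUNT(*) AS n FROM data", con)["n"].iloc[0])
print(count)
```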
Engine connection examples

To connect with SQLAlchemy, use the create_engine() function to create an engine object from a database URI. You only need to create the engine once per database you are connecting to.

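A minimal sketch, assuming SQLAlchemy is installed. The URI "sqlite://" creates an in-memory SQLite database, so nothing is written to disk. The table name and data are hypothetical. For a server database, only the URI passed to create_engine() would change.

```python
import pandas as pd
from sqlalchemy import create_engine

# "sqlite://" is an in-memory SQLite URI. A server database would instead use
# a URI such as "dialect+driver://user:password@host:port/dbname" (placeholder).
engine = create_engine("sqlite://")

# Reuse the same engine for every read and write against this database.
pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]}).to_sql(
    "data", engine, index=False
)
result = pd.read_sql_query("SELECT id, name FROM data ORDER BY id", engine)
print(result)
```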
For more information, see the examples in the SQLAlchemy documentation.

Advanced SQLAlchemy queries

You can use SQLAlchemy constructs to describe your query.

Use sqlalchemy.text() to specify query parameters in a backend-neutral way.

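A sketch of a parameterized query with sqlalchemy.text(), assuming SQLAlchemy is installed and using a hypothetical in-memory table. The :name placeholder is SQLAlchemy's named-parameter syntax. Its value is supplied through the params argument of read_sql_query().

```python
import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")  # in-memory database for the example
pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]}).to_sql(
    "data", engine, index=False
)

# ":name" is a named bind parameter; SQLAlchemy renders it in the placeholder
# style of whichever driver sits behind the engine.
query = text("SELECT id, name FROM data WHERE name = :name")
result = pd.read_sql_query(query, engine, params={"name": "b"})
print(result)
```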
If you have an SQLAlchemy description of your database, you can express WHERE conditions using SQLAlchemy expressions.

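A sketch, assuming SQLAlchemy 1.4 or later (for the sa.select(table) call style). The table description data_table is hypothetical. The WHERE condition is written as a Python expression on its columns rather than as SQL text.

```python
import pandas as pd
import sqlalchemy as sa

engine = sa.create_engine("sqlite://")  # in-memory database for the example
metadata = sa.MetaData()

# Hypothetical table description; in practice it would mirror your schema.
data_table = sa.Table(
    "data",
    metadata,
    sa.Column("id", sa.Integer),
    sa.Column("name", sa.String),
    sa.Column("score", sa.Float),
)
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(
        data_table.insert(),
        [
            {"id": 1, "name": "a", "score": 0.5},
            {"id": 2, "name": "b", "score": 1.5},
            {"id": 3, "name": "c", "score": 2.5},
        ],
    )

# The WHERE condition is a Python expression compiled to SQL for the dialect.
query = sa.select(data_table).where(data_table.c.score > 1.0).order_by(data_table.c.id)
result = pd.read_sql_query(query, engine)
print(result)
```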
You can combine SQLAlchemy expressions with parameters passed to read_sql() by using sqlalchemy.bindparam().

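A sketch, assuming SQLAlchemy 1.4 or later and a hypothetical table description data_table. The threshold in the WHERE condition is left as a named bind parameter and is filled in through the params argument of read_sql().

```python
import pandas as pd
import sqlalchemy as sa

engine = sa.create_engine("sqlite://")  # in-memory database for the example
metadata = sa.MetaData()

# Hypothetical table description.
data_table = sa.Table(
    "data",
    metadata,
    sa.Column("id", sa.Integer),
    sa.Column("score", sa.Float),
)
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(
        data_table.insert(),
        [{"id": 1, "score": 0.5}, {"id": 2, "score": 1.5}, {"id": 3, "score": 2.5}],
    )

# "threshold" stays unbound in the expression until read_sql supplies it.
query = (
    sa.select(data_table)
    .where(data_table.c.score > sa.bindparam("threshold"))
    .order_by(data_table.c.id)
)
result = pd.read_sql(query, engine, params={"threshold": 1.0})
print(result)
```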
SQLite fallback

SQLite can be used without SQLAlchemy. This mode requires a Python database adapter that respects the Python DB-API.

You can create connections with Python's built-in sqlite3 module and then issue queries through them.

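A minimal sketch of the fallback path, with no SQLAlchemy involved. The connection comes from the standard-library sqlite3 module, and the table contents are hypothetical. sqlite3 expects the ? placeholder style for query parameters.

```python
import sqlite3

import pandas as pd

# A plain DB-API connection from the standard library; no SQLAlchemy needed.
con = sqlite3.connect(":memory:")

df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
df.to_sql("data", con, index=False)

# sqlite3 uses the "?" placeholder style for query parameters.
result = pd.read_sql_query(
    "SELECT id, name FROM data WHERE id >= ? ORDER BY id", con, params=(2,)
)
print(result)
con.close()
```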
Google BigQuery

Warning

Starting in 0.20.0, pandas has split off Google BigQuery support into the separate package pandas-gbq. You can pip install pandas-gbq to get it.

The pandas-gbq package provides functionality to read from and write to Google BigQuery.

pandas integrates with this external package. If pandas-gbq is installed, you can use the pandas methods pd.read_gbq and DataFrame.to_gbq, which will call the respective functions from pandas-gbq.

Full documentation can be found here.

Stata format

Writing to Stata format

The method to_stata() will write a DataFrame into a .dta file. The format version of this file is always 115 (Stata 12).

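A minimal round-trip sketch, assuming a temporary directory and a hypothetical file name. write_index=False keeps the row index out of the file, so the columns read back match the ones written. The NaN in the float column should come back as a missing value.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"n": [1, 2, 3], "x": [1.5, 2.5, float("nan")]})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.dta")  # hypothetical file name
    # write_index=False leaves the row index out of the file.
    df.to_stata(path, write_index=False)
    roundtrip = pd.read_stata(path)

print(roundtrip)
```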
Stata data files have limited data type support: only strings with 244 or fewer characters, int8, int16, int32, float32 and float64 can be stored in .dta files. Additionally, Stata reserves certain values to represent missing data. Exporting a non-missing value that is outside the permitted range in Stata for a particular data type will retype the variable to the next larger size. For example, int8 values are restricted to lie between -127 and 100 in Stata, so variables with values above 100 will trigger a conversion to int16. nan values in floating-point data types are stored as the basic missing data type (. in Stata).

Note

It is not possible to export missing data values for integer data types.

The Stata writer gracefully handles other data types, including int64, bool, uint8, uint16 and uint32, by casting to the smallest supported type that can represent the data. For example, data with a type of uint8 will be cast to int8 if all values are less than 100 (the upper bound for non-missing int8 data in Stata). If values are outside this range, the variable is cast to int16.

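The sketch below shows the casting rule on two hypothetical uint8 columns. One column stays within Stata's non-missing int8 range and the other exceeds it. The expected dtypes after reading back (int8 and int16) assume read_stata's default preserve_dtypes=True.

```python
import os
import tempfile

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "small": np.array([1, 99], dtype=np.uint8),   # fits Stata's int8 range
        "large": np.array([50, 200], dtype=np.uint8),  # 200 exceeds it
    }
)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "casting.dta")  # hypothetical file name
    df.to_stata(path, write_index=False)
    dtypes = pd.read_stata(path).dtypes

print(dtypes)
```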
Warning

Conversion from int64 to float64 may result in a loss of precision if int64 values are larger than 2**53.

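A small sketch of the precision loss, using a single hypothetical value just above 2**53. The warning pandas may emit about the conversion is silenced here only to keep the output short.

```python
import os
import tempfile
import warnings

import numpy as np
import pandas as pd

big = 2**53 + 1  # not exactly representable as a float64
df = pd.DataFrame({"big": np.array([big], dtype=np.int64)})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "precision.dta")  # hypothetical file name
    with warnings.catch_warnings():
        # Silence the precision-loss warning pandas may emit for this column.
        warnings.simplefilter("ignore")
        df.to_stata(path, write_index=False)
    stored = int(pd.read_stata(path)["big"].iloc[0])

print(big, stored)
```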
Warning

StataWriter and to_stata() only support fixed-width strings containing up to 244 characters, a limitation imposed by the version 115 dta file format. Attempting to write Stata dta files with strings longer than 244 characters raises a ValueError.

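A sketch of the failure mode, assuming the default dta format version is used. A single 245-character string is enough to trigger the ValueError. The column is given dtype=object explicitly, so the example does not depend on pandas' default string dtype.

```python
import os
import tempfile

import pandas as pd

# dtype=object is set explicitly so the column is a plain object column.
df = pd.DataFrame({"text": pd.Series(["x" * 245], dtype=object)})

error_message = None
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "long_strings.dta")  # hypothetical file name
    try:
        df.to_stata(path, write_index=False)
    except ValueError as exc:
        error_message = str(exc)

print(error_message)
```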
Reading from Stata format

The top-level function read_stata will read a dta file and return either a DataFrame or a StataReader that can be used to read the file incrementally.

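A sketch contrasting the two return types, assuming a small hypothetical file in a temporary directory. A plain call returns a DataFrame. With iterator=True, read_stata returns a StataReader, used here as a context manager so the file is closed afterwards.

```python
import os
import tempfile

import pandas as pd
from pandas.io.stata import StataReader

df = pd.DataFrame({"a": [1.0, 2.0, 3.0]})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.dta")  # hypothetical file name
    df.to_stata(path, write_index=False)

    whole = pd.read_stata(path)  # the entire file as a DataFrame
    with pd.read_stata(path, iterator=True) as reader:
        is_reader = isinstance(reader, StataReader)
        rest = reader.read()  # no row count: read everything that is left

print(type(whole).__name__, is_reader, len(rest))
```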
Specifying a chunksize yields a StataReader instance that can be used to read chunksize lines from the file at a time. The StataReader object can be used as an iterator.

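A sketch of chunked iteration over a hypothetical five-row file. With chunksize=2, the reader should yield chunks of 2, 2 and 1 rows.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"a": [float(i) for i in range(5)]})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "chunks.dta")  # hypothetical file name
    df.to_stata(path, write_index=False)

    chunk_lengths = []
    with pd.read_stata(path, chunksize=2) as reader:
        for chunk in reader:  # each chunk is a DataFrame
            chunk_lengths.append(len(chunk))

print(chunk_lengths)
```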
For more fine-grained control, use iterator=True and specify chunksize with each call to read().

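The same idea works with iterator=True, sketched here with an illustrative temporary file. The number of rows is passed to each read() call instead of being fixed up front.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"x": range(10)})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.dta")  # illustrative file name
    df.to_stata(path, write_index=False)

    # iterator=True returns a StataReader without reading rows yet;
    # each read(n) call returns the next n rows as a DataFrame.
    with pd.read_stata(path, iterator=True) as reader:
        chunk1 = reader.read(5)
        chunk2 = reader.read(5)

print(chunk1["x"].tolist())  # [0, 1, 2, 3, 4]
print(chunk2["x"].tolist())  # [5, 6, 7, 8, 9]
```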

Currently the index is retrieved as a column.
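The following sketch shows this behavior. It relies on to_stata's default write_index=True, which stores the index as a column named index. The index_col argument of read_stata can then move that column back into the index. The data is illustrative.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"x": [10, 20, 30]})  # default RangeIndex 0..2

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.dta")
    df.to_stata(path)  # write_index=True by default

    plain = pd.read_stata(path)                        # index comes back as a column
    restored = pd.read_stata(path, index_col="index")  # promote it back to the index

print(list(plain.columns))     # ['index', 'x']
print(list(restored.columns))  # ['x']
```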

The parameter convert_categoricals indicates whether value labels should be read and used to create a Categorical variable from them. Value labels can also be retrieved by the function value_labels, which requires read() to be called before use.
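Here is a short sketch of both options. It uses a small illustrative file with one categorical column named grade. Following the text, value_labels() is called only after read().

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame(
    {"grade": pd.Categorical(["low", "high", "low"], categories=["low", "high"])}
)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "labels.dta")
    df.to_stata(path, write_index=False)

    as_categorical = pd.read_stata(path)                        # labels -> Categorical
    as_codes = pd.read_stata(path, convert_categoricals=False)  # stored integer values

    with pd.read_stata(path, iterator=True) as reader:
        reader.read()                     # read first, then ask for the labels
        labels = reader.value_labels()    # {label name: {value: label}}

print(as_categorical["grade"].tolist())  # ['low', 'high', 'low']
print(as_codes["grade"].tolist())        # [0, 1, 0]
```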

The parameter convert_missing indicates whether missing value representations in Stata should be preserved. If False (the default), missing values are represented as np.nan. If True, missing values are represented using StataMissingValue objects, and columns containing missing values will have object data type.
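This sketch contrasts the two settings. It uses an illustrative one-column file that contains a single missing value. StataMissingValue is imported from pandas.io.stata.

```python
import os
import tempfile

import pandas as pd
from pandas.io.stata import StataMissingValue

df = pd.DataFrame({"v": [1.0, None, 3.0]})  # None becomes NaN in a float column

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "missing.dta")
    df.to_stata(path, write_index=False)

    default = pd.read_stata(path)                     # convert_missing=False
    kept = pd.read_stata(path, convert_missing=True)  # keep Stata missing markers

print(default["v"].tolist())  # [1.0, nan, 3.0]
print(kept["v"].dtype)        # object
print(isinstance(kept.loc[1, "v"], StataMissingValue))  # True
```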

Note

read_stata() and StataReader support .dta formats 113-115 (Stata 10-12), 117 (Stata 13), and 118 (Stata 14).
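As a sketch, the version argument of DataFrame.to_stata can write the same frame as format 114 and as format 117. Both files read back with read_stata. The file names and data are illustrative.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "score": [0.5, 1.5, 2.5]})

roundtrips = {}
with tempfile.TemporaryDirectory() as tmpdir:
    for version in (114, 117):
        path = os.path.join(tmpdir, f"example_{version}.dta")
        df.to_stata(path, write_index=False, version=version)
        roundtrips[version] = pd.read_stata(path)  # format detected from the file

for version, frame in roundtrips.items():
    print(version, frame["score"].tolist())
```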

Note

Setting preserve_dtypes=False will upcast to the standard pandas data types: int64 for all integer types and float64 for floating point data. By default, the Stata data types are preserved when importing.
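The difference shows up in this sketch, which writes a file with an int8 column and a float32 column. The column names are illustrative.

```python
import os
import tempfile

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "small": np.array([1, 2, 3], dtype=np.int8),
        "ratio": np.array([0.5, 1.5, 2.5], dtype=np.float32),
    }
)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "dtypes.dta")
    df.to_stata(path, write_index=False)

    preserved = pd.read_stata(path)                      # preserve_dtypes=True
    upcast = pd.read_stata(path, preserve_dtypes=False)  # int64 / float64

print([str(t) for t in preserved.dtypes])  # ['int8', 'float32']
print([str(t) for t in upcast.dtypes])     # ['int64', 'float64']
```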

Categorical data

Categorical data can be exported to Stata data files as value labeled data. The exported data consists of the underlying category codes as integer data values and the categories as value labels. Stata does not have an explicit equivalent to a Categorical, and information about whether the variable is ordered is lost when exporting.
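Here is a minimal sketch of the export, using an illustrative categorical column named size. Reading the file back with convert_categoricals=False exposes the stored category codes. Reading it with the defaults rebuilds the labels.

```python
import os
import tempfile

import pandas as pd

size = pd.Categorical(["S", "L", "M", "S"], categories=["S", "M", "L"])
df = pd.DataFrame({"size": size})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "cats.dta")
    df.to_stata(path, write_index=False)

    codes = pd.read_stata(path, convert_categoricals=False)["size"]  # integer data
    labelled = pd.read_stata(path)["size"]                            # value labels

print(codes.tolist())     # [0, 2, 1, 0]
print(labelled.tolist())  # ['S', 'L', 'M', 'S']
```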

Warning

Stata only supports string value labels, so str is called on the categories when exporting data. Exporting Categorical variables with non-string categories produces a warning, and can result in a loss of information if the str representations of the categories are not unique.
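This sketch makes the warning visible. It uses integer categories and records warnings with the standard warnings module. Only the warning's class name is checked.

```python
import os
import tempfile
import warnings

import pandas as pd

# Integer categories are not strings, so str() is applied when writing labels.
df = pd.DataFrame({"rating": pd.Categorical([1, 2, 1])})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "nonstring.dta")
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        df.to_stata(path, write_index=False)
    back = pd.read_stata(path)

warning_names = [type(w.message).__name__ for w in caught]
print(warning_names)             # expected to include 'ValueLabelTypeMismatch'
print(back["rating"].tolist())   # ['1', '2', '1']: categories are now strings
```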

Labeled data can similarly be imported from Stata data files as Categorical variables using the keyword argument convert_categoricals (True by default). The keyword argument order_categoricals (True by default) determines whether imported Categorical variables are ordered.
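In this sketch an unordered categorical is written, yet the ordered flag on import follows order_categoricals. The flag is not read from the file. The data is illustrative.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"level": pd.Categorical(["b", "a", "b"])})  # unordered

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "order.dta")
    df.to_stata(path, write_index=False)

    default = pd.read_stata(path)["level"]                               # ordered=True
    unordered = pd.read_stata(path, order_categoricals=False)["level"]  # ordered=False

print(default.cat.ordered)    # True
print(unordered.cat.ordered)  # False
```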

Note

When importing categorical data, the values of the variables in the Stata data file are not preserved, since Categorical variables always use integer data types between -1 and n-1, where n is the number of categories. If the original values in the Stata data file are required, these can be imported by setting convert_categoricals=False, which will import the original data (but not the variable labels). The original values can be matched to the imported categorical data since there is a simple mapping between the original Stata data values and the category codes of imported Categorical variables: missing values are assigned code -1, the smallest original value is assigned 0, the second smallest is assigned 1, and so on until the largest original value is assigned the code n-1.
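You can check the code mapping with this sketch. It uses an illustrative categorical column g that contains one missing value. The raw read comes back as floats because the column holds a missing value.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"g": pd.Categorical(["b", None, "a", "c"])})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "codes.dta")
    df.to_stata(path, write_index=False)

    raw = pd.read_stata(path, convert_categoricals=False)["g"]  # original values
    cat = pd.read_stata(path)["g"]                              # Categorical

print(raw.tolist())            # [1.0, nan, 0.0, 2.0]
print(cat.cat.codes.tolist())  # [1, -1, 0, 2]: missing -> -1, smallest -> 0
```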

Note

Stata supports partially labeled series. These series have value labels for some but not all data values. Importing a partially labeled series will produce a Categorical with string categories for the values that are labeled and numeric categories for values with no label.
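To experiment, you can produce a partially labeled variable with the value_labels argument of DataFrame.to_stata. This sketch assumes pandas 1.4 or later, where that argument is available. Only the value 1 gets a label.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "partial.dta")
    # Label only the value 1; the values 2 and 3 stay unlabeled.
    df.to_stata(path, write_index=False, value_labels={"x": {1: "one"}})
    back = pd.read_stata(path)

# A Categorical mixing a string category ('one') with numeric categories (2, 3).
print(back["x"].tolist())
```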

SAS formats

The top-level function read_sas() can read (but not write) SAS XPORT (.xpt) and (since v0.18.0) SAS7BDAT (.sas7bdat) format files.

SAS files only contain two value types: ASCII text and floating point values (usually 8 bytes but sometimes truncated). For xport files, there is no automatic type conversion to integers, dates, or categoricals. For SAS7BDAT files, the format codes may allow date variables to be automatically converted to dates. By default the whole file is read and returned as a DataFrame.

Specify a chunksize or use iterator=True to obtain reader objects (XportReader or SAS7BDATReader) for incrementally reading the file. The reader objects also have attributes that contain additional information about the file and its variables.

Read a SAS7BDAT file
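Here is a minimal sketch of a whole-file read. load_sas is a hypothetical helper name. Because no SAS file is assumed to exist, the test only exercises the error paths: a file name whose format cannot be inferred, and a missing file.

```python
import pandas as pd


def load_sas(path, fmt=None):
    """Read a whole SAS file into a DataFrame (hypothetical helper)."""
    # With fmt=None, read_sas infers the format from ".xpt" or ".sas7bdat"
    # in the file name; pass fmt="xport" or fmt="sas7bdat" otherwise.
    return pd.read_sas(path, format=fmt)


# Hypothetical usage; the file name is illustrative:
# df = load_sas("sas_data.sas7bdat")
```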

Obtain an iterator and read an XPORT file 100,000 lines at a time
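The incremental pattern can be sketched as a helper that consumes any iterable of DataFrame chunks. summarize_in_chunks and the AMOUNT column are hypothetical. In-memory chunks stand in for a real XPORT reader.

```python
import pandas as pd


def summarize_in_chunks(chunks, column):
    """Accumulate a row count and a column total over DataFrame chunks."""
    nrows = 0
    total = 0.0
    for chunk in chunks:  # each chunk is a DataFrame of at most chunksize rows
        nrows += len(chunk)
        total += float(chunk[column].sum())
    return nrows, total


# Hypothetical usage against a real file (names are illustrative):
# with pd.read_sas("sas_xport.xpt", chunksize=100_000) as rdr:
#     nrows, total = summarize_in_chunks(rdr, "AMOUNT")

# Stand-in chunks so the helper runs without a SAS file:
fake_chunks = [pd.DataFrame({"AMOUNT": [1.0, 2.0]}), pd.DataFrame({"AMOUNT": [3.5]})]
print(summarize_in_chunks(fake_chunks, "AMOUNT"))  # (3, 6.5)
```

Only one chunk is held in memory at a time, which is the point of passing chunksize.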

The specification for the xport file format is available from the SAS web site.

No official documentation is available for the SAS7BDAT format.

SPSS formats

New in version 0.25.0.

The top-level function read_spss() can read (but not write) SPSS SAV (.sav) and ZSAV (.zsav) format files.

SPSS files contain column names. By default the whole file is read, categorical columns are converted into pd.Categorical, and a DataFrame with all columns is returned.

Specify the usecols parameter to obtain a subset of columns. Specify convert_categoricals=False to avoid converting categorical columns into pd.Categorical.

Read an SPSS file

Extract a subset of columns from an SPSS file and avoid converting categorical columns
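A sketch of the subset read follows. It assumes the same `pyreadstat` dependency, and the helper `read_spss_subset` and any column names passed to it are hypothetical. `usecols` limits which columns are returned. `convert_categoricals=False` leaves categorical columns as their stored values instead of converting them to `pd.Categorical`.

```python
import pandas as pd


def read_spss_subset(path, columns):
    """Read only `columns` from an SPSS file, leaving categorical columns unconverted.

    Like pd.read_spss itself, this needs the optional pyreadstat package.
    """
    return pd.read_spss(
        path,
        usecols=list(columns),        # restrict the result to these columns
        convert_categoricals=False,   # skip conversion to pd.Categorical
    )
```

A call such as `read_spss_subset("survey.sav", ["age", "region"])` would return just those two columns. Both the file and the column names are illustrative.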


More information about the SAV and ZSAV file formats is available here.

Other file formats

pandas itself only supports IO with a limited set of file formats that map cleanly to its tabular data model. For reading and writing other file formats into and from pandas, we recommend these packages from the broader community.

netCDF

xarray provides data structures inspired by pandas for working with multi-dimensional datasets. It focuses on the netCDF file format and offers easy conversion to and from pandas.

Performance considerations

This is an informal comparison of various IO methods, using pandas 0.24.2. Timings are machine dependent, and small differences should be ignored.


The following test functions will be used below to compare the performance of several IO methods.
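Here is a minimal sketch of what such test functions might look like. The function names (`bench_csv_write`, `bench_pickle_read`, and so on), the `best_time` helper, and the 50,000-row frame are all hypothetical. The sketch covers only CSV and pickle because neither needs an optional dependency. Each function is timed with `timeit.repeat`, and the fastest run is kept.

```python
import os
import tempfile
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark frame: one float64 column and one int64 column.
# It is kept small so the sketch runs quickly; increase n_rows for timings
# that are less dominated by fixed per-call overhead.
n_rows = 50_000
rng = np.random.default_rng(42)
df = pd.DataFrame({"A": rng.standard_normal(n_rows), "B": np.ones(n_rows, dtype="int64")})

# Files go to a fresh temporary directory, which is left in place so the
# read functions can be called again after the timings are taken.
workdir = tempfile.mkdtemp()
csv_path = os.path.join(workdir, "bench.csv")
pkl_path = os.path.join(workdir, "bench.pkl")


def bench_csv_write(frame):
    frame.to_csv(csv_path, index=False)


def bench_csv_read():
    return pd.read_csv(csv_path)


def bench_pickle_write(frame):
    frame.to_pickle(pkl_path)


def bench_pickle_read():
    return pd.read_pickle(pkl_path)


def best_time(func, *args, repeat=3):
    """Fastest of `repeat` single calls, in seconds; the minimum is the least noisy estimate."""
    return min(timeit.repeat(lambda: func(*args), repeat=repeat, number=1))


# Writes are timed first so that the read functions find their files on disk.
timings = {
    "bench_csv_write": best_time(bench_csv_write, df),
    "bench_pickle_write": best_time(bench_pickle_write, df),
    "bench_csv_read": best_time(bench_csv_read),
    "bench_pickle_read": best_time(bench_pickle_read),
}

for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name:20s} {seconds * 1000:8.2f} ms")
```

Taking the minimum of several runs, rather than the mean, reduces the influence of background load on the machine. As the text notes, small differences between methods should still be ignored.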


When writing, the top three functions in terms of speed are …, … and …


When reading, the top three functions in terms of speed are …, … and …
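Picking the top three in either ranking amounts to selecting the three smallest measured timings. The helper below, `top_n_fastest`, is hypothetical and not part of the original benchmark. It assumes the timings are held in a `{name: seconds}` mapping.

```python
import heapq


def top_n_fastest(timings, n=3):
    """Return the names of the `n` fastest entries of a {name: seconds} mapping, fastest first."""
    # Iterating a dict yields its keys; timings.get maps each key to its time.
    return heapq.nsmallest(n, timings, key=timings.get)
```

Given a mapping of measured timings, for example one built with `timeit`, `top_n_fastest(timings)` returns the three fastest function names in ascending order of time.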
