Python: check whether a file is plain text
The pandas I/O API is a set of top-level reader functions, accessed like `pandas.read_csv()`, that generally return a pandas object. The corresponding writer functions are object methods that are accessed like `DataFrame.to_csv()`. Below is a table containing the available readers and writers.
| Format Type | Data Description | Reader | Writer |
|---|---|---|---|
| text | CSV | read_csv | to_csv |
| text | Fixed-Width Text File | read_fwf | |
| text | JSON | read_json | to_json |
| text | HTML | read_html | to_html |
| text | LaTeX | | Styler.to_latex |
| text | XML | read_xml | to_xml |
| text | Local clipboard | read_clipboard | to_clipboard |
| binary | MS Excel | read_excel | to_excel |
| binary | OpenDocument | read_excel | |
| binary | HDF5 Format | read_hdf | to_hdf |
| binary | Feather Format | read_feather | to_feather |
| binary | Parquet Format | read_parquet | to_parquet |
| binary | ORC Format | read_orc | to_orc |
| binary | Stata | read_stata | to_stata |
| binary | SAS | read_sas | |
| binary | SPSS | read_spss | |
| binary | Python Pickle Format | read_pickle | to_pickle |
| SQL | SQL | read_sql | to_sql |
| SQL | Google BigQuery | read_gbq | to_gbq |

Here is an informal performance comparison for some of these IO methods.

Note: For examples that use the `StringIO` class, make sure you import it with `from io import StringIO` for Python 3.

CSV & text files

The workhorse function for reading text files (a.k.a. flat files) is `read_csv()`. See the cookbook for some advanced strategies.

Parsing options

`read_csv()` accepts the following common arguments:

Basic

filepath_or_buffer : various
Either a path to a file (a `str`, `pathlib.Path`, or `py._path.local.LocalPath`), URL (including http, ftp, and S3 locations), or any object with a `read()` method (such as an open file or `StringIO`).

sep : str, defaults to `','` for `read_csv()`, `\t` for `read_table()`
Delimiter to use.
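Here is a minimal sketch of the `filepath_or_buffer` and `sep` arguments described above. The sample data and the temporary file name are made up for illustration. The same small table is read three ways: from an in-memory `StringIO` buffer, from a `pathlib.Path` on disk, and as tab-separated text through `read_table()`, whose default separator is `\t`.

```python
import tempfile
from io import StringIO
from pathlib import Path

import pandas as pd

# Made-up sample table, comma-separated.
csv_text = "a,b,c\n1,2,3\n4,5,6\n"

# 1) Any object with a read() method works, e.g. an in-memory StringIO buffer.
df_buffer = pd.read_csv(StringIO(csv_text))

# 2) A path works too: write the same text to a temporary file and read it back.
with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / "sample.csv"  # hypothetical file name
    path.write_text(csv_text, encoding="utf-8")
    df_path = pd.read_csv(path)

# 3) read_table() defaults to a tab separator instead of a comma.
tsv_text = csv_text.replace(",", "\t")
df_tab = pd.read_table(StringIO(tsv_text))

print(df_buffer.equals(df_path), df_buffer.equals(df_tab))  # True True
```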
If `sep` is `None`, the C engine cannot automatically detect the separator, but the Python parsing engine can. In that case the Python engine is used and detects the separator automatically with Python's builtin sniffer tool, `csv.Sniffer`. In addition, separators longer than 1 character and different from `'\s+'` will be interpreted as regular expressions and will also force the use of the Python parsing engine. Note that regex delimiters are prone to ignoring quoted data. Regex example: `'\\r\\t'`.

delimiter : str, default `None`
Alternative argument name for sep.

delim_whitespace : boolean, default False
Specifies whether or not whitespace (e.g. `' '` or `'\t'`) will be used as the delimiter. Equivalent to setting `sep='\s+'`.
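The next sketch illustrates the separator rules just described, using made-up sample data. Three cases are shown:

- `sep=None` with the Python engine lets `csv.Sniffer` detect a `;` delimiter.
- A multi-character `sep` is treated as a regular expression.
- `sep=r"\s+"` splits on runs of whitespace.

The sketch passes `sep=r"\s+"` rather than `delim_whitespace=True`. The text above describes the two as equivalent, and recent pandas releases deprecate the latter. Sniffing is heuristic, so results on real files may differ.

```python
from io import StringIO

import pandas as pd

# sep=None: the Python engine sniffs the delimiter (";" here) with csv.Sniffer.
semicolon_text = "a;b;c\n1;2;3\n4;5;6\n"
df_sniffed = pd.read_csv(StringIO(semicolon_text), sep=None, engine="python")

# A separator longer than one character is treated as a regular expression:
# here, a pipe with optional surrounding spaces.
pipe_text = "a | b|c\n1| 2 |3\n"
df_regex = pd.read_csv(StringIO(pipe_text), sep=r"\s*\|\s*", engine="python")

# sep=r"\s+" splits on any run of spaces or tabs.
ws_text = "a  b   c\n1 2\t3\n4   5 6\n"
df_ws = pd.read_csv(StringIO(ws_text), sep=r"\s+")

print(df_sniffed.columns.tolist(), df_regex.columns.tolist(), df_ws.shape)
```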
If `delim_whitespace` is set to `True`, nothing should be passed in for the `delimiter` parameter.

Column and index locations and names

header : int or list of ints, default `'infer'`
Row number(s) to use as the column names, and the start of the data. Default behavior is to infer the column names. If no names are passed, the behavior is identical to `header=0` and column names are inferred from the first line of the file. If column names are passed explicitly, the behavior is identical to `header=None`. Explicitly pass `header=0` to be able to replace existing names.

The header can be a list of ints that specify row locations for a MultiIndex on the columns, e.g. `[0,1,3]`. Intervening rows that are not specified will be skipped (e.g. 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if `skip_blank_lines=True`, so `header=0` denotes the first line of data rather than the first line of the file.

names : array-like, default `None`
List of column names to use.
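The sketch below illustrates how `header` and `names` interact, using made-up sample data. It reads the same small file four ways. It also shows that duplicate entries in `names` are rejected with a `ValueError`. The exact error message may vary between pandas versions, so only the exception type is checked.

```python
from io import StringIO

import pandas as pd

# Made-up file whose first line is a header row.
text = "x,y\n1,2\n3,4\n"

# header='infer' (default) and no names: column names come from the first line.
df_default = pd.read_csv(StringIO(text))

# header=None: the first line is read as data; columns are numbered 0, 1.
df_no_header = pd.read_csv(StringIO(text), header=None)

# names without header: behaves like header=None, so "x,y" becomes a data row.
df_names_only = pd.read_csv(StringIO(text), names=["p", "q"])

# header=0 together with names: the existing header line is replaced.
df_renamed = pd.read_csv(StringIO(text), header=0, names=["p", "q"])

# Duplicate entries in names are rejected.
try:
    pd.read_csv(StringIO(text), names=["p", "p"])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(df_renamed.columns.tolist(), df_renamed.shape)
```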
If the file contains no header row, you should explicitly pass `header=None`. Duplicates in `names` are not allowed.

index_col : int, str, sequence of int / str, or False, optional, default `None`
Column(s) to use as the row labels of the `DataFrame`, either given as string name or column index. If a sequence of int / str is given, a MultiIndex is used.

Note: `index_col=False` can be used to force pandas to not use the first column as the index, e.g. when you have a malformed file with delimiters at the end of each line.

The default value of `None` instructs pandas to guess. If the number of fields in the column header row is equal to the number of fields in the body of the data file, then a default index is used. If the body has more fields than the header, the first columns are used as the index so that the remaining number of fields in the body equals the number of fields in the header. The first row after the header is used to determine the number of columns, which will go into the index.
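Here is a minimal sketch of the `index_col` behaviors described above, on made-up data. It covers four cases:

- selecting the index by column name;
- building a MultiIndex from a sequence;
- letting the default `None` infer an index when the body has one more field than the header;
- using `index_col=False` for a file with a trailing delimiter on every data line.

```python
from io import StringIO

import pandas as pd

# index_col given by name: column "key" becomes the row labels.
text = "key,val\nr1,1\nr2,2\n"
df_named = pd.read_csv(StringIO(text), index_col="key")

# A sequence of columns yields a MultiIndex on the rows.
text2 = "k1,k2,val\na,x,1\na,y,2\n"
df_multi = pd.read_csv(StringIO(text2), index_col=["k1", "k2"])

# Default index_col=None guesses: the body has one more field than the header,
# so the first column is used as the index.
implicit = "a,b\nr1,1,2\nr2,3,4\n"
df_implicit = pd.read_csv(StringIO(implicit))

# Malformed file with a trailing delimiter on each data line: index_col=False
# keeps the first column as data instead of turning it into the index.
trailing = "a,b,c\n1,2,3,\n4,5,6,\n"
df_trailing = pd.read_csv(StringIO(trailing), index_col=False)

print(df_trailing)
```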
If rows after the first data row contain fewer columns than that row, they are filled with `NaN`. This can be avoided through `usecols`, which ensures that the columns are taken as is and the trailing data are ignored.

usecols : list-like or callable, default `None`
Return a subset of the columns. If list-like, all elements must either be positional (i.e. integer indices into the document columns) or strings that correspond to column names provided either by the user in `names` or inferred from the document header row(s). If `names` are given, the document header row(s) are not taken into account. For example, a valid list-like `usecols` parameter would be `[0, 1, 2]` or `['foo', 'bar', 'baz']`.

Element order is ignored, so `usecols=[0, 1]` is the same as `[1, 0]`. To instantiate a DataFrame from `data` with element order preserved, use `pd.read_csv(data, usecols=['foo', 'bar'])[['foo', 'bar']]` for columns in `['foo', 'bar']` order or `pd.read_csv(data, usecols=['foo', 'bar'])[['bar', 'foo']]` for `['bar', 'foo']` order.

If callable, the callable function will be evaluated against the column names, returning names where the callable function evaluates to True, e.g. `usecols=lambda x: x.upper() in ["COL1", "COL3"]`. Using this parameter results in much faster parsing time and lower memory usage when using the C engine.
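The `usecols` rules above can be checked with a small, made-up table. Element order inside `usecols` does not affect the output order. Reselecting the columns after reading fixes the order. A callable filters by column name.

```python
from io import StringIO

import pandas as pd

data = "foo,bar,baz\n1,2,3\n4,5,6\n"

# Element order in usecols is ignored: both calls return columns in file order.
by_pos = pd.read_csv(StringIO(data), usecols=[1, 0])
by_name = pd.read_csv(StringIO(data), usecols=["bar", "foo"])

# To control the order, select the columns again after reading.
reordered = pd.read_csv(StringIO(data), usecols=["foo", "bar"])[["bar", "foo"]]

# A callable is evaluated against each column name.
picked = pd.read_csv(StringIO(data), usecols=lambda name: name.upper() in ["FOO", "BAZ"])

print(by_pos.columns.tolist(), reordered.columns.tolist(), picked.columns.tolist())
```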
The Python engine loads the data first before deciding which columns to drop squeeze boolean, defaultIn [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object61 If the parsed data only contains one column then return a In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object62 Deprecated since version 1. 4. 0. 
Append .squeeze("columns") to the call to read_csv to squeeze the data.
prefix str, default None
Prefix to add to column numbers when no header, e.g. ‘X’ for X0, X1, …
Deprecated since version 1.4.0.
Use a list comprehension on the DataFrame’s columns after calling read_csv.
mangle_dupe_cols boolean, default True
Duplicate columns will be specified as ‘X’, ‘X.1’…‘X.N’, rather than ‘X’…‘X’.
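The replacements suggested by the squeeze and prefix deprecation notes, and the default renaming of duplicate header names, might look like the sketch below. The sample strings are made up, and the "X" prefix is only an illustrative choice:

```python
from io import StringIO

import pandas as pd

# Instead of prefix="X": read without a header, then rename the integer
# column labels with a list comprehension.
df = pd.read_csv(StringIO("1,2,3\n4,5,6"), header=None)
df.columns = [f"X{col}" for col in df.columns]
print(list(df.columns))  # ['X0', 'X1', 'X2']

# Instead of squeeze=True: append .squeeze("columns") so that a
# one-column result becomes a Series.
single = pd.read_csv(StringIO("a\n1\n2")).squeeze("columns")
print(type(single).__name__)  # Series

# Duplicate header names are renamed 'a', 'a.1', ... by default.
dup = pd.read_csv(StringIO("a,b,a\n1,2,3"))
print(list(dup.columns))  # ['a', 'b', 'a.1']
```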
Passing in False will cause data to be overwritten if there are duplicate names in the columns.
Deprecated since version 1.5.0: The argument was never implemented, and a new argument where the renaming pattern can be specified will be added instead.
General parsing configuration
dtype Type name or dict of column -> type, default None
Data type for data or columns. E.g.
{'a': np.float64, 'b': np.int32, 'c': 'Int64'}. Use str or object together with suitable na_values settings to preserve and not interpret dtype.
If converters are specified, they will be applied INSTEAD of dtype conversion.
New in version 1.5.0: Support for defaultdict was added. Specify a defaultdict as input where the default determines the dtype of the columns which are not explicitly listed.
engine {'c', 'python', 'pyarrow'}
Parser engine to use. The C and pyarrow engines are faster, while the python engine is currently more feature-complete. Multithreading is currently only supported by the pyarrow engine.
New in version 1.4.0: The “pyarrow” engine was added as an experimental engine, and some features are unsupported, or may not work correctly, with this engine.
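A short sketch of the dtype parameter on a small four-column sample string. dtype=object keeps every cell as an unparsed string, while a dict sets types per column and leaves unlisted columns to inference. The nullable "Int64" type is used for column d because its last row has no value:

```python
from io import StringIO

import numpy as np
import pandas as pd

# Sample data: the last row is short, so column "d" gets a missing value.
data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

# dtype=object: no type interpretation, every cell stays a str (or NaN).
df_obj = pd.read_csv(StringIO(data), dtype=object)
print(repr(df_obj["a"][0]))  # '1'

# Per-column dtypes; "a" is not listed, so its type is inferred (int64).
df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
print(df.dtypes)
```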
converters dict, default None
Dict of functions for converting values in certain columns. Keys can either be integers or column labels.
true_values list, default None
Values to consider as True.
false_values list, default None
Values to consider as False.
skipinitialspace boolean, default False
Skip spaces after delimiter.
skiprows list-like or integer, default None
Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file.
If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise.
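A sketch of converters, true_values/false_values and a callable skiprows on small made-up inputs. The converter here simply returns the stripped raw string so that leading zeros survive; that choice is only for illustration:

```python
from io import StringIO

import pandas as pd

# Hypothetical data: a yes/no flag and a zero-padded code.
flags = "id,active,code\n1,yes,007\n2,no,042"

# Map "yes"/"no" to booleans; a converter receives each cell's raw string.
df_flags = pd.read_csv(
    StringIO(flags),
    true_values=["yes"],
    false_values=["no"],
    converters={"code": str.strip},
)
print(df_flags["active"].tolist())  # [True, False]
print(df_flags["code"].tolist())    # ['007', '042']

data = "col1,col2,col3\na,b,1\na,b,2\nc,d,3"

# Callable skiprows is called with each row index (0 is the header line);
# returning True skips that row, so lines 1 and 3 are dropped here.
df_rows = pd.read_csv(StringIO(data), skiprows=lambda x: x % 2 != 0)
print(df_rows)
```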
skipfooter int, default 0
Number of lines at bottom of file to skip (unsupported with engine='c').
nrows int, default None
Number of rows of file to read. Useful for reading pieces of large files.
low_memory boolean, default True
Internally process the file in chunks, resulting in lower memory use while parsing, but possibly mixed type inference.
To ensure no mixed types either set low_memory=False, or specify the type with the dtype parameter. Note that the entire file is read into a single DataFrame regardless; use the chunksize or iterator parameter to return the data in chunks. (Only valid with C parser)
memory_map boolean, default False
If a filepath is provided for filepath_or_buffer, map the file object directly onto memory and access the data directly from there. Using this option can improve performance because there is no longer any I/O overhead.
NA and missing data handling
na_values scalar, str, list-like, or dict, default None
Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. See na values const below for a list of the values interpreted as NaN by default.
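A minimal sketch of na_values as a list and as a dict, using a made-up CSV in which "missing" and "-999" are hypothetical sentinel values. "NA" is already one of the default NA strings, so it is treated as missing in both reads:

```python
from io import StringIO

import pandas as pd

# Hypothetical data with two custom sentinels and one default NA marker.
data = "a,b\n1,missing\n-999,x\nNA,y"

# A list adds extra NA markers for every column, on top of the defaults.
df_list = pd.read_csv(StringIO(data), na_values=["missing"])
print(df_list.isna().sum().to_dict())  # {'a': 1, 'b': 1}

# A dict limits the extra markers to specific columns: "-999" counts as
# missing only in column "a", and "missing" is left alone in column "b".
df_dict = pd.read_csv(StringIO(data), na_values={"a": ["-999"]})
print(df_dict.isna().sum().to_dict())  # {'a': 2, 'b': 0}
```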
keep_default_na boolean, default True
Whether or not to include the default NaN values when parsing the data. Depending on whether na_values is passed in, the behavior is as follows:
- If keep_default_na is True, and na_values are specified, na_values is appended to the default NaN values used for parsing.
- If keep_default_na is True, and na_values are not specified, only the default NaN values are used for parsing.
- If keep_default_na is False, and na_values are specified, only the NaN values specified in na_values are used for parsing.
- If keep_default_na is False, and na_values are not specified, no strings will be parsed as NaN.
Note that if na_filter is passed in as False, the keep_default_na and na_values parameters will be ignored.
na_filter boolean, default True
Detect missing value markers (empty strings and the value of na_values). In data without any NAs, passing na_filter=False can improve the performance of reading a large file.
verbose boolean, default False
Indicate number of NA values placed in non-numeric columns.
skip_blank_lines boolean, default True
If True, skip over blank lines rather than interpreting as NaN values.
Datetime handling
parse_dates boolean or list of ints or names or list of lists or dict, default False.
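A minimal sketch of two parse_dates forms on made-up ISO 8601 data: a list of column names, and True combined with index_col. The exact datetime64 resolution shown by print may differ between pandas versions, so only the kind of dtype is relied on here:

```python
from io import StringIO

import pandas as pd

# Hypothetical data with ISO 8601 dates in the first column.
data = "date,value\n2023-01-05,1\n2023-01-06,2"

# A list of column names: each listed column is parsed into datetimes.
df = pd.read_csv(StringIO(data), parse_dates=["date"])
print(df.dtypes)

# parse_dates=True parses the index, here taken from the first column.
df_idx = pd.read_csv(StringIO(data), index_col=0, parse_dates=True)
print(type(df_idx.index).__name__)  # DatetimeIndex
```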
Note: A fast-path exists for iso8601-formatted dates.
infer_datetime_format boolean, default False
If True and parse_dates is enabled for a column, attempt to infer the datetime format to speed up the processing.
keep_date_col boolean, default False
If True and parse_dates specifies combining multiple columns then keep the original columns.
date_parser function, default None
Function to use for converting a sequence of string columns to an array of datetime instances. The default uses dateutil.parser.parser to do the conversion. pandas will try to call date_parser in three different ways, advancing to the next if an exception occurs:
1) Pass one or more arrays (as defined by parse_dates) as arguments;
2) concatenate (row-wise) the string values from the columns defined by parse_dates into a single array and pass that; and
3) call date_parser once for each row using one or more strings (corresponding to the columns defined by parse_dates) as arguments.
dayfirst boolean, default False
DD/MM format dates, international and European format.
cache_dates boolean, default True
If True, use a cache of unique, converted dates to apply the datetime conversion. May produce significant speed-up when parsing duplicate date strings, especially ones with timezone offsets.
New in version 0.25.0.
Iteration
iterator boolean, default False
Return TextFileReader object for iteration or getting chunks with get_chunk().
chunksize int, default None
Return TextFileReader object for iteration. See iterating and chunking below.
Quoting, compression, and file format
compression {'infer', 'gzip', 'bz2', 'zip', 'xz', 'zstd', None,
10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object0843}, default In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object34 For on-the-fly decompression of on-disk data. If ‘infer’, then use gzip, bz2, zip, xz, or zstandard if In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object92 is path-like ending in ‘. gz’, ‘. bz2’, ‘. zip’, ‘. xz’, ‘. zst’, respectively, and no decompression otherwise. If using ‘zip’, the ZIP file must contain only one data file to be read in. Set to In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object24 for no decompression. 
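The sketch below illustrates the string options. It assumes pandas 1.x or later, and the temporary file name data.csv.gz is arbitrary. A small frame is written to a path ending in '.gz', then read back twice: once with the default compression='infer' and once with compression='gzip' named explicitly.

```python
import os
import tempfile

import pandas as pd

# A small frame to round-trip through a compressed file.
df = pd.DataFrame({"a": [1, 5, 9], "b": [2, 6, 10]})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data.csv.gz")

    # to_csv also defaults to compression='infer', so the ".gz" suffix selects gzip.
    df.to_csv(path, index=False)

    # Default compression='infer': the ".gz" suffix selects gzip decompression.
    inferred = pd.read_csv(path)

    # The same file, with the compression method named explicitly.
    explicit = pd.read_csv(path, compression="gzip")

    # Confirm the file on disk is gzip data by checking its two magic bytes.
    with open(path, "rb") as fh:
        magic = fh.read(2)

print(inferred.equals(df), explicit.equals(df), magic == b"\x1f\x8b")
```

Passing compression=None for this file would switch decompression off. The parser would then see the raw gzip bytes instead of CSV text.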
Can also be a dict with key In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object0847 set to one of { In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object0839, In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object0837, In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object0838, In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) 
a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object0841} and other key-value pairs are forwarded to In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object0852, In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object0853, In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object0854, or In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df 
Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object0855. As an example, the following could be passed for faster compression and to create a reproducible gzip archive. In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object0856 Changed in version 1. 1. 0. dict option extended to support In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object0857 and In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object0858. Changed in version 1. 2. 0. 
Previous versions forwarded dict entries for ‘gzip’ to In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object0859. thousands str, default In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object24 Thousands separator decimal str, defaultIn [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object0861 Character to recognize as decimal point. E. g. 
use In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object20 for European datafloat_precision string, default None Specifies which converter the C engine should use for floating-point values. The options are In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object24 for the ordinary converter, In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object0864 for the high-precision converter, and In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes 
Out[20]: a int64 b object c float64 d Int64 dtype: object0865 for the round-trip converterlineterminator str (length 1), default In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object24 Character to break file into lines. Only valid with C parser quotechar str (length 1)The character used to denote the start and end of a quoted item. Quoted items can include the delimiter and it will be ignored quoting int orIn [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object0867 instance, default In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object84 Control field quoting behavior per In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a 
b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object0867 constants. Sử dụng một trong số In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object0870 (0), In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object0871 (1), In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object0872 (2) hoặc In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = 
pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object0873 (3)doublequote boolean, default In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object32 When In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object0875 is specified and In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object0876 is not In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: 
df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object0873, indicate whether or not to interpret two consecutive In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object0875 elements inside a field as a single In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object0875 elementescapechar str (length 1), default In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object24 One-character string used to escape delimiter when quoting is In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": 
np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object0873comment str, default In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object24 Indicates remainder of line should not be parsed. If found at the beginning of a line, the line will be ignored altogether. This parameter must be a single character. Like empty lines (as long as In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object39), fully commented lines are ignored by the parameter In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object0884 but not by In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 
6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object0885. For example, if In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object0886, parsing ‘#empty\na,b,c\n1,2,3’ with In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object35 will result in ‘a,b,c’ being treated as the headerencoding str, default In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object24 Encoding to use for UTF when reading/writing (e. g. 
In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object0889). Danh sách mã hóa tiêu chuẩn Pythondialect str or In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object0890 instance, default In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object24 If provided, this parameter will override values (default or not) for the following parameters. 
In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object33, In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object0893, In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object0894, In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object0895, In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = 
pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object0875, and In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object0876. If it is necessary to override values, a ParserWarning will be issued. See In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object0890 documentation for more details Error handling#error_bad_lines boolean, optional, defaultIn [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object24 Lines with too many fields (e. g. 
a csv line with too many commas) will by default cause an exception to be raised, and no In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object43 will be returned. If In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object61, then these “bad lines” will dropped from the In [13]: import numpy as np In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11" In [15]: print(data) a,b,c,d 1,2,3,4 5,6,7,8 9,10,11 In [16]: df = pd.read_csv(StringIO(data), dtype=object) In [17]: df Out[17]: a b c d 0 1 2 3 4 1 5 6 7 8 2 9 10 11 NaN In [18]: df["a"][0] Out[18]: '1' In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"}) In [20]: df.dtypes Out[20]: a int64 b object c float64 d Int64 dtype: object43 that is returned. See bad lines below. Deprecated since version 1. 3. 0. 
The on_bad_lines parameter should be used instead to specify the behavior upon encountering a bad line.

warn_bad_lines : boolean, optional, default None
If error_bad_lines is False and warn_bad_lines is True, a warning will be output for each "bad line".
Deprecated since version 1.3.0. The on_bad_lines parameter should be used instead to specify the behavior upon encountering a bad line.

on_bad_lines : {'error', 'warn', 'skip'}, default 'error'
Specifies what to do upon encountering a bad line (a line with too many fields). Allowed values are 'error' (raise an exception when a bad line is encountered), 'warn' (raise a warning when a bad line is encountered and skip that line), and 'skip' (skip bad lines without raising or warning when they are encountered).
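A minimal sketch of on_bad_lines, assuming pandas 1.3 or later and a small hypothetical in-memory CSV in which one data line has one field more than the header. With 'skip' the line is dropped; with the default 'error' the same input raises pandas.errors.ParserError:

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV: the line "4,5,6,7" has one field more than the header.
data = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10"

# on_bad_lines="skip" drops the offending line without raising or warning.
df = pd.read_csv(StringIO(data), on_bad_lines="skip")
print(df)

# The default, on_bad_lines="error", raises a ParserError for the same input.
try:
    pd.read_csv(StringIO(data))
except pd.errors.ParserError as exc:
    print("ParserError:", exc)
```

With on_bad_lines="warn", the bad line is also dropped but reported; the exact reporting mechanism and message text vary between pandas versions, so they are not shown here.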
The on_bad_lines parameter is new in version 1.3.0.

Specifying column data types

You can indicate the data type for the whole DataFrame or for individual columns (the session assumes pandas is imported as pd and StringIO is imported from io):

    In [13]: import numpy as np

    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11

    In [16]: df = pd.read_csv(StringIO(data), dtype=object)

    In [17]: df
    Out[17]:
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN

    In [18]: df["a"][0]
    Out[18]: '1'

    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

    In [20]: df.dtypes
    Out[20]:
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object

pandas offers more than one way to ensure that your column(s) contain only one dtype.
If you are unfamiliar with these concepts, see the pandas documentation on dtypes and on object conversion. For instance, you can use the converters argument of read_csv() to apply a function such as str to every value of a column.
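A minimal sketch of the converters approach, assuming a hypothetical single-column CSV that mixes integers, a quoted string and a float. The converter str is applied to each raw field, so every value comes back as a Python str regardless of what it looks like:

```python
from io import StringIO

import pandas as pd

# Hypothetical column mixing integers, a quoted string and a float.
data = "col_1\n1\n2\n'A'\n4.22"

# converters maps a column name to a function applied to each raw field,
# so every value in col_1 is kept as a Python str.
df = pd.read_csv(StringIO(data), converters={"col_1": str})

print(df["col_1"].tolist())  # ['1', '2', "'A'", '4.22']
print(df["col_1"].map(type).value_counts())
```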
Alternatively, you can use the to_numeric() function with errors='coerce' to coerce the dtypes after reading in the data: every value that parses is converted to a float, and every value that does not becomes NaN. Ultimately, how you deal with reading in columns containing mixed dtypes depends on your specific needs.
In the case above, if you want to replace the data anomalies with NaN, to_numeric() is probably your best option.
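A minimal sketch of the to_numeric() route, using the same kind of hypothetical mixed column: the file is read with default inference, and errors="coerce" then turns the one unparseable value into NaN while the rest become floats:

```python
from io import StringIO

import pandas as pd

# Hypothetical mixed column: integers, a quoted string and a float.
data = "col_1\n1\n2\n'A'\n4.22"

# Default inference: the stray 'A' leaves col_1 as a string (object) column.
df2 = pd.read_csv(StringIO(data))

# errors="coerce" converts every parseable value and turns the rest into NaN.
df2["col_1"] = pd.to_numeric(df2["col_1"], errors="coerce")

print(df2["col_1"].tolist())  # [1.0, 2.0, nan, 4.22]
print(df2["col_1"].dtype)     # float64
```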
However, if you want all the data to be coerced, no matter the type, then the converters argument of read_csv() is worth trying.

Note: In some cases, reading in abnormal data with columns containing mixed dtypes will result in an inconsistent dataset. If you rely on pandas to infer the dtypes of your columns, the parsing engine infers the dtypes for different chunks of the data, rather than for the whole dataset at once. Consequently, you can end up with column(s) with mixed dtypes.
For example, if a long column of integers contains a couple of string values partway through, the parsed column can end up holding int values for certain chunks and str values for others, due to the mixed dtypes in the data that was read in.
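A minimal sketch of how to spot and avoid such a column. For brevity, the mixed column is built directly as a DataFrame (an assumption standing in for the result of chunked parsing of a large file); the element types are then counted, and the data is re-read with an explicit dtype so no per-chunk inference takes place:

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for a column that chunked inference left with mixed
# types; it is built directly here rather than by parsing a large file.
mixed_df = pd.DataFrame({"col_1": [1, 2, "a", "b", 3]})

# The column as a whole is marked object; inspecting element types shows the mix.
print(mixed_df["col_1"].dtype)  # object
print(mixed_df["col_1"].map(type).value_counts())  # counts of int and str elements

# One remedy: declare the dtype up front so no per-chunk inference happens.
csv_text = mixed_df.to_csv(index=False)
clean_df = pd.read_csv(StringIO(csv_text), dtype={"col_1": str})
print(clean_df["col_1"].map(type).value_counts())  # str elements only
```

The read_csv documentation also lists low_memory=False as an alternative to passing dtype, at the cost of processing the file in one piece rather than in chunks.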
It is important to note that the overall column will be marked with a dtype of object, which is used for columns with mixed dtypes.

Specifying categorical dtype

Categorical columns can be parsed directly by specifying dtype='category' or dtype=CategoricalDtype(categories, ordered). Individual columns can be parsed as a Categorical using a dict specification, for example dtype={"col1": "category"}. Specifying dtype='category' will result in an unordered Categorical whose categories are the unique values observed in the data.
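A minimal sketch of both forms, assuming a small hypothetical three-column CSV. dtype="category" applies to every column, while the dict form targets only col1 and lets pandas infer the others; the categories are the sorted unique values seen in the data, and digits stay strings:

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data.
data = "col1,col2,col3\na,b,1\na,b,2\nc,d,3"

# dtype="category" parses every column as an unordered Categorical.
df = pd.read_csv(StringIO(data), dtype="category")
print(df.dtypes)

# A dict specification makes only col1 categorical; col2 and col3 are inferred.
df_one = pd.read_csv(StringIO(data), dtype={"col1": "category"})
print(df_one.dtypes)

# The categories are the unique values observed in the data.
print(df["col1"].cat.categories.tolist())  # ['a', 'c']
print(df["col1"].cat.ordered)              # False
print(df["col3"].cat.categories.tolist())  # ['1', '2', '3']
```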
For more control over the categories and their order, create a CategoricalDtype ahead of time and pass it as that column's dtype. When using dtype=CategoricalDtype, "unexpected" values outside of dtype.categories are treated as missing values. This matches the behavior of Categorical.set_categories().

Note: With dtype='category', the resulting categories will always be parsed as strings (object dtype).
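A minimal sketch of the CategoricalDtype behavior and of the string-categories caveat, assuming hypothetical data in which col1 contains a value ("d") that is not among the declared categories. The undeclared value becomes NaN; separately, the string categories produced by dtype="category" are turned into numbers with rename_categories() plus to_numeric(), which is one possible approach rather than the only one:

```python
from io import StringIO

import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical data: "d" in col1 is not one of the declared categories.
data = "col1,col2,col3\na,b,1\na,b,2\nc,d,3\nd,d,3"

# Declare the categories (and their order) ahead of time.
dtype = CategoricalDtype(["a", "b", "c"], ordered=True)
df = pd.read_csv(StringIO(data), dtype={"col1": dtype})

print(df["col1"].cat.categories.tolist())  # ['a', 'b', 'c']
print(df["col1"].isna().tolist())          # [False, False, False, True]

# With dtype="category", categories come back as strings, even for digits.
df2 = pd.read_csv(StringIO(data), dtype="category")
str_categories = df2["col3"].cat.categories.tolist()  # ['1', '2', '3']

# Turn them into numbers by renaming the categories via to_numeric().
df2["col3"] = df2["col3"].cat.rename_categories(
    pd.to_numeric(df2["col3"].cat.categories)
)
print(df2["col3"].cat.categories.tolist())  # [1, 2, 3]
```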
If the categories are numeric, they can be converted using the to_numeric() function or, as appropriate, another converter such as to_datetime(). When dtype is a CategoricalDtype with homogeneous categories (all numeric, all datetimes, etc.), the conversion is done automatically.

Naming and using columns

Handling column names

A file may or may not have a header row.
pandas assumes the first row should be used as the column names. By specifying the names argument in conjunction with header, you can indicate other names to use and whether or not to throw away the header row (if any).
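A minimal sketch of names combined with header, assuming a hypothetical three-row CSV with a header line. header=0 replaces the existing header row, while header=None keeps that row as ordinary data:

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV with a header row.
data = "a,b,c\n1,2,3\n4,5,6\n7,8,9"

# Default: the first row supplies the column names.
df_default = pd.read_csv(StringIO(data))

# names with header=0: the existing header row is replaced (and discarded).
df_renamed = pd.read_csv(StringIO(data), names=["foo", "bar", "baz"], header=0)

# names with header=None: the old header row is kept as an ordinary data row.
df_kept = pd.read_csv(StringIO(data), names=["foo", "bar", "baz"], header=None)

print(df_default.columns.tolist())  # ['a', 'b', 'c']
print(df_renamed.shape)             # (3, 3)
print(df_kept.shape)                # (4, 3)
```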
If the header is in a row other than the first, pass the row number to header. This will skip the preceding rows.

Note: The default behavior is to infer the column names.
If no names are passed, the behavior is identical to header=0 and column names are inferred from the first non-blank line of the file; if column names are passed explicitly, the behavior is identical to header=None.

Duplicate names parsing
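A minimal sketch of the default de-duplication, assuming a hypothetical CSV in which the column name "a" appears twice. The second "a" is renamed rather than overwriting the first; the example deliberately does not pass mangle_dupe_cols, so it also runs on recent pandas versions where that keyword may no longer be accepted:

```python
from io import StringIO

import pandas as pd

# Hypothetical data in which the column name "a" appears twice.
data = "a,b,a\n0,1,2\n3,4,5"

df = pd.read_csv(StringIO(data))
print(df.columns.tolist())  # ['a', 'b', 'a.1']
print(df["a.1"].tolist())   # [2, 5]
```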
If the file or header contains duplicate names, pandas will by default distinguish between them so as to prevent overwriting data. No data is lost, because mangle_dupe_cols=True by default, which renames a series of duplicate columns 'X', …, 'X' to 'X', 'X.1', …, 'X.N'.

Filtering columns (