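Contents
I. Usage in Detail
39. The pandas.DataFrame.to_stata function
39-1. Syntax
39-2. Parameters
39-3. Purpose
39-4. Return value
39-5. Notes
39-6. Usage
39-6-1. Data preparation
39-6-2. Code example
39-6-3. Output
40. The pandas.read_stata function
40-1. Syntax
40-2. Parameters
40-3. Purpose
40-4. Return value
40-5. Notes
40-6. Usage
40-6-1. Data preparation
40-6-2. Code example
40-6-3. Output
II. Recommended Reading
1. Python Foundations Journey
2. Python Functions Journey
3. Python Algorithms Journey
4. Python Magic Journey
5. Blog home page
I. Usage in Detail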
39. The pandas.DataFrame.to_stata function
39-1. Syntax
# 39. The pandas.DataFrame.to_stata function
DataFrame.to_stata(path, *, convert_dates=None, write_index=True, byteorder=None, time_stamp=None, data_label=None, variable_labels=None, version=114, convert_strl=None, compression='infer', storage_options=None, value_labels=None)
Export DataFrame object to Stata dta format.
Writes the DataFrame to a Stata dataset file. "dta" files contain a Stata dataset.
Parameters:
path : str, path object, or buffer
String, path object (implementing os.PathLike[str]), or file-like object implementing a binary write() function.
convert_dates : dict
Dictionary mapping columns containing datetime types to stata internal format to use when writing the dates. Options are 'tc', 'td', 'tm', 'tw', 'th', 'tq', 'ty'. Column can be either an integer or a name. Datetime columns that do not have a conversion type specified will be converted to 'tc'. Raises NotImplementedError if a datetime column has timezone information.
write_index : bool
Write the index to Stata dataset.
byteorder : str
Can be ">", "<", "little", or "big". default is sys.byteorder.
time_stamp : datetime
A datetime to use as file creation date. Default is the current time.
data_label : str, optional
A label for the data set. Must be 80 characters or smaller.
variable_labels : dict
Dictionary containing columns as keys and variable labels as values. Each label must be 80 characters or smaller.
version : {114, 117, 118, 119, None}, default 114
Version to use in the output dta file. Set to None to let pandas decide between 118 or 119 formats depending on the number of columns in the frame. Version 114 can be read by Stata 10 and later. Version 117 can be read by Stata 13 or later. Version 118 is supported in Stata 14 and later. Version 119 is supported in Stata 15 and later. Version 114 limits string variables to 244 characters or fewer while versions 117 and later allow strings with lengths up to 2,000,000 characters. Versions 118 and 119 support Unicode characters, and version 119 supports more than 32,767 variables.
Version 119 should usually only be used when the number of variables exceeds the capacity of dta format 118. Exporting smaller datasets in format 119 may have unintended consequences, and, as of November 2020, Stata SE cannot read version 119 files.
convert_strl : list, optional
List of column names to convert to string columns to Stata StrL format. Only available if version is 117. Storing strings in the StrL format can produce smaller dta files if strings have more than 8 characters and values are repeated.
compression : str or dict, default 'infer'
For on-the-fly compression of the output data. If 'infer' and 'path' is path-like, then detect compression from the following extensions: '.gz', '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2' (otherwise no compression). Set to None for no compression. Can also be a dict with key 'method' set to one of {'zip', 'gzip', 'bz2', 'zstd', 'xz', 'tar'} and other key-value pairs are forwarded to zipfile.ZipFile, gzip.GzipFile, bz2.BZ2File, zstandard.ZstdCompressor, lzma.LZMAFile or tarfile.TarFile, respectively. As an example, the following could be passed for faster compression and to create a reproducible gzip archive: compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}.
New in version 1.5.0: Added support for .tar files.
Changed in version 1.4.0: Zstandard support.
storage_options : dict, optional
Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with "s3://", and "gcs://") the key-value pairs are forwarded to fsspec.open. Please see fsspec and urllib for more details; for more examples on storage options, refer to the pandas documentation.
value_labels : dict of dicts
Dictionary containing columns as keys and dictionaries of column value to labels as values. Labels for a single variable must be 32,000 characters or smaller.
New in version 1.4.0.
Raises:
NotImplementedError
If datetimes contain timezone information.
Column dtype is not representable in Stata.
ValueError
Columns listed in convert_dates are neither datetime64[ns] nor datetime.datetime.
Column listed in convert_dates is not in DataFrame.
Categorical label contains more than 32,000 characters.
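The compression parameter described above can be tried with a short round trip. The following is a minimal sketch, not a definitive recipe: the DataFrame contents, the temporary directory, and the file name example.dta.gz are assumptions made for illustration, while the compression dict is the one quoted in the parameter description.

```python
import os
import tempfile

import pandas as pd

# Illustrative data; not taken from the article.
df = pd.DataFrame({"x": [1.5, 2.5, 3.5], "label": ["a", "b", "c"]})

# Hypothetical output location inside a throwaway temporary directory.
out_dir = tempfile.mkdtemp()
gz_path = os.path.join(out_dir, "example.dta.gz")

# Explicit dict form: gzip at the fastest level, with a fixed mtime so the
# archive can be reproduced byte for byte.
df.to_stata(
    gz_path,
    write_index=False,
    compression={"method": "gzip", "compresslevel": 1, "mtime": 1},
)

# With the default compression="infer", the ".gz" suffix selects gzip on read.
restored = pd.read_stata(gz_path)
print(restored)
```

Passing a plain path that ends in .gz with the default compression='infer' should compress as well; the dict form is only needed when extra options such as compresslevel are wanted.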
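39-2. Parameters
39-2-1. path (required): Path of the file to write, including the file name. A path object or a file-like object that implements a binary write() method is also accepted.
39-2-2. convert_dates (optional, default None): A dictionary that maps datetime columns to the Stata internal date format used when writing them. Keys are column names (or integer positions) and values are format codes such as "tc" (Stata date-time) or "td" (Stata date). Datetime columns without an entry are written as "tc". A key that does not refer to a column of the DataFrame raises ValueError; it is not silently ignored.
39-2-3. write_index (optional, default True): Whether to write the DataFrame index to the Stata file as a column. If False, the index is not written.
39-2-4. byteorder (optional, default None): Byte order used when writing the file. Accepted values are ">", "<", "little", and "big". With None, pandas uses sys.byteorder, which is little-endian ("<") on most current machines. Set it explicitly only when the file must be read on a system that requires a particular byte order.
39-2-5. time_stamp (optional, default None): A datetime stored in the file as its creation time; the default is the current time. It does not change the data itself.
39-2-6. data_label (optional, default None): A label for the data set: a short descriptive string of at most 80 characters that identifies the data set in Stata.
39-2-7. variable_labels (optional, default None): A dictionary of variable labels for the columns. Keys are column names and values are descriptive strings of at most 80 characters each.
39-2-8. version (optional, default 114): The dta format version to write: 114, 117, 118, 119, or None. Format 114 can be read by Stata 10 and later, 117 by Stata 13 and later, 118 by Stata 14 and later, and 119 by Stata 15 and later. Format 114 limits string variables to 244 characters; formats 118 and 119 add Unicode support. With None, pandas chooses between 118 and 119 according to the number of columns.
39-2-9. convert_strl (optional, default None): A list of column names to store in Stata's StrL (long string) format. It cannot be used with the default format 114, which has no StrL type; the reference text above lists it as available with version 117. StrL storage can produce smaller files when strings are longer than 8 characters and values repeat.
39-2-10. compression (optional, default 'infer'): Compression method. With 'infer', the method is detected from the extension of path ('.gz', '.bz2', '.zip', '.xz', '.zst', '.tar' and the '.tar.*' variants); any other extension means no compression. A dict with a 'method' key selects the method explicitly and forwards the remaining keys to the compressor. None disables compression.
39-2-11. storage_options (optional, default None): Extra options for a storage connection, such as account credentials. It is mainly used with remote storage (for example S3 or GCS URLs) and is normally left unset for local files.
39-2-12. value_labels (optional, default None): A dictionary of value labels. Keys are column names and each value is a dictionary that maps column values to label strings, so that coded variables display readable labels in Stata. The labels for a single variable must total 32,000 characters or fewer. Added in pandas 1.4.0.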
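To make the value_labels parameter concrete, here is a minimal sketch under stated assumptions: the column names, the codes, and the file name labelled.dta are invented for illustration, and pandas 1.4.0 or later is assumed because value_labels was added in that release. It writes integer codes with labels and then reads them back in three ways.

```python
import os
import tempfile

import pandas as pd

# Integer codes as they might appear in survey data (illustrative values).
df = pd.DataFrame({"id": [1, 2, 3], "sex": [1, 2, 1]})
path = os.path.join(tempfile.mkdtemp(), "labelled.dta")  # hypothetical file name

df.to_stata(
    path,
    write_index=False,
    value_labels={"sex": {1: "male", 2: "female"}},
)

# Default read: columns that carry value labels come back as Categorical.
labelled = pd.read_stata(path)

# convert_categoricals=False keeps the underlying numeric codes.
codes = pd.read_stata(path, convert_categoricals=False)

# The reader object also exposes the stored label mapping.
with pd.read_stata(path, iterator=True) as reader:
    stored_labels = reader.value_labels()

print(labelled["sex"].tolist())
print(codes["sex"].tolist())
print(stored_labels)
```

Only the sex column is labelled here, so id is read back as a plain numeric column in both cases.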
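39-3. Purpose: Saves a pandas DataFrame to a file in Stata's .dta format.
39-4. Return value: The method returns None. Its effect is to write the contents of the DataFrame to the given .dta file; it does not hand an object back to Python.
39-5. Notes: Stata is a widely used statistics package, and .dta is its proprietary format for storing data sets. With this method, data held in a pandas DataFrame can be saved in a form that Stata reads and processes directly.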
39-6. Usage
39-6-1. Data preparation
None
39-6-2. Code example
# 39. The pandas.DataFrame.to_stata function
import pandas as pd
# Create a sample DataFrame
data = {
    "name": ["John", "Anna", "Peter", "Linda"],
    "age": [28, 34, 29, 32],
    "date_of_birth": pd.to_datetime(["1992-01-01", "1988-02-15", "1991-07-23", "1989-10-10"]),
    "city": ["New York", "Paris", "Berlin", "London"],
}
df = pd.DataFrame(data)
# Set the variable labels
variable_labels = {
    "name": "Person Name",
    "age": "Age in Years",
    "date_of_birth": "Date of Birth",
    "city": "City of Residence",
}
# Set the data label
data_label = "Demo Dataset for Pandas to Stata Conversion"
# Save the DataFrame to a Stata file.
# Format 114 is readable by Stata 10 and later and limits strings to 244 characters.
# The call also converts the date column, omits the index, and adds variable and data labels.
df.to_stata(
    "example.dta",
    convert_dates={"date_of_birth": "td"},  # write date_of_birth in Stata date format
    write_index=False,  # do not write the index to the Stata file
    variable_labels=variable_labels,  # add the variable labels
    data_label=data_label,  # add the data label
    version=114,  # dta format version
)
print("DataFrame has been successfully saved to Stata file.")
39-6-3. Output
# DataFrame has been successfully saved to Stata file.
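The example above writes format 114. As a hedged sketch of how the version parameter matters, the code below uses an invented DataFrame with non-ASCII names and a 300-character string (both assumptions for illustration). Format 114 is expected to reject the long string with ValueError, while format 118 stores both the long string and the Unicode text.

```python
import os
import tempfile

import pandas as pd

long_note = "x" * 300  # longer than the 244-character limit of format 114
df = pd.DataFrame({"name": ["Zoë", "李雷"], "note": [long_note, "short"]})
out_dir = tempfile.mkdtemp()  # throwaway directory for the demo files

# Format 114: only the ASCII "note" column is written here, so the error
# comes from the string length rather than from the text encoding.
try:
    df[["note"]].to_stata(
        os.path.join(out_dir, "v114.dta"), write_index=False, version=114
    )
    v114_error = None
except ValueError as exc:
    v114_error = exc

# Format 118 (Stata 14 and later) accepts long strings and Unicode text.
v118_path = os.path.join(out_dir, "v118.dta")
df.to_stata(v118_path, write_index=False, version=118)
restored = pd.read_stata(v118_path)

print(type(v114_error).__name__)
print(restored["name"].tolist(), len(restored["note"].iloc[0]))
```

Choosing 118 trades compatibility for capacity: the file can no longer be opened by Stata releases older than 14.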
40. The pandas.read_stata function
40-1. Syntax
# 40. The pandas.read_stata function
pandas.read_stata(filepath_or_buffer, *, convert_dates=True, convert_categoricals=True, index_col=None, convert_missing=False, preserve_dtypes=True, columns=None, order_categoricals=True, chunksize=None, iterator=False, compression='infer', storage_options=None)
Read Stata file into DataFrame.
Parameters:
filepath_or_buffer : str, path object or file-like object
Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. A local file could be: file://localhost/path/to/table.dta.
If you want to pass in a path object, pandas accepts any os.PathLike.
By file-like object, we refer to objects with a read() method, such as a file handle (e.g. via builtin open function) or StringIO.
convert_dates : bool, default True
Convert date variables to DataFrame time values.
convert_categoricals : bool, default True
Read value labels and convert columns to Categorical/Factor variables.
index_col : str, optional
Column to set as index.
convert_missing : bool, default False
Flag indicating whether to convert missing values to their Stata representations. If False, missing values are replaced with nan. If True, columns containing missing values are returned with object data types and missing values are represented by StataMissingValue objects.
preserve_dtypes : bool, default True
Preserve Stata datatypes. If False, numeric data are upcast to pandas default types for foreign data (float64 or int64).
columns : list or None
Columns to retain. Columns will be returned in the given order. None returns all columns.
order_categoricals : bool, default True
Flag indicating whether converted categorical data are ordered.
chunksize : int, default None
Return StataReader object for iterations, returns chunks with given number of lines.
iterator : bool, default False
Return StataReader object.
compression : str or dict, default 'infer'
For on-the-fly decompression of on-disk data. If 'infer' and 'filepath_or_buffer' is path-like, then detect compression from the following extensions: '.gz', '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2' (otherwise no compression). If using 'zip' or 'tar', the ZIP file must contain only one data file to be read in. Set to None for no decompression. Can also be a dict with key 'method' set to one of {'zip', 'gzip', 'bz2', 'zstd', 'xz', 'tar'} and other key-value pairs are forwarded to zipfile.ZipFile, gzip.GzipFile, bz2.BZ2File, zstandard.ZstdDecompressor, lzma.LZMAFile or tarfile.TarFile, respectively. As an example, the following could be passed for Zstandard decompression using a custom compression dictionary: compression={'method': 'zstd', 'dict_data': my_compression_dict}.
New in version 1.5.0: Added support for .tar files.
storage_options : dict, optional
Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with "s3://", and "gcs://") the key-value pairs are forwarded to fsspec.open. Please see fsspec and urllib for more details; for more examples on storage options, refer to the pandas documentation.
Returns:
DataFrame or pandas.api.typing.StataReader.
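40-2. Parameters
40-2-1. filepath_or_buffer (required): A string path, a path object, or any object that implements a read() method (such as a file handle opened in binary mode). This is the path of the .dta file to read, or the file object itself. A string may also be a URL.
40-2-2. convert_dates (optional, default True): Boolean. If True, Stata date variables are converted to pandas datetime values, which is useful when the data contain Stata dates or date-times.
40-2-3. convert_categoricals (optional, default True): Boolean. If True, Stata value labels are read and the labelled columns are converted to the pandas Categorical dtype.
40-2-4. index_col (optional, default None): String. The name of the column to use as the row index of the DataFrame.
40-2-5. convert_missing (optional, default False): Boolean. If False, Stata missing values (such as .) are replaced with NaN. If True, columns that contain missing values are returned with object dtype and each missing value is represented by a StataMissingValue object, which keeps the distinction between Stata's different missing codes.
40-2-6. preserve_dtypes (optional, default True): Boolean. If True, the Stata data types are preserved. If False, numeric data are upcast to the pandas default types for foreign data (float64 or int64).
40-2-7. columns (optional, default None): List of strings. The columns to keep, returned in the given order. With None, all columns are read.
40-2-8. order_categoricals (optional, default True): Boolean. Controls whether the categorical columns produced from value labels are ordered categoricals.
40-2-9. chunksize (optional, default None): Integer. If set, the function returns a StataReader that yields DataFrames of chunksize rows each, which saves memory when processing large files.
40-2-10. iterator (optional, default False): Boolean. If True, the function returns a StataReader object, which can be iterated or read from in pieces.
40-2-11. compression (optional, default 'infer'): String, dict, or None. Names the compression type, such as 'gzip', 'bz2', 'zip', 'xz', or 'infer' (detect the compression from the file extension when filepath_or_buffer is path-like). With None, no decompression is performed.
40-2-12. storage_options (optional, default None): Dictionary. For files held in remote storage such as Google Cloud Storage or Amazon S3, this parameter passes the extra options needed to access them.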
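40-3. Purpose: Reads a file in Stata's .dta format into a pandas DataFrame.
40-4. Return value: A pandas DataFrame containing the data read from the .dta file. When iterator=True or chunksize is set, a StataReader object is returned instead.
40-5. Notes: None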
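When a StataReader is returned instead of a DataFrame, it can be used to inspect file metadata and to read rows in pieces. The sketch below is illustrative only: the data, the labels, and the file name people.dta are assumptions, and it relies on the reader's data_label attribute and its variable_labels() and read() methods.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"name": ["John", "Anna", "Peter"], "age": [28, 34, 29]})
path = os.path.join(tempfile.mkdtemp(), "people.dta")  # hypothetical file name
df.to_stata(
    path,
    write_index=False,
    data_label="Demo dataset",
    variable_labels={"name": "Person Name", "age": "Age in Years"},
)

# iterator=True returns a StataReader rather than a DataFrame.
with pd.read_stata(path, iterator=True) as reader:
    reader_type = type(reader).__name__
    data_label = reader.data_label          # label stored for the data set
    var_labels = reader.variable_labels()   # dict: column name -> label
    first_two = reader.read(2)              # read only the first two rows

print(reader_type)
print(data_label)
print(var_labels)
print(first_two)
```

This is also one way to recover the labels written by to_stata, since a plain DataFrame returned by read_stata does not carry them.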
40-6. Usage
40-6-1. Data preparation
# Create the .dta file with pandas.DataFrame.to_stata
import pandas as pd
# Create a sample DataFrame
data = {
    "name": ["John", "Anna", "Peter", "Linda"],
    "age": [28, 34, 29, 32],
    "date_of_birth": pd.to_datetime(["1992-01-01", "1988-02-15", "1991-07-23", "1989-10-10"]),
    "city": ["New York", "Paris", "Berlin", "London"],
}
df = pd.DataFrame(data)
# Set the variable labels
variable_labels = {
    "name": "Person Name",
    "age": "Age in Years",
    "date_of_birth": "Date of Birth",
    "city": "City of Residence",
}
# Set the data label
data_label = "Demo Dataset for Pandas to Stata Conversion"
# Save the DataFrame to a Stata file.
# Format 114 is readable by Stata 10 and later and limits strings to 244 characters.
# The call also converts the date column, omits the index, and adds variable and data labels.
df.to_stata(
    "example.dta",
    convert_dates={"date_of_birth": "td"},  # write date_of_birth in Stata date format
    write_index=False,  # do not write the index to the Stata file
    variable_labels=variable_labels,  # add the variable labels
    data_label=data_label,  # add the data label
    version=114,  # dta format version
)
print("DataFrame has been successfully saved to Stata file.")
40-6-2. Code example
# 40. The pandas.read_stata function
import pandas as pd
# Path of the .dta file
file_path = "example.dta"
# Read the file with pandas.read_stata
df = pd.read_stata(file_path)
# Show the first rows of the DataFrame to confirm the data were read correctly
print(df.head())
40-6-3. Output
# 40. The pandas.read_stata function
#     name  age date_of_birth      city
# 0   John   28    1992-01-01  New York
# 1   Anna   34    1988-02-15     Paris
# 2  Peter   29    1991-07-23    Berlin
# 3  Linda   32    1989-10-10    London
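The example above contains no missing values, so the effect of convert_missing is not visible in it. As a hedged illustration, the sketch below uses an invented score column and a temporary file named missing.dta, writes a NaN, and reads the file back with both settings.

```python
import os
import tempfile

import numpy as np
import pandas as pd

df = pd.DataFrame({"score": [1.0, np.nan, 3.0]})
path = os.path.join(tempfile.mkdtemp(), "missing.dta")  # hypothetical file name
df.to_stata(path, write_index=False)

# Default (convert_missing=False): Stata missing values become NaN.
as_nan = pd.read_stata(path)

# convert_missing=True: the column is returned with object dtype and the
# missing entry is a StataMissingValue whose string form is Stata's ".".
as_stata = pd.read_stata(path, convert_missing=True)

print(as_nan["score"].isna().tolist())
print(as_stata["score"].dtype, str(as_stata["score"].iloc[1]))
```

The default is usually the more convenient choice for numeric work; convert_missing=True is mainly of interest when the specific Stata missing code has to be kept.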
II. Recommended Reading
1. Python Foundations Journey
2. Python Functions Journey
3. Python Algorithms Journey
4. Python Magic Journey
5. Blog home page