Alphalens is a very well-known Python library for factor analysis. However, the original library is no longer actively maintained and has many problems. The current recommendation is to use alphalens-reloaded instead: stefan-jansen/alphalens-reloaded: Performance analysis of predictive (alpha) stock factors (github.com).
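The library's demos are all built on yfinance, the Yahoo Finance interface used outside China. Pairing the library with the domestic akshare package instead runs into several problems, and solving them requires a good understanding of the Alphalens interface. It is worth reading the comments on the original interface, especially those of the get_clean_factor_and_forward_returns method.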
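Before the full docstring, here is a minimal sketch of the two inputs that get_clean_factor_and_forward_returns expects, built with plain pandas. Everything in it is an assumption made for illustration: the tickers AAA, BBB and CCC, the random values, and the date range are synthetic, and the checks at the end are my own sanity checks rather than anything Alphalens runs.

```python
import numpy as np
import pandas as pd

# Synthetic data only: tickers, values and dates are made up for illustration.
dates = pd.date_range("2020-01-01", periods=35, freq="B", tz="UTC")
assets = ["AAA", "BBB", "CCC"]
rng = np.random.default_rng(0)

# prices: wide DataFrame, one row per timestamp, one column per asset.
prices = pd.DataFrame(
    100 + rng.normal(0, 1, size=(len(dates), len(assets))).cumsum(axis=0),
    index=dates,
    columns=assets,
)

# factor: long Series indexed by (date, asset).
# Only the first 20 dates carry factor values, so 15 later price rows remain
# as the buffer for the longest forward-return period.
factor_index = pd.MultiIndex.from_product(
    [dates[:20], assets], names=["date", "asset"]
)
factor = pd.Series(
    rng.normal(size=len(factor_index)), index=factor_index, name="factor"
)

# Sanity checks (not part of Alphalens): every factor date has a price row,
# and enough price rows follow the last factor date.
periods = (1, 5, 10)
factor_dates = factor.index.get_level_values("date").unique()
missing_dates = factor_dates.difference(prices.index)
extra_rows = int((prices.index > factor_dates.max()).sum())
```

If missing_dates is non-empty or extra_rows is smaller than max(periods), some factor rows will have no forward return and will count toward max_loss.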
def get_clean_factor_and_forward_returns(factor,
                                         prices,
                                         groupby=None,
                                         binning_by_group=False,
                                         quantiles=5,
                                         bins=None,
                                         periods=(1, 5, 10),
                                         filter_zscore=20,
                                         groupby_labels=None,
                                         max_loss=0.35,
                                         zero_aware=False,
                                         cumulative_returns=True):
    """
    Formats the factor data, pricing data, and group mappings into a DataFrame
    that contains aligned MultiIndex indices of timestamp and asset. The
    returned data will be formatted to be suitable for Alphalens functions.

    It is safe to skip a call to this function and still make use of Alphalens
    functionalities as long as the factor data conforms to the format returned
    from get_clean_factor_and_forward_returns and documented here

    Parameters
    ----------
    factor : pd.Series - MultiIndex
        A MultiIndex Series indexed by timestamp (level 0) and asset
        (level 1), containing the values for a single alpha factor.
        ::
            -----------------------------------
                date    |    asset   |
            -----------------------------------
                        |   AAPL     |   0.5
                        -----------------------
                        |   BA       |  -1.1
                        -----------------------
            2014-01-01  |   CMG      |   1.7
                        -----------------------
                        |   DAL      |  -0.1
                        -----------------------
                        |   LULU     |   2.7
                        -----------------------

    prices : pd.DataFrame
        A wide form Pandas DataFrame indexed by timestamp with assets
        in the columns.
        Pricing data must span the factor analysis time period plus an
        additional buffer window that is greater than the maximum number
        of expected periods in the forward returns calculations.
        It is important to pass the correct pricing data in depending on
        what time of period your signal was generated so to avoid lookahead
        bias, or delayed calculations.
        Prices must contain at least an entry for each timestamp/asset
        combination in factor. This entry should reflect the buy price
        for the assets and usually it is the next available price after the
        factor is computed but it can also be a later price if the factor is
        meant to be traded later (e.g. if the factor is computed at market
        open but traded 1 hour after market open the price information should
        be 1 hour after market open).
        Prices must also contain entries for timestamps following each
        timestamp/asset combination in factor, as many more timestamps
        as the maximum value in periods. The asset price after period
        timestamps will be considered the sell price for that asset when
        computing period forward returns.
        ::
            ----------------------------------------------------
                        | AAPL |  BA  |  CMG  |  DAL  |  LULU  |
            ----------------------------------------------------
               Date     |      |      |       |       |        |
            ----------------------------------------------------
            2014-01-01  |605.12| 24.58|  11.72| 54.43 |  37.14 |
            ----------------------------------------------------
            2014-01-02  |604.35| 22.23|  12.21| 52.78 |  33.63 |
            ----------------------------------------------------
            2014-01-03  |607.94| 21.68|  14.36| 53.94 |  29.37 |
            ----------------------------------------------------

    groupby : pd.Series - MultiIndex or dict
        Either a MultiIndex Series indexed by date and asset,
        containing the period wise group codes for each asset, or
        a dict of asset to group mappings. If a dict is passed,
        it is assumed that group mappings are unchanged for the
        entire time period of the passed factor data.
    binning_by_group : bool
        If True, compute quantile buckets separately for each group.
        This is useful when the factor values range vary considerably
        across groups so that it is wise to make the binning group relative.
        You should probably enable this if the factor is intended
        to be analyzed for a group neutral portfolio
    quantiles : int or sequence[float]
        Number of equal-sized quantile buckets to use in factor bucketing.
        Alternately sequence of quantiles, allowing non-equal-sized buckets
        e.g. [0, .10, .5, .90, 1.] or [.05, .5, .95]
        Only one of quantiles or bins can be not-None
    bins : int or sequence[float]
        Number of equal-width (valuewise) bins to use in factor bucketing.
        Alternately sequence of bin edges allowing for non-uniform bin width
        e.g. [-4, -2, -0.5, 0, 10]
        Chooses the buckets to be evenly spaced according to the values
        themselves. Useful when the factor contains discrete values.
        Only one of quantiles or bins can be not-None
    periods : sequence[int]
        periods to compute forward returns on.
    filter_zscore : int or float, optional
        Sets forward returns greater than X standard deviations
        from the mean to nan. Set it to None to avoid filtering.
        Caution: this outlier filtering incorporates lookahead bias.
    groupby_labels : dict
        A dictionary keyed by group code with values corresponding
        to the display name for each group.
    max_loss : float, optional
        Maximum percentage (0.00 to 1.00) of factor data dropping allowed,
        computed comparing the number of items in the input factor index and
        the number of items in the output DataFrame index.
        Factor data can be partially dropped due to being flawed itself
        (e.g. NaNs), not having provided enough price data to compute
        forward returns for all factor values, or because it is not possible
        to perform binning.
        Set max_loss=0 to avoid Exceptions suppression.
    zero_aware : bool, optional
        If True, compute quantile buckets separately for positive and negative
        signal values. This is useful if your signal is centered and zero is
        the separation between long and short signals, respectively.
    cumulative_returns : bool, optional
        If True, forward returns columns will contain cumulative returns.
        Setting this to False is useful if you want to analyze how predictive
        a factor is for a single forward day.

    Returns
    -------
    merged_data : pd.DataFrame - MultiIndex
        A MultiIndex Series indexed by date (level 0) and asset (level 1),
        containing the values for a single alpha factor, forward returns for
        each period, the factor quantile/bin that factor value belongs to, and
        (optionally) the group the asset belongs to.
        - forward returns column names follow the format accepted by
          pd.Timedelta (e.g. 1D, 30m, 3h15m, 1D1h, etc)
        - date index freq property (merged_data.index.levels[0].freq) will be
          set to a trading calendar (pandas DateOffset) inferred from the input
          data (see infer_trading_calendar for more details). This is currently
          used only in cumulative returns computation
        ::
           -------------------------------------------------------------------
                      |       | 1D  | 5D  | 10D  |factor|group|factor_quantile
           -------------------------------------------------------------------
               date   | asset |     |     |      |      |     |
           -------------------------------------------------------------------
                      | AAPL  | 0.09|-0.01|-0.079|  0.5 |  G1 |      3
                      --------------------------------------------------------
                      | BA    | 0.02| 0.06| 0.020| -1.1 |  G2 |      5
                      --------------------------------------------------------
           2014-01-01 | CMG   | 0.03| 0.09| 0.036|  1.7 |  G2 |      1
                      --------------------------------------------------------
                      | DAL   |-0.02|-0.06|-0.029| -0.1 |  G3 |      5
                      --------------------------------------------------------
                      | LULU  |-0.03| 0.05|-0.009|  2.7 |  G1 |      2
                      --------------------------------------------------------

    See Also
    --------
    utils.get_clean_factor
        For use when forward returns are already available.
    """
    forward_returns = compute_forward_returns(
        factor,
        prices,
        periods,
        filter_zscore,
        cumulative_returns,
    )

    factor_data = get_clean_factor(factor, forward_returns, groupby=groupby,
                                   groupby_labels=groupby_labels,
                                   quantiles=quantiles, bins=bins,
                                   binning_by_group=binning_by_group,
                                   max_loss=max_loss, zero_aware=zero_aware)

    return factor_data

Source code
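Fetch the A-share data for 600519 with Akshare, then run the most basic factor analysis with alphalens-reloaded. The factor is the 5-day moving average of the close minus the 10-day moving average, so its sign flips whenever the two averages cross. The code is as follows: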
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
import alphalens
import seaborn as sns
import akshare as ak
# %matplotlib inline
sns.set_style('white')
# pd.set_option('display.max_columns', None)
# pd.set_option('display.max_rows', None)

# Use akshare's stock_zh_a_hist function
df = ak.stock_zh_a_hist(symbol='600519', period='daily', start_date='20200101', end_date='20201231', adjust='qfq')
# Rename the DataFrame columns returned by akshare
df.rename(columns={
    '日期': 'date',
    '开盘': 'open',
    '收盘': 'close',
    '最高': 'high',
    '最低': 'low',
    '成交量': 'volume'
}, inplace=True)
df['asset'] = '600519'
# Factor: 5-day moving average of the close minus the 10-day moving average
# df['factor'] = df['close']
df['ma5'] = df['close'].rolling(window=5).mean().fillna(0)
df['ma10'] = df['close'].rolling(window=10).mean().fillna(0)
df['factor'] = df['ma5'] - df['ma10']
# Drop the first 20 rows, which include the warm-up rows filled with 0
df = df.iloc[20:].copy()
df.head(30)

# Work on a copy (dff) so the original df is not affected
dff = df.copy()
dff['date'] = pd.to_datetime(dff['date'])
dff = dff.set_index(['date', 'asset'])
# Localize the date level of the index to UTC
dff.index = dff.index.set_levels([dff.index.levels[0].tz_localize('UTC'), dff.index.levels[1]])
factor = dff['factor']
# factor.head()
# print(factor)

# Convert the date column to datetime format with a UTC timezone
df['date'] = pd.to_datetime(df['date']).dt.tz_localize('UTC')
df.set_index(['date', 'asset'], inplace=True)

# Select the close column to create the prices DataFrame
prices = df['close'].unstack('asset')
prices.head()
print(prices.index.tz)
print(factor.index.levels[0].tz)
# print(prices)

# Align factor and prices now
# factor, prices = factor.align(prices, join='inner', axis=0)

factor_data = alphalens.utils.get_clean_factor_and_forward_returns(
    factor,
    prices,
    groupby=None,
    binning_by_group=False,
    quantiles=2,
    bins=None,
    periods=(1, 5, 10),
    filter_zscore=20,
    groupby_labels=None,
    max_loss=0.35,
    zero_aware=True,
    cumulative_returns=True,
)
# factor_data.head()

alphalens.tears.create_full_tear_sheet(factor_data, long_short=False)

The result is shown in the figure.
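To make the periods=(1, 5, 10) argument more concrete, the following is a simplified sketch of what a forward return over N bars means, written with plain pandas. It is not the library's implementation: simple_forward_returns is a hypothetical helper, the "ND" column names are my own choice, and the prices are made up. It ignores details such as z-score filtering and trading-calendar inference.

```python
import pandas as pd


def simple_forward_returns(close: pd.Series, periods=(1, 5, 10)) -> pd.DataFrame:
    """Hypothetical helper: return from bar t to bar t+p, stored at row t."""
    return pd.DataFrame(
        {f"{p}D": close.shift(-p) / close - 1 for p in periods}
    )


# Made-up closing prices for seven consecutive days.
close = pd.Series(
    [100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 110.0],
    index=pd.date_range("2020-01-01", periods=7, freq="D", tz="UTC"),
)

fwd = simple_forward_returns(close, periods=(1, 5))

# The last p rows of each column are NaN because no price exists p bars ahead.
nan_counts = fwd.isna().sum()
```

The trailing NaN rows are the reason the prices frame has to extend past the last factor date: factor rows without a forward return are dropped and count toward max_loss.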
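Common errors

AttributeError: 'Index' object has no attribute 'tz'
A timezone problem. Data from overseas sources carries a timezone by default, while data from domestic sources such as tushare and akshare does not, so you have to add it yourself. See how the source code above handles it.

MaxLossExceededError: max_loss (35.0%) exceeded 100.0%, consider increasing it.
get_clean_factor_and_forward_returns has a default max_loss of 35.0%, which you can also configure yourself. The error appeared at first with the default quantiles=5; changing the quantiles argument to 2 resolves it, because this factor splits naturally into two classes, positive and negative.

Inferred frequency None from passed values does not conform to passed frequency C
A frequency problem. It may be caused by some NaN values. Synchronizing the data, or aligning factor with prices, can resolve it.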
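The first two errors can be reproduced in miniature with pandas alone. The sketch below uses a made-up two-row frame standing in for akshare output, and it rests on one assumption about Alphalens that I have not verified against every version: that quantile bucketing is done per date with pd.qcut, so a single asset contributes only one value per bucket computation.

```python
import pandas as pd

# Hypothetical two-row frame standing in for akshare output (values made up).
raw = pd.DataFrame({"date": ["2020-01-02", "2020-01-03"],
                    "close": [100.0, 101.5]})

# 1) Timezone: string dates give a plain Index, which has no `.tz` attribute.
prices = raw.set_index("date")
has_tz_before = hasattr(prices.index, "tz")

# Converting to a DatetimeIndex and localizing it adds the timezone.
prices.index = pd.to_datetime(prices.index).tz_localize("UTC")
has_tz_after = prices.index.tz is not None

# 2) Quantiles: one asset means one factor value per date, and a single value
#    cannot be split into 5 quantile buckets because all bin edges coincide.
one_value = pd.Series([0.8])
try:
    pd.qcut(one_value, 5, labels=False)
    five_buckets_ok = True
except ValueError:
    five_buckets_ok = False
```

Under that assumption, every row fails bucketing with quantiles=5, which would explain the reported 100.0% loss for a single-stock factor.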
If you have any questions, feel free to leave a comment or send a private message.