Overview
AutoRound (https://github.com/intel/auto-round) achieves excellent quantization performance: at W4G128 it is close to lossless in most scenarios, and it works with a range of popular models including gemma-7B, Mistral-7b, Mixtral-8x7B-v0.1, Mixtral-8x7B-Instruct-v0.1, Phi2, and LLAMA2. In evaluations designed to be as fair as possible, AutoRound outperforms methods such as GPTQ and AWQ in most W4G128, W4G-1, W3G128, and W2G128 settings.

Key Features
- Broad model support: AutoRound can quantize many model families, including gemma, Mistral-7b, Mixtral-8x7B-v0.1, LLAMA1, LLAMAv2, GPT, QWEN1, OPT, Bloom, Falcon, GPT-NEO, StableLM-Base-Alpha, Dolly-v2, MPT, GPT-J-6b, ChatGLM2, and more.
- Flexible export: quantized models can be seamlessly exported to the ITREX [1] format for deployment on Intel CPUs, or to the AutoGPTQ [2] format for running on Nvidia GPUs.
- Tuning-device compatibility: supported tuning devices include Intel CPU, Intel Gaudi2, and Nvidia GPU.
- Dataset compatibility: AutoRound supports calibration with the Pile10k and MBPP datasets, and can easily be extended to other desired datasets.
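The W4G128 notation used above means 4-bit weights (W4) with one quantization scale per group of 128 consecutive values (G128). The sketch below illustrates plain group-wise round-to-nearest quantization; it is only the baseline scheme, not AutoRound's learned rounding, and all function names are illustrative:

```python
import random

def quantize_w4_groupwise(weights, group_size=128, bits=4):
    """Asymmetric round-to-nearest quantization with one scale per group.

    Returns the quantize-dequantized weights (floats restricted to at most
    2**bits levels per group). Illustrative baseline, not AutoRound itself.
    """
    qmax = (1 << bits) - 1  # 15 for 4-bit
    dequantized = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / qmax if hi > lo else 1.0
        # Map each weight to an integer code in [0, qmax], then back to float.
        codes = [round((w - lo) / scale) for w in group]
        dequantized.extend(c * scale + lo for c in codes)
    return dequantized

random.seed(0)
w = [random.gauss(0.0, 0.02) for _ in range(256)]
w_q = quantize_w4_groupwise(w)
# Round-to-nearest bounds the per-weight error by half a quantization step.
max_err = max(abs(a - b) for a, b in zip(w, w_q))
```

Each group of 128 weights gets its own scale, which is why smaller group sizes (the G in W4G128) generally trade model size for accuracy.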
Examples
Quantization example for language modeling models. Quantization example for code generation models.
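As a rough illustration of what the language-modeling example does, the sketch below follows the Python API shown in the AutoRound project README at the time of writing; the class and method names (`AutoRound`, `quantize`, `save_quantized`) and their arguments are taken from that README and may have changed in newer releases:

```python
# Hedged sketch: quantize a small model to W4G128 with AutoRound's Python API.
# Model name and output path are illustrative; the API follows the project
# README at the time of writing and may differ in current releases.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"  # small model chosen here for illustration
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# W4G128: 4-bit weights, group size 128; sym=False means asymmetric quantization.
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=False)
autoround.quantize()
autoround.save_quantized("./opt-125m-w4g128")
```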
Others
Released quantized models: several pre-quantized models have been published on Hugging Face. Due to internal company review, some models, along with a large amount of accuracy data, are still pending release.
Accuracy data examples

gemma-7b
Install lm-eval-harness from source at git commit 96d185fa6232a5ab685ba7c43e45d1dbb3bb906d, and install the latest AutoGPTQ from source first.
lm_eval --model hf --model_args pretrained=Intel/gemma-7b-int4-inc,autogptq=True,gptq_use_triton=True --device cuda:0 --tasks lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,rte,arc_easy,arc_challenge,mmlu --batch_size 32

| Metric | FP16 | INT4 |
| --- | --- | --- |
| Avg. | 0.6239 | 0.6307 |
| mmlu | 0.6162 | 0.6147 |
| lambada_openai | 0.6751 | 0.7204 |
| hellaswag | 0.6047 | 0.5903 |
| winogrande | 0.7324 | 0.7514 |
| piqa | 0.7943 | 0.7949 |
| truthfulqa_mc1 | 0.3097 | 0.3011 |
| openbookqa | 0.3320 | 0.3400 |
| boolq | 0.8278 | 0.8269 |
| rte | 0.6534 | 0.7076 |
| arc_easy | 0.8178 | 0.7959 |
| arc_challenge | 0.4991 | 0.4940 |
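As a quick sanity check on the table, the Avg. row is just the unweighted mean of the eleven per-task scores:

```python
# Recompute the Avg. row of the gemma-7b table as the unweighted mean
# of the eleven per-task accuracies, in table order (mmlu ... arc_challenge).
fp16 = [0.6162, 0.6751, 0.6047, 0.7324, 0.7943, 0.3097,
        0.3320, 0.8278, 0.6534, 0.8178, 0.4991]
int4 = [0.6147, 0.7204, 0.5903, 0.7514, 0.7949, 0.3011,
        0.3400, 0.8269, 0.7076, 0.7959, 0.4940]
avg_fp16 = round(sum(fp16) / len(fp16), 4)  # 0.6239
avg_int4 = round(sum(int4) / len(int4), 4)  # 0.6307
```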
Mixtral-8x7B-Instruct
| Metric | FP16 | INT4 |
| --- | --- | --- |
| Avg. | 0.7000 | 0.6977 |
| mmlu | 0.6885 | 0.6824 |
| lambada_openai | 0.7718 | 0.7790 |
| hellaswag | 0.6767 | 0.6745 |
| winogrande | 0.7687 | 0.7719 |
| piqa | 0.8351 | 0.8335 |
| truthfulqa_mc1 | 0.4969 | 0.4884 |
| openbookqa | 0.3680 | 0.3720 |
| boolq | 0.8850 | 0.8783 |
| rte | 0.7184 | 0.7004 |
| arc_easy | 0.8699 | 0.8712 |
| arc_challenge | 0.6220 | 0.6229 |
phi-2
Since we encountered an issue evaluating this model with lm-eval, we opted to evaluate the qdq model instead. In our assessment, its accuracy closely matches that of the really quantized model in most cases, except for some small models like opt-125m.
| Metric | FP16 | INT4 (qdq) |
| --- | --- | --- |
| Avg. | 0.6155 | 0.6163 |
| mmlu | 0.5448 | 0.5417 |
| lambada_openai | 0.6268 | 0.6225 |
| hellaswag | 0.5585 | 0.5498 |
| winogrande | 0.7530 | 0.7545 |
| piqa | 0.7867 | 0.7824 |
| truthfulqa_mc1 | 0.3133 | 0.3060 |
| openbookqa | 0.4000 | 0.4100 |
| boolq | 0.8339 | 0.8327 |
| rte | 0.6245 | 0.6643 |
| arc_easy | 0.7997 | 0.7955 |
| arc_challenge | 0.5290 | 0.5196 |
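The "qdq" (quantize-dequantize) evaluation mentioned for phi-2 can be sketched as follows: weights are quantized to int4 codes and immediately dequantized back to floats, so the model keeps its original dtype and standard evaluation tools work unchanged, while the numerics match what the really quantized weights would produce. All helper names below are illustrative, not part of AutoRound's API:

```python
def quantize_int4(weights):
    """Real quantization: store integer codes in [0, 15] plus (scale, zero)."""
    qmax = 15
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo

def qdq_int4(weights):
    """qdq: quantize, then immediately dequantize back to floats."""
    codes, scale, lo = quantize_int4(weights)
    return [c * scale + lo for c in codes]

w = [0.10, -0.03, 0.07, 0.00, -0.12, 0.05]
codes, scale, lo = quantize_int4(w)
w_eval = qdq_int4(w)
# w_eval is float-typed (so unmodified eval tools accept it), yet element-wise
# identical to dequantizing the stored int4 codes -- which is why accuracy
# measured on the qdq model closely tracks the really quantized model.
```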
References
[1] Intel Extension for Transformers
[2] AutoGPTQ