Basic usage of usearch
usearch is a fast, open-source search and clustering engine for vectors and strings, available for C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang and Wolfram.
// https://github.com/unum-cloud/usearch/blob/main/python/README.md
```
$ pip install usearch
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting usearch
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/ba/f4/24124f65ea3e940e54af29d55204ddfbeafa86d6b94b63c2e99baff2f7d6/usearch-2.8.14-cp38-cp38-manylinux_2_28_x86_64.whl (1.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.5/1.5 MB 17.0 MB/s eta 0:00:00
Requirement already satisfied: numpy in /home/ubuntu/anaconda3/envs/vglm2/lib/python3.8/site-packages (from usearch) (1.23.1)
Requirement already satisfied: tqdm in /home/ubuntu/anaconda3/envs/vglm2/lib/python3.8/site-packages (from usearch) (4.66.1)
Installing collected packages: usearch
Successfully installed usearch-2.8.14
```

A simple example

Note: this example keeps adding items to the index while it runs and finally persists the index to a file. Because items are added at runtime, memory usage grows steadily.
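Most of that growth is the raw vector payload itself. The sketch below estimates it as number of vectors × dimensions × bytes per scalar; the helper name `raw_vector_bytes` and the byte-width table are illustrative assumptions, and the real index file and memory footprint will be somewhat larger because usearch also stores the HNSW graph links.

```python
# Hypothetical helper: estimate the storage needed for the vectors alone
# (graph links and other index overhead are NOT included).
BYTES_PER_SCALAR = {"f64": 8, "f32": 4, "f16": 2, "i8": 1}


def raw_vector_bytes(num_vectors: int, ndim: int, dtype: str = "f32") -> int:
    """Return num_vectors * ndim * bytes-per-scalar for the given dtype."""
    return num_vectors * ndim * BYTES_PER_SCALAR[dtype]


if __name__ == "__main__":
    # The example below adds 9 batches (range(1, 10)) of 1000 vectors each.
    n, ndim = 9 * 1000, 131072
    for dtype in ("f32", "f16", "i8"):
        size = raw_vector_bytes(n, ndim, dtype)
        print(f"{dtype}: {size:,} bytes = {size / 2**30:.2f} GiB")
```

With these numbers, f16 comes out at about 2.2 GiB and f32 at about 4.4 GiB, in line with the roughly 2.2 GB and 4 GB file sizes the author observed.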
```python
import numpy as np
from usearch.index import Index, MetricKind, Matches

ndim = 131072
index_path = "test.usearch"

index = Index(
    ndim=ndim,  # Define the number of dimensions in input vectors
    metric="cos",  # Choose "l2sq", "haversine" or other metric, default = "ip"
    dtype="f32",  # Quantize to "f16" or "i8" if needed, default = "f32"
    connectivity=16,  # How frequent should the connections in the graph be, optional
    expansion_add=128,  # Control the recall of indexing, optional
    expansion_search=64,  # Control the quality of search, optional
)
# index = Index(ndim=ndim, metric=MetricKind.Cos)

# Add 9 batches of 1000 random vectors; keys=None lets usearch assign the keys.
for _ in range(1, 10):
    vector = np.random.random((1000, ndim)).astype(np.float32)
    index.add(None, vector, log=True)

# Persist the final index to disk.
index.save(index_path)

# Query with one random vector and fetch the 10 nearest neighbours.
vector = np.random.random((1, ndim)).astype(np.float32)
matches: Matches = index.search(vector, 10)
ids = matches.keys.flatten()
print(matches)

# Size of test.usearch: 9 x 1000 vectors x 131072 dims,
# about 2.2 GiB when stored as f16 (about 4.4 GiB with dtype="f32").
```

usearch-images
https://github.com/ashvardanian/usearch-images
Demo result: [screenshot not included]

Getting the data
https://huggingface.co/datasets/unum-cloud/ann-unsplash-25k/tree/main

Dependency: ucall, which requires Python >= 3.9: https://pypi.org/project/ucall/#files
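OSError: [Errno 28] inotify watch limit reached

```
  File "/home/ubuntu/anaconda3/envs/usearch/lib/python3.10/site-packages/watchdog/observers/inotify_c.py", line 428, in _raise_error
    raise OSError(errno.ENOSPC, "inotify watch limit reached")
OSError: [Errno 28] inotify watch limit reached
```

This error means the watchdog library has exceeded the Linux limit on how many files and directories inotify may watch. The limit, fs.inotify.max_user_watches, applies per real user ID; once it is used up, any attempt to add another watch fails with the error above.

The fix is to raise the limit. `sysctl -w` changes it immediately, but the setting is lost on reboot:

```bash
sudo sysctl -w fs.inotify.max_user_watches=65536
```

To make the change persistent, add the same setting to /etc/sysctl.conf and reload that file:

```bash
echo "fs.inotify.max_user_watches=65536" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
```

Note that every watch consumes some kernel memory, so choose the new limit according to your actual needs rather than setting it arbitrarily high.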
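To confirm the current limit before and after the change, you can read the value that sysctl exposes under /proc. This is a small sketch; `read_inotify_watch_limit` is a hypothetical helper name, and it returns None when the file cannot be read (for example on non-Linux systems).

```python
from pathlib import Path
from typing import Optional

# The kernel exposes fs.inotify.max_user_watches under /proc/sys (Linux only).
DEFAULT_PATH = Path("/proc/sys/fs/inotify/max_user_watches")


def read_inotify_watch_limit(path: Path = DEFAULT_PATH) -> Optional[int]:
    """Return the per-user inotify watch limit, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # Missing file, permission problem, or unexpected content.
        return None


if __name__ == "__main__":
    limit = read_inotify_watch_limit()
    print("max_user_watches:", limit if limit is not None else "unavailable")
```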
Model loading: download the model files from https://huggingface.co/unum-cloud/uform-vl-multilingual-v2/tree/main
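One way to get a local copy is huggingface_hub's `snapshot_download` with its `local_dir` argument. The sketch below rests on assumptions: `fetch_model` and the target folder are hypothetical, and the downloader is a parameter only so the function can be exercised without network access. It skips the download when the three files the uform loader reads (torch_config.json, torch_weight.pt, tokenizer.json) are already present.

```python
from pathlib import Path
from typing import Callable, Optional

REPO_ID = "unum-cloud/uform-vl-multilingual-v2"
# Files that uform's get_checkpoint reads from the model folder.
REQUIRED_FILES = ("torch_config.json", "torch_weight.pt", "tokenizer.json")


def fetch_model(local_dir: str, download: Optional[Callable[..., str]] = None) -> str:
    """Ensure REPO_ID is available in local_dir and return that folder.

    Hypothetical helper; `download` defaults to huggingface_hub.snapshot_download.
    """
    target = Path(local_dir)
    if all((target / name).exists() for name in REQUIRED_FILES):
        return str(target)  # already downloaded
    if download is None:
        from huggingface_hub import snapshot_download  # imported lazily
        download = snapshot_download
    return download(repo_id=REPO_ID, local_dir=str(target))
```

Its return value can then be passed to the modified `get_model`, e.g. `get_model(fetch_model("/path/to/uform-vl-multilingual-v2"))`, where the path is a placeholder.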
Edit the installed package file /home/ubuntu/anaconda3/envs/usearch/lib/python3.10/site-packages/uform/__init__.py. The original code is:

```python
def get_checkpoint(model_name, token) -> Tuple[str, Mapping, str]:
    model_path = snapshot_download(repo_id=model_name, token=token)
    config_path = f"{model_path}/torch_config.json"
    state = torch.load(f"{model_path}/torch_weight.pt")
    return config_path, state, f"{model_path}/tokenizer.json"


def get_model(model_name: str, token: Optional[str] = None) -> VLM:
    config_path, state, tokenizer_path = get_checkpoint(model_name, token)

    with open(config_path, "r") as f:
        model = VLM(load(f), tokenizer_path)

    model.image_encoder.load_state_dict(state["image_encoder"])
    model.text_encoder.load_state_dict(state["text_encoder"])

    return model.eval()
```

Change it as follows, so that `model_name` is treated as a local folder instead of a Hub repo ID, and then call `_model = get_model("<your download path>")`:

```python
def get_checkpoint(model_name, token) -> Tuple[str, Mapping, str]:
    model_path = model_name  # snapshot_download(repo_id=model_name, token=token)
    config_path = f"{model_path}/torch_config.json"
    state = torch.load(f"{model_path}/torch_weight.pt")
    return config_path, state, f"{model_path}/tokenizer.json"


def get_model(model_name: str, token: Optional[str] = None) -> VLM:
    config_path, state, tokenizer_path = get_checkpoint(model_name, token)

    with open(config_path, "r") as f:
        model = VLM(load(f), tokenizer_path)

    model.image_encoder.load_state_dict(state["image_encoder"])
    model.text_encoder.load_state_dict(state["text_encoder"])

    return model.eval()
```

Other minor changes
Changing the data source

```python
_datasets = {
    name: _open_dataset(
        os.path.join("/home/ubuntu/userfile/***/Usearch/usearch-images-main/data", name)
    )
    for name in (
        "unsplash-25k",
        # "cc-3m",
        # "laion-4m",
    )
}

dataset_names: str = st.multiselect(
    "Datasets",
    [
        dataset_unsplash_name,
        # dataset_cc_name,
        # dataset_laion_name,
    ],
    [dataset_unsplash_name],  # , dataset_cc_name],
    format_func=lambda x: x.split(":")[0],
)
```

You can also download the cc-3m data.
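Changing how the data is read: replace the original `File(...)` call with standard Python file reading.

```python
# uris: Strs = File(os.path.join(dir, "images.txt")).splitlines()
file_path = os.path.join(dir, "images.txt")
with open(file_path, "r") as file:
    uris = file.read().splitlines()
```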
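A self-contained version of the same replacement, as a sketch: `read_uris` is a hypothetical helper name, and the explicit UTF-8 encoding is an assumption about images.txt rather than something the demo specifies.

```python
import os
from typing import List


def read_uris(directory: str) -> List[str]:
    """Read images.txt in `directory` and return one URI per line."""
    file_path = os.path.join(directory, "images.txt")
    with open(file_path, "r", encoding="utf-8") as file:
        # splitlines() drops the line terminators, so no trailing "\n" remains.
        return file.read().splitlines()
```

Naming the parameter `directory` also avoids shadowing Python's built-in `dir`.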
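Note: the name "USEARCH" often refers to an unrelated bioinformatics tool for searching and aligning DNA and protein sequences, used mainly to process and analyze high-throughput sequencing data from microbial communities. It is a standalone program by Robert Edgar rather than part of QIIME, but QIIME (Quantitative Insights Into Microbial Ecology, an open-source package for analyzing and interpreting microbial community structure) can call it to process and align DNA sequences for taxonomic annotation, diversity analysis and similar tasks. See also: "USEARCH: the simplest and easiest-to-learn amplicon analysis pipeline".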