Xinference command usage: hands-on notes

1. Query the parameter combinations available for the qwen-chat model, to determine how it can run on the various inference engines

Command:

xinference engine -e http://0.0.0.0:9997 --model-name qwen-chat

Result:

2. I want to run qwen-chat on the vLLM inference engine, but I don't know which other parameters are compatible with that choice.

Command:

xinference engine -e http://0.0.0.0:9997 --model-name qwen-chat --model-engine vllm

3. To load the qwen-chat model in GGUF format, I need to know the remaining parameter combinations.

Command:

xinference engine -e http://0.0.0.0:9997 --model-name qwen-chat -f ggufv2

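4. Run the built-in llama-2-chat model. The first time you run a model, its weights are downloaded from Hugging Face, which typically takes 10 to 30 minutes depending on the model size. Once the download finishes, Xinference caches the model locally, so running the same model again does not require another download. Because Hugging Face cannot be reached from mainland China, set the environment variable with export HF_ENDPOINT=https://hf-mirror.com before starting xinference-local, so that downloads go through a domestic mirror.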
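A minimal sketch of starting the server with the mirror, assuming it should listen on 0.0.0.0:9997 to match the endpoint used in the commands here. The variable has to be exported in the same shell before xinference-local starts, so that the server process inherits it; the guard only prints a message if the CLI is not on PATH.

```shell
#!/usr/bin/env bash
# Send Hugging Face downloads to the domestic mirror instead of huggingface.co.
# Exported (not just assigned) so child processes such as the server inherit it.
export HF_ENDPOINT=https://hf-mirror.com

# Assumed host/port, chosen to match the -e http://0.0.0.0:9997 endpoint.
if command -v xinference-local >/dev/null 2>&1; then
  xinference-local --host 0.0.0.0 --port 9997
else
  echo "xinference-local not found on PATH" >&2
fi
```

Keeping the export in a script rather than typing it once in an interactive shell means the mirror setting survives server restarts.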

First, query the compatible combinations:

xinference engine -e http://0.0.0.0:9997 --model-name llama-2-chat --model-engine vllm

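Launch command (-u sets the model UID, -n the model name, -s the model size in billions of parameters, -f the model format):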

xinference launch --model-engine vllm -u my-llama-2 -n llama-2-chat -s 13 -f pytorch

This fails with a GPU out-of-memory error; the GPU has only 24 GB of memory:

RuntimeError: Failed to launch model, detail:[address=0.0.0.0:44231, pid=47189] CUDA out of memory. Tried to allocate 270.00 MiB. GPU 0 has a total capacity of 23.64 GiB of which 213.69 MiB is free. Including non-PyTorch memory, this process has 23.43 GiB memory in use. 
Of the allocated memory 22.99 GiB is allocated by PyTorch, 
and 1.76 MiB is reserved by PyTorch but unallocated. 
If reserved but unallocated memory is large 
try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
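For context, a back-of-the-envelope estimate of the memory needed just to hold the 13B weights, assuming roughly 13 × 10^9 parameters stored as 16-bit (2-byte) values; the 2-byte assumption is about how the pytorch format is loaded here and is not stated in the log. The 23.64 GiB figure is the total capacity reported in the error.

```latex
M_{\text{weights}} \approx N_{\text{params}} \times b
  = 13 \times 10^{9} \times 2\ \text{bytes}
  = 26 \times 10^{9}\ \text{bytes}
  = \frac{26 \times 10^{9}}{2^{30}}\ \text{GiB}
  \approx 24.2\ \text{GiB} > 23.64\ \text{GiB}
```

By this rough estimate the weights alone already slightly exceed the card's total capacity before vLLM allocates any KV cache, and the error itself reports only 1.76 MiB reserved but unallocated, so the model size is a likely contributor to the OOM alongside any fragmentation.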

Solution: set the following environment variable in the xinf.sh startup script:

export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
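A sketch of what the xinf.sh startup script might contain. Only the file name comes from the text; its contents here are an assumption: it exports the allocator setting and the Hugging Face mirror from step 4, then starts xinference-local on the same host and port as the endpoint used throughout. The export matters, because a plain assignment in a script is not passed on to child processes unless the variable is exported.

```shell
#!/usr/bin/env bash
# xinf.sh: hypothetical layout of the startup script.

# Ask PyTorch's CUDA caching allocator to use expandable segments,
# which can reduce fragmentation when allocation sizes vary.
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

# Hugging Face mirror, as in step 4.
export HF_ENDPOINT=https://hf-mirror.com

# Assumed host/port matching http://0.0.0.0:9997; exec replaces this shell with the server.
if command -v xinference-local >/dev/null 2>&1; then
  exec xinference-local --host 0.0.0.0 --port 9997
else
  echo "xinference-local not found on PATH" >&2
fi
```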

Source: http://www.xdnf.cn/news/5100.html
