Deploying Qwen2-7B-Instruct on Ascend 910B with streaming output [PyTorch framework]: NPU inference

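Contents

  • Background
    • torch_npu framework
    • MindSpore framework
    • MindNLP framework
  • Downloading the model
    • Outside China
    • Within China
  • Environment setup
  • Code adaptation (non-streaming)
    • Main
    • Branch
    • Results
  • Code adaptation (streaming)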

Background

torch_npu framework

Not officially adapted.

MindSpore framework

Not officially adapted.

MindNLP framework

Officially adapted, but extremely slow: roughly 10 seconds per character.

Downloading the model

Outside China

Hugging Face

Within China

ModelScope

Environment setup

pip3 install transformers==4.39.2
pip3 install torch==2.1.0
pip3 install torch-npu==2.1.0.post4
pip3 install accelerate==0.24.1
pip3 install transformers-stream-generator==0.0.5
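
Because the steps below depend on these exact versions, it can help to confirm what actually got installed before going further. The following is a minimal sketch, not part of the original setup: `PINS`, `installed_version` and `find_mismatches` are hypothetical names, and the check uses only the standard library. Note that some builds report a local suffix (for example a `+cpu` tag on torch), which this strict comparison would flag as a mismatch.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the install commands above.
PINS = {
    "transformers": "4.39.2",
    "torch": "2.1.0",
    "torch-npu": "2.1.0.post4",
    "accelerate": "0.24.1",
    "transformers-stream-generator": "0.0.5",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Map each package whose installed version differs from its pin to (expected, found)."""
    mismatches = {}
    for package, expected in pins.items():
        found = lookup(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    # Prints nothing when every pin matches.
    for package, (expected, found) in find_mismatches(PINS).items():
        print(f"{package}: expected {expected}, found {found}")
```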

Code adaptation (non-streaming)

Main

import torch
import torch_npu
import os
import platform
torch_device = "npu:1" # 0~7
torch.npu.set_device(torch.device(torch_device))
torch.npu.set_compile_mode(jit_compile=False)
option = {}
option["NPU_FUZZY_COMPILE_BLACKLIST"] = "Tril"
torch.npu.set_option(option)
from transformers import AutoModelForCausalLM, AutoTokenizer
# device = "cuda" # the device to load the model onto
DEFAULT_CKPT_PATH = '/root/.cache/modelscope/hub/qwen/Qwen2-7B-Instruct'
model = AutoModelForCausalLM.from_pretrained(
    DEFAULT_CKPT_PATH,
    torch_dtype=torch.float16,
    device_map=torch_device,
).npu().eval()
tokenizer = AutoTokenizer.from_pretrained(DEFAULT_CKPT_PATH)
while True:
    prompt = input("user:")
    if prompt == "exit":
        break
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt},
    ]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    model_inputs = tokenizer([text], return_tensors="pt").to(torch_device)
    generated_ids = model.generate(model_inputs.input_ids, max_new_tokens=512)
    # generate() returns prompt + new tokens; keep only the newly generated part
    generated_ids = [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    print("Qwen2-7B-Instruct:", response)
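
The list comprehension near the end of the loop is easy to misread: for a decoder-only model, `generate` returns each sequence as the prompt tokens followed by the new tokens, so the prompt has to be sliced off before decoding. Here is a small illustration with plain Python lists; the token IDs are made up and `strip_prompt` is a hypothetical helper name, not something from transformers.

```python
def strip_prompt(input_rows, output_rows):
    """Drop the leading prompt tokens from each generated sequence."""
    return [out[len(inp):] for inp, out in zip(input_rows, output_rows)]


prompt_ids = [[101, 7, 8, 9]]            # made-up token IDs for one prompt
full_ids = [[101, 7, 8, 9, 42, 43, 2]]   # shape of a generate() result: prompt + new tokens
print(strip_prompt(prompt_ids, full_ids))  # [[42, 43, 2]]
```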

Branch

Locate the Python interpreter of your virtual environment:

which python

Mine is /root/anaconda3/envs/sakura/bin/python.
Under that environment, find lib/python3.9/site-packages/transformers/generation/utils.py. For example:

/root/anaconda3/envs/sakura/lib/python3.9/site-packages/transformers/generation/utils.py
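
The path can also be built from the `which python` result rather than typed by hand. This is a minimal sketch under the assumption that the environment follows the conda layout shown above (`<env>/bin/python` and `<env>/lib/python3.9/site-packages`); `generation_utils_path` is a hypothetical helper name.

```python
from pathlib import PurePosixPath


def generation_utils_path(python_path, python_version="3.9"):
    """Build the expected utils.py path from the output of `which python`."""
    env_root = PurePosixPath(python_path).parent.parent  # strip the trailing bin/python
    return (
        env_root / "lib" / f"python{python_version}" / "site-packages"
        / "transformers" / "generation" / "utils.py"
    )


print(generation_utils_path("/root/anaconda3/envs/sakura/bin/python"))
```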

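Go to line 2708 and comment out lines 2708 to 2712, then add the following at line 2709 (these line numbers apply to the transformers version installed above and will differ in other versions):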

next_token_scores = outputs.logits[:, -1, :]
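
Since the edit is keyed to line numbers, it is worth printing the target lines first to confirm they are the ones you expect. The helper below is a small sketch using only the standard library; `show_lines` is a hypothetical name.

```python
def show_lines(path, start, end):
    """Return (line_number, text) pairs for lines start..end, 1-indexed and inclusive."""
    with open(path, encoding="utf-8") as handle:
        return [
            (number, line.rstrip("\n"))
            for number, line in enumerate(handle, start=1)
            if start <= number <= end
        ]
```

For example, `for number, text in show_lines(utils_path, 2708, 2712): print(number, text)`, where `utils_path` is the utils.py path found in the previous step.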

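This is where the error comes from: if the "pre-process distribution" step runs, generation fails with the error below. The replacement line passes the raw logits of the last position straight to sampling, so the logits processors and warpers (settings such as temperature, top_k, top_p and repetition_penalty) are skipped.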

/root/anaconda3/envs/sakura/lib/python3.9/site-packages/transformers/generation/logits_process.py:455: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at build/CMakeFiles/torch_npu.dir/compiler_depend.ts:74.)
  sorted_indices_to_remove[..., -self.min_tokens_to_keep :] = 0
Traceback (most recent call last):
  File "/root/Qwen_test.py", line 63, in <module>
    generated_ids = model.generate(
  File "/root/anaconda3/envs/sakura/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/sakura/lib/python3.9/site-packages/transformers/generation/utils.py", line 1576, in generate
    result = self._sample(
  File "/root/anaconda3/envs/sakura/lib/python3.9/site-packages/transformers/generation/utils.py", line 2736, in _sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: Sync:build/CMakeFiles/torch_npu.dir/compiler_depend.ts:158 NPU error, error code is 507018
[Error]: The aicpu execution is abnormal. Rectify the fault based on the error information in the ascend log.
E39999: Inner Error!
E39999: 2024-07-02-14:14:50.735.070  An exception occurred during AICPU execution, stream_id:23, task_id:2750, errcode:21008, msg:inner error[FUNC:ProcessAicpuErrorInfo][FILE:device_error_proc.cc][LINE:730]
        TraceBack (most recent call last):
        rtStreamSynchronizeWithTimeout execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
        synchronize stream failed, runtime result = 507018[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
DEVICE[1] PID[864803]:
EXCEPTION TASK:
  Exception info:TGID=864803, model id=65535, stream id=23, stream phase=SCHEDULE, task id=2750, task type=aicpu kernel, recently received task id=2750, recently send task id=2749, task phase=RUN
  Message info[0]:aicpu=0,slot_id=0,report_mailbox_flag=0x5a5a5a5a,state=0x5210
    Other info[0]:time=2024-07-02-14:14:50.091.974, function=proc_aicpu_task_done, line=970, error code=0x2a
[W compiler_depend.ts:368] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal. Rectify the fault based on the error information in the ascend log.
EZ9999: Inner Error!
EZ9999: 2024-07-02-14:14:50.743.702  Kernel task happen error, retCode=0x2a, [aicpu exception].[FUNC:PreCheckTaskErr][FILE:task_info.cc][LINE:1776]
        TraceBack (most recent call last):
        Aicpu kernel execute failed, device_id=1, stream_id=23, task_id=2750, errorCode=2a.[FUNC:PrintAicpuErrorInfo][FILE:task_info.cc][LINE:1579]
        Aicpu kernel execute failed, device_id=1, stream_id=23, task_id=2750, fault op_name=[FUNC:GetError][FILE:stream.cc][LINE:1512]
        rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
        wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
 (function npuSynchronizeDevice)
[W compiler_depend.ts:368] Warning: NPU warning, error code is 507018[Error]:
[Error]: The aicpu execution is abnormal. Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!
        rtDeviceSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: 2024-07-02-14:14:50.745.695  wait for compute device to finish failed, runtime result = 507018.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
        TraceBack (most recent call last):
 (function npuSynchronizeDevice)
[The EH9999 warning block above repeats seven more times, differing only in timestamp.]

Results

Finally, run the Main script.

Code adaptation (streaming)

To be continued.
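
One possible way to get streaming output, sketched here as an assumption and not tested on the 910B, is the `TextStreamer` class that ships with transformers: passed to `generate` through the `streamer` argument, it prints decoded text to stdout while generation is still running. `build_messages` and `stream_reply` are hypothetical helper names; `model`, `tokenizer` and `device` are expected to be the objects created in the Main script, with the Branch patch already applied.

```python
def build_messages(prompt):
    """Same chat layout as the Main script: a system turn plus the user turn."""
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt},
    ]


def stream_reply(model, tokenizer, prompt, device, max_new_tokens=512):
    """Print the reply to stdout piece by piece while generate() is still running."""
    # Imported lazily so build_messages() stays usable without transformers installed.
    from transformers import TextStreamer

    text = tokenizer.apply_chat_template(
        build_messages(prompt), tokenize=False, add_generation_prompt=True
    )
    model_inputs = tokenizer([text], return_tensors="pt").to(device)
    # skip_prompt=True keeps the echoed prompt out of the printed stream.
    streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    model.generate(
        model_inputs.input_ids,
        max_new_tokens=max_new_tokens,
        streamer=streamer,
    )
```

Inside the `while True` loop, `stream_reply(model, tokenizer, prompt, torch_device)` would stand in for the generate, decode and print steps.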
