enable flash mistral model for HPU device #594

kaixuanliu · 2025-04-18T13:47:21Z

This PR enables op-level optimizations for Mistral-type models. It currently supports the HPU device and raises the best throughput from 124 sentences/s to 133 sentences/s compared with the Optimum-habana modeling. (We use Salesforce/SFR-Embedding-2_R for the benchmark.)
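For reviewers who want the relative gain rather than the raw numbers, the arithmetic can be sketched as below. This is only an illustration built on the two figures quoted above; `relative_speedup` is a hypothetical helper and is not part of this PR.

```python
def relative_speedup(baseline: float, optimized: float) -> float:
    """Return the fractional throughput gain of `optimized` over `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline throughput must be positive")
    return (optimized - baseline) / baseline


# Figures quoted in the PR description, in sentences/s.
baseline_sps = 124.0  # Optimum-habana modeling
flash_sps = 133.0     # flash Mistral path on HPU

gain = relative_speedup(baseline_sps, flash_sps)
print(f"relative gain: {gain:.1%}")  # prints "relative gain: 7.3%"
```

In other words, the reported change is about a 7% throughput improvement on this benchmark.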

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

kaixuanliu · 2025-04-21T07:03:25Z

@regisss @Narsil please help review.

enable flash mistral model for HPU device

0ccb86e

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
