X-AnyLabeling currently includes many built-in general-purpose models. For a detailed list, please refer to the Model Zoo.
Before enabling the AI-assisted labeling feature, you need to load a model. This can be done via the AI icon button in the left sidebar or by using the shortcut `Ctrl+A`.
Typically, when you select a model from the dropdown list, the application checks whether the corresponding model files exist in the user's directory at `~/xanylabeling_data/models/${model_name}`. If they exist, the model is loaded directly. Otherwise, the application automatically downloads the files from the network to the specified directory.
Note: All built-in models are hosted by default on GitHub Releases. Therefore, you need a stable internet connection with access to GitHub; otherwise, the download might fail.
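For reference, the lookup-then-download flow behaves roughly like the sketch below (an illustration only, not the application's actual code; the helper name `resolve_model` is hypothetical):

import os
import urllib.request

def resolve_model(model_name: str, file_name: str, url: str) -> str:
    """Return a local path to a model file, downloading it only if it is missing."""
    model_dir = os.path.expanduser(f"~/xanylabeling_data/models/{model_name}")
    os.makedirs(model_dir, exist_ok=True)
    local_path = os.path.join(model_dir, file_name)
    if not os.path.isfile(local_path):
        # No cached copy: fall back to the network (GitHub Releases by default).
        urllib.request.urlretrieve(url, local_path)
    return local_path

# Placing yolov5s.onnx in this directory beforehand skips the download entirely.
resolve_model(
    "yolov5s-r20230520",
    "yolov5s.onnx",
    "https://github.com/CVHub520/X-AnyLabeling/releases/download/v0.1.0/yolov5s.onnx",
)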
For users who fail to load models due to network issues, options include downloading the model offline and loading it manually, or modifying the model download source.
- Open the model_zoo.md file and find the configuration file corresponding to the desired model.
- Edit the configuration file, modify the model path, and optionally adjust other hyperparameters as needed.
- In the tool's interface, click Load Custom Model and select the path to the configuration file.
For details, please refer to section 7.7 Model Download Source Configuration in user_guide.md.
Adapted Models refer to models that have already been integrated into X-AnyLabeling, requiring no custom inference code from the user. A list of adapted models can be found in the Model Zoo.
In this tutorial, we will use the YOLOv5s model as an example to detail how to load a custom adapted model.
a. Model Conversion
Suppose you have trained a model locally. First, you can convert the trained `PyTorch` model to X-AnyLabeling's default `ONNX` file format (optional). Specifically, execute:
python export.py --weights yolov5s.pt --include onnx
Note: The current version does not support dynamic inputs, so do not set the `--dynamic` parameter.
Additionally, it is highly recommended to load the exported `*.onnx` file into the Netron online tool to check the input and output node information, ensuring dimensions and other details are as expected.
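If you prefer a quick script-based check instead of (or in addition to) Netron, the `onnx` Python package (assuming it is installed, e.g. via pip install onnx) can print the same input and output information:

import onnx

# Load the exported model and print the name and shape of every input/output node.
model = onnx.load("yolov5s.onnx")  # path to your exported file
for tensor in list(model.graph.input) + list(model.graph.output):
    dims = [d.dim_value or d.dim_param for d in tensor.type.tensor_type.shape.dim]
    print(tensor.name, dims)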
b. Model Configuration
Once the `onnx` file is ready, you can browse the Model Zoo file to find and copy the configuration file for the corresponding model.
Taking yolov5s.yaml as an example, let's look at its content:
type: yolov5
name: yolov5s-r20230520
provider: Ultralytics
display_name: YOLOv5s
model_path: https://github.com/CVHub520/X-AnyLabeling/releases/download/v0.1.0/yolov5s.onnx
iou_threshold: 0.45
conf_threshold: 0.25
classes:
- person
- bicycle
- car
...
Field | Description | Modifiable |
---|---|---|
`type` | Model type identifier; cannot be customized. | ❌ |
`name` | Index name of the model configuration file; keep the default. | ❌ |
`provider` | Model provider; can be modified to match your actual situation. | ✔️ |
`display_name` | Name shown in the model dropdown list in the UI; customizable. | ✔️ |
`model_path` | Model loading path; supports relative and absolute paths. | ✔️ |
`iou_threshold` | IoU threshold for Non-Maximum Suppression (NMS). | ✔️ |
`conf_threshold` | Confidence threshold used to filter low-scoring predictions. | ✔️ |
`classes` | List of model labels; must match the training labels. | ✔️ |
Note that not all fields apply to every model. Refer to the definition of the specific model.
For example, looking at the implementation of the YOLO base model, it offers additional optional configuration items:
Field | Description |
---|---|
`filter_classes` | Specify the classes used during inference. |
`agnostic` | Use class-agnostic NMS. |
Here's a typical example:
type: yolov5
name: yolov5s-r20230520
provider: Ultralytics
display_name: YOLOv5s
model_path: /path/to/your/custom_yolov5s.onnx # Modified path
iou_threshold: 0.60
conf_threshold: 0.25
agnostic: True
filter_classes:
- person
- car
classes:
- person
- bicycle
- car
- ...
Specifically, only when using older versions of YOLOv5 (v5.0 and below) do you need to specify the `anchors` and `stride` fields in the configuration file. Otherwise, do not specify these fields, to avoid inference errors. Example:
type: yolov5
...
stride: 32
anchors:
- [10,13, 16,30, 33,23] # P3/8
- [30,61, 62,45, 59,119] # P4/16
- [116,90, 156,198, 373,326] # P5/32
Tip: For segmentation models, you can specify the `epsilon_factor` parameter to control the smoothness of the output contour points. The default value is `0.005`.
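As a rough illustration of how such a factor is typically applied (a sketch of the common OpenCV pattern, not necessarily the tool's exact internal code), the simplification tolerance is usually the factor multiplied by the contour perimeter, so larger values produce fewer, smoother points:

import cv2
import numpy as np

# Hypothetical contour extracted from a segmentation mask (N x 1 x 2 points).
contour = np.array([[[0, 0]], [[100, 2]], [[101, 98]], [[1, 99]]], dtype=np.int32)

epsilon_factor = 0.005  # larger values -> stronger simplification
epsilon = epsilon_factor * cv2.arcLength(contour, True)  # True: closed contour
approx = cv2.approxPolyDP(contour, epsilon, True)
print(approx.reshape(-1, 2))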
c. Model Loading
After understanding the above, modify the `model_path` field in the configuration file and optionally adjust other hyperparameters as needed.
The software currently supports both relative paths and absolute paths for model loading. When entering the model path, be mindful of escape characters.
Finally, in the model dropdown list in the top menu bar of the interface, find the `...Load Custom Model` option, and then import the prepared configuration file to complete the custom model loading process.
Unadapted Models refer to models that have not yet been integrated into X-AnyLabeling. Users need to follow the implementation steps below for integration.
Here, we use a multi-class semantic segmentation model, `U-Net`, as an example. Follow these implementation steps:
a. Training and Exporting Model
Export the `ONNX` model, ensuring the output node dimension is `[1, C, H, W]`, where `C` is the total number of classes (including the background class).
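For reference, a minimal export sketch is shown below. It assumes a trained PyTorch model whose forward pass already returns `[1, C, H, W]` logits; the `nn.Conv2d` layer is only a hypothetical stand-in for your own U-Net, and the file names are placeholders:

import torch
import torch.nn as nn

num_classes = 3  # e.g., cat, dog, _background_
# Stand-in for your trained network; replace it with your own U-Net instance.
model = nn.Conv2d(3, num_classes, kernel_size=1)
model.eval()

# Fixed-size dummy input; dynamic axes are deliberately not used here.
dummy = torch.randn(1, 3, 256, 256)
torch.onnx.export(
    model,
    dummy,
    "best.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=12,
)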
Friendly Reminder: Exporting to `ONNX` is optional. You can choose other model formats like `PyTorch`, `OpenVINO`, or `TensorRT` based on your needs. For an example using `Segment-Anything-2` for video object tracking, refer to the Installation Guide, the configuration file definition sam2_hiera_base_video.yaml, and the corresponding implementation segment_anything_2_video.py.
b. Define Configuration File
First, create a new configuration file, e.g., `unet.yaml`, in the configuration directory:
type: unet
name: unet-r20250101
display_name: U-Net (ResNet34)
provider: xxx
conf_threshold: 0.5
model_path: /path/to/best.onnx
classes:
- cat
- dog
- _background_
Where:
Field | Description |
---|---|
`type` | Specifies the model type. Ensure it is distinct from existing types to keep the identifier unique. |
`name` | Defines the model index for internal reference and management. Avoid conflicts with existing indices. |
`display_name` | The model name displayed in the UI for easy identification. Ensure uniqueness. |
These three fields are mandatory. Add other fields as needed, such as provider, model path, hyperparameters, etc.
c. Add Configuration File
Next, add the above configuration file to the model management file:
...
- model_name: "unet-r20250101"
  config_file: ":/unet.yaml"
...
d. Configure UI Components
In this step, you can add UI components as needed. Simply add the `model_type` to the corresponding list in the file.
e. Define Inference Service
A key step in defining the inference service is inheriting the Model base class, which allows you to implement model-specific forward inference logic.
Specifically, create a new file `unet.py` in the model inference service directory. Here's an example:
import logging
import os

import cv2
import numpy as np
from PyQt5 import QtCore
from PyQt5.QtCore import QCoreApplication

from anylabeling.app_info import __preferred_device__
from anylabeling.views.labeling.shape import Shape
from anylabeling.views.labeling.utils.opencv import qt_img_to_rgb_cv_img
from .model import Model
from .types import AutoLabelingResult
from .engines.build_onnx_engine import OnnxBaseModel


class UNet(Model):
    """Semantic segmentation model using UNet"""

    class Meta:
        required_config_names = [
            "type",
            "name",
            "display_name",
            "model_path",
            "classes",
        ]
        widgets = ["button_run"]
        output_modes = {
            "polygon": QCoreApplication.translate("Model", "Polygon"),
        }
        default_output_mode = "polygon"

    def __init__(self, model_config, on_message) -> None:
        # Run the parent class's init method
        super().__init__(model_config, on_message)
        model_name = self.config["type"]
        model_abs_path = self.get_model_abs_path(self.config, "model_path")
        if not model_abs_path or not os.path.isfile(model_abs_path):
            raise FileNotFoundError(
                QCoreApplication.translate(
                    "Model",
                    f"Could not download or initialize {model_name} model.",
                )
            )
        self.net = OnnxBaseModel(model_abs_path, __preferred_device__)
        self.classes = self.config["classes"]
        self.input_shape = self.net.get_input_shape()[-2:]

    def preprocess(self, input_image):
        """Resize, normalize, and reshape the image to the network input layout."""
        input_h, input_w = self.input_shape
        image = cv2.resize(input_image, (input_w, input_h))
        image = np.transpose(image, (2, 0, 1))
        image = image.astype(np.float32) / 255.0
        image = (image - 0.5) / 0.5
        image = np.expand_dims(image, axis=0)
        return image

    def postprocess(self, image, outputs):
        """Convert the [1, C, H, W] logits into per-class contours at the original image size."""
        n, c, h, w = outputs.shape
        image_height, image_width = image.shape[:2]
        # Obtain the category index of each pixel
        # target shape: (1, h, w)
        outputs = np.argmax(outputs, axis=1)
        results = []
        for i in range(c):
            # Skip the background label
            if self.classes[i] == "_background_":
                continue
            # Get the category index of each pixel for the first batch by adding [0].
            mask = outputs[0] == i
            # Rescale to the original shape
            mask_resized = cv2.resize(
                mask.astype(np.uint8), (image_width, image_height)
            )
            # Get the contours
            contours, _ = cv2.findContours(
                mask_resized, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
            )
            # Append the contours along with their respective class labels
            results.append(
                (
                    self.classes[i],
                    [np.squeeze(contour).tolist() for contour in contours],
                )
            )
        return results

    def predict_shapes(self, image, image_path=None):
        """Run inference on a single image and return the predicted polygon shapes."""
        if image is None:
            return []
        try:
            image = qt_img_to_rgb_cv_img(image, image_path)
        except Exception as e:  # noqa
            logging.warning("Could not inference model")
            logging.warning(e)
            return []
        blob = self.preprocess(image)
        outputs = self.net.get_ort_inference(blob)
        results = self.postprocess(image, outputs)

        shapes = []
        for label, contours in results:
            for points in contours:
                # Make sure the polygon is closed by repeating the first point
                points.append(points[0])
                shape = Shape(flags={})
                for point in points:
                    shape.add_point(QtCore.QPointF(point[0], point[1]))
                shape.shape_type = "polygon"
                shape.closed = True
                shape.fill_color = "#000000"
                shape.line_color = "#000000"
                shape.label = label
                shape.selected = False
                shapes.append(shape)
        result = AutoLabelingResult(shapes, replace=True)
        return result

    def unload(self):
        del self.net
Here:
- In the `Meta` class:
  - `required_config_names`: Specifies mandatory fields in the model config file for proper initialization.
  - `widgets`: Specifies controls (buttons, dropdowns, etc.) to display for this service. See this file for definitions.
  - `output_modes`: Specifies the output shape types supported (e.g., polygon, rectangle, rotated box).
  - `default_output_mode`: Specifies the default output shape type.
- `predict_shapes` and `unload` are abstract methods that must be implemented. They define the inference process and the resource release logic, respectively.
f. Add to Model Management
After the above steps, open the model configuration file. Add the corresponding model type field (e.g., `unet`) to the `_CUSTOM_MODELS` list and, if necessary, add the model name to the relevant configuration sections.
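As a hypothetical excerpt (the actual list in the code base contains many more entries), the registration looks roughly like this:

# Registration of custom model types; "unet" must match the `type` field in unet.yaml.
_CUSTOM_MODELS = [
    "yolov5",
    "unet",
]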
Tip: If you don't know how to implement specific widgets, use the search panel, enter relevant keywords, and examine the implementation logic of available widgets.
Finally, go to the Model Manager class file. In the `_load_model` method, initialize your instance as follows:
...
class ModelManager(QObject):
    """Model manager"""

    def __init__(self):
        ...

    ...

    def _load_model(self, model_id):
        """Load and return model info"""
        if self.loaded_model_config is not None:
            self.loaded_model_config["model"].unload()
            self.loaded_model_config = None
            self.auto_segmentation_model_unselected.emit()
        model_config = copy.deepcopy(self.model_configs[model_id])
        if model_config["type"] == "yolov5":
            ...
        elif model_config["type"] == "unet":
            from .unet import UNet

            try:
                model_config["model"] = UNet(
                    model_config, on_message=self.new_model_status.emit
                )
                self.auto_segmentation_model_unselected.emit()
            except Exception as e:  # noqa
                self.new_model_status.emit(
                    self.tr(
                        "Error in loading model: {error_message}".format(
                            error_message=str(e)
                        )
                    )
                )
                print(
                    "Error in loading model: {error_message}".format(
                        error_message=str(e)
                    )
                )
                return
        ...
...
- The model type field must match the `type` field defined in the configuration file (see step b, Define Configuration File).
- If the model is based on `SAM` (Segment Anything Model) interaction patterns, replace `self.auto_segmentation_model_unselected.emit()` with `self.auto_segmentation_model_selected.emit()` to trigger the corresponding functionality. (Alternatively, this behavior can be controlled through a configuration flag.)
This section provides specific examples of converting custom models to the ONNX format for quick integration into X-AnyLabeling.
InternImage introduces a large-scale Convolutional Neural Network (CNN) model utilizing deformable convolutions as core operators. This achieves large effective receptive fields, adaptive spatial aggregation, and reduced inductive bias, enabling the learning of stronger, more robust patterns from extensive data. It surpasses current CNNs and Vision Transformers on benchmarks.
Attribute | Value |
---|---|
Paper Title | InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions |
Publishing Units | Shanghai AI Laboratory, Tsinghua University, Nanjing University, etc. |
Publication Date | CVPR'23 |
Please refer to this tutorial.
This tutorial provides users with a method to quickly build lightweight, high-precision, and practical person attribute classification models using PaddleClas PULC (Practical Ultra-Lightweight image Classification). The model can be widely used in pedestrian analysis, tracking scenarios, etc.
Attribute | Value |
---|---|
Publishing Units | PaddlePaddle Team (Baidu) |
Please refer to this tutorial.
This tutorial provides users with a method to quickly build lightweight, high-precision, and practical vehicle attribute classification models using PaddleClas PULC. The model is suitable for vehicle recognition, road monitoring, etc.
Attribute | Value |
---|---|
Publishing Units | PaddlePaddle Team (Baidu) |
Please refer to this tutorial.
`RF-DETR` is the first real-time model to exceed 60 AP on the Microsoft COCO benchmark, alongside competitive performance at base sizes. It also achieves state-of-the-art performance on RF100-VL, an object detection benchmark that measures model domain adaptability to real-world problems. RF-DETR is comparable in speed to current real-time object detection models.
Organization: Roboflow
Please refer to this tutorial.
Author: Kaixuan Hu
Please refer to this tutorial.
Attribute | Value |
---|---|
Paper Title | YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors |
Publishing Units | Institute of Information Science, Academia Sinica, Taiwan |
python export.py --weights yolov7.pt --img-size 640 --grid
Note: The `--grid` parameter must be included when running this command.
Attribute | Value |
---|---|
Paper Title | Gold-YOLO: Efficient Object Detector via Gather-and-Distribute Mechanism |
Publishing Units | Huawei Noah's Ark Lab |
Publication Date | NeurIPS'23 |
# Clone the repository first
git clone https://github.com/huawei-noah/Efficient-Computing.git
cd Efficient-Computing/Detection/Gold-YOLO
# Run export for desired model weight
python deploy/ONNX/export_onnx.py --weights Gold_n_dist.pt --simplify --ort
# Or other weights: Gold_s_pre_dist.pt, Gold_m_pre_dist.pt, Gold_l_pre_dist.pt
`DAMO-YOLO` is a fast and accurate object detection method developed by the TinyML team at Alibaba DAMO Academy's Data Analytics and Intelligence Lab. It achieves state-of-the-art performance by incorporating new techniques, including a Neural Architecture Search (NAS) backbone, an efficient re-parameterized Generalized-FPN (RepGFPN), a lightweight head, AlignedOTA label assignment, and distillation enhancement.
Attribute | Value |
---|---|
Paper Title | DAMO-YOLO: A Report on Real-Time Object Detection |
Publishing Units | Alibaba Group |
Publication Date | Arxiv'22 |
# Clone the repository first
git clone https://github.com/tinyvision/DAMO-YOLO.git
cd DAMO-YOLO
# Run converter for a specific config and checkpoint
python tools/converter.py -f configs/damoyolo_tinynasL25_S.py -c damoyolo_tinynasL25_S.pth --batch_size 1 --img_size 640
Real-Time Detection Transformer (`RT-DETR`) is the first known real-time end-to-end object detector. RT-DETR-L achieves 53.0% AP on COCO val2017 at 114 FPS on a T4 GPU, while RT-DETR-X achieves 54.8% AP at 74 FPS, surpassing all YOLO detectors of the same scale in both speed and accuracy. RT-DETR-R50 achieves 53.1% AP at 108 FPS, outperforming DINO-Deformable-DETR-R50 by 2.2% AP with roughly 21x higher FPS.
Attribute | Value |
---|---|
Paper Title | RT-DETR: DETRs Beat YOLOs on Real-time Object Detection |
Publishing Units | Baidu Inc. |
Publication Date | Arxiv'23 (Accepted to CVPR 2024) |
Please refer to external tutorials or the official repository for ONNX export instructions, as direct commands might vary. Example article (Chinese): https://zhuanlan.zhihu.com/p/628660998.
Hyper-YOLO is a novel object detection method that integrates hypergraph computation to capture complex high-order associations between visual features. It introduces a Hypergraph Computation-enhanced Semantic Collection and Scattering (HGC-SCS) framework, transforming visual feature maps into semantic space and constructing hypergraphs for high-order information propagation.
Attribute | Value |
---|---|
Paper Title | Hyper-YOLO: When Visual Object Detection Meets Hypergraph Computation |
Publishing Units | Tsinghua University, Xi'an Jiaotong University |
Publication Date | TPAMI'25 (Preprint available) |
Download the model, install the dependencies, then modify the `Hyper-YOLO/ultralytics/export.py` file (or a similar export script within that project), setting `batch=1` and `half=False`:
# Example modification within export_onnx.py or a similar script
# Ensure necessary imports (Path, YOLO, torch, os) are present
from pathlib import Path

from ultralytics import YOLO
import torch
import os

if __name__ == '__main__':
    model_path = 'hyper-yolon-seg.pt'  # Or your specific model weight file
    if isinstance(model_path, (str, Path)):
        model = YOLO(model_path)
        # Ensure export arguments are set correctly
        output_filename = model.export(
            imgsz=640,
            batch=1,        # Set batch size to 1
            format='onnx',  # Specify ONNX format
            int8=False,
            half=False,     # Set half to False
            device="0",     # Or "cpu"
            verbose=False,
        )
        print(f"Model exported to {output_filename}")
Then run the export script (adjust path as needed):
python3 Hyper-YOLO/ultralytics/export.py
The Segment Anything Model (`SAM`) generates high-quality object masks from input prompts like points or boxes. It can produce masks for all objects in an image and was trained on a dataset of 11 million images and 1.1 billion masks. SAM demonstrates strong zero-shot performance on various segmentation tasks.
Attribute | Value |
---|---|
Paper Title | Segment Anything |
Publishing Units | Meta AI Research, FAIR |
Publication Date | ICCV'23 |
For ONNX export, refer to community exporters like https://github.com/vietanhdev/samexporter#sam-exporter or the official repository for potential tools.
`EfficientViT` (the backbone behind EfficientViT-SAM) is a family of vision models designed for efficient high-resolution dense prediction. It uses a novel lightweight multi-scale linear attention module as its core building block, achieving global receptive fields and multi-scale learning with hardware-efficient operations. EfficientViT-SAM adapts this architecture for promptable segmentation.
Attribute | Value |
---|---|
Paper Title | EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction |
Publishing Units | MIT HAN Lab |
Publication Date | ICCV'23 |
For ONNX export, refer to the EfficientViT-SAM instructions in the official MIT HAN Lab repository, https://github.com/mit-han-lab/efficientvit (the `CVHub520/efficientvit` link used elsewhere is a fork of it).
`SAM-Med2D` is a specialized model developed to address the challenges of applying state-of-the-art image segmentation techniques to medical images.
Attribute | Value |
---|---|
Paper Title | SAM-Med2D |
Publishing Units | OpenGVLab |
Publication Date | Arxiv'23 |
Refer to the deployment instructions in the official repository: https://github.com/OpenGVLab/SAM-Med2D#%EF%B8%8F-deploy. (Note: the originally provided link `CVHub520/SAM-Med2D` appears to be a fork.)
`HQ-SAM` is an enhanced version of the Segment Anything Model (SAM) designed to improve mask prediction quality, especially for complex structures, while maintaining SAM's efficiency and zero-shot capabilities. It achieves this through an improved decoding process and additional training on a specialized dataset.
Attribute | Value |
---|---|
Paper Title | Segment Anything in High Quality |
Publishing Units | ETH Zurich, HKUST |
Publication Date | NeurIPS'23 |
Refer to the official HQ-SAM repository or forks such as https://github.com/CVHub520/sam-hq for ONNX export tutorials or scripts.
`EdgeSAM` is an accelerated variant of the Segment Anything Model (SAM), optimized for efficient execution on edge devices with minimal performance compromise. It claims significant speedups over the original SAM and MobileSAM on edge hardware.
Attribute | Value |
---|---|
Paper Title | EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM |
Publishing Units | S-Lab, Nanyang Technological University; Shanghai AI Laboratory |
Publication Date | Arxiv'23 |
Refer to the official repository's export script: https://github.com/chongzhou96/EdgeSAM/blob/main/scripts/export_onnx_model.py.
`Grounding DINO` is a state-of-the-art (SOTA) zero-shot object detection model excelling at detecting objects not defined during training. Its ability to adapt to new objects and scenes makes it highly versatile for real-world applications. It performs well in Referring Expression Comprehension (REC), identifying and locating specific objects or regions in images based on text descriptions. Grounding DINO simplifies object detection by eliminating hand-designed components like Non-Maximum Suppression (NMS).
Attribute | Value |
---|---|
Paper Title | Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection |
Publishing Units | IDEA-CVR, IDEA-Research |
Publication Date | Arxiv'23 |
Please refer to this tutorial.
`YOLO-World` enhances the YOLO series by incorporating vision-language modeling, enabling efficient open-vocabulary object detection that excels in various tasks.
Attribute | Value |
---|---|
Paper Title | YOLO-World: Real-Time Open-Vocabulary Object Detection |
Publishing Units | Tencent AI Lab, ARC Lab, Tencent PCG, Huazhong University of Science and Technology |
Publication Date | Arxiv'24 |
# Ensure ultralytics package is installed and updated
# pip install -U ultralytics
# Clone the ultralytics repo if needed for specific export scripts, otherwise use the pip package
# git clone https://github.com/ultralytics/ultralytics.git
# cd ultralytics
# Use the yolo command line interface
yolo export model=yolov8s-worldv2.pt format=onnx opset=13 simplify
`GeCo` is a unified architecture for few-shot counting, achieving high-precision object detection, segmentation, and counting through novel dense queries and a counting loss.
Attribute | Value |
---|---|
Paper Title | GeCo: Query-Based Anchors for Fine-Grained Multi-Object Counting, Detection, and Segmentation |
Publishing Units | University of Ljubljana |
Publication Date | NeurIPS'24 |
Please refer to this tutorial.
`RAM` (Recognize Anything Model) is a robust image tagging model known for its exceptional image recognition capabilities. RAM excels in zero-shot generalization, is cost-effective and reproducible, and relies on open-source, annotation-free datasets. Its flexibility makes it suitable for a wide range of applications.
Attribute | Value |
---|---|
Paper Title | Recognize Anything: A Strong Image Tagging Model |
Publishing Units | OPPO Research Institute, IDEA-Research, AI Robotics |
Publication Date | Arxiv'23 |
Please refer to this tutorial. (Note: the originally linked repo `Tag2Text` seems related, but RAM is often associated with `recognize-anything`.)