Model Loading

X-AnyLabeling currently includes many built-in general-purpose models. For a detailed list, please refer to the Model Zoo.

Loading Built-in Models

Before enabling the AI-assisted labeling feature, you need to load a model. This can be done via the AI icon button in the left sidebar or by using the shortcut Ctrl+A.

Typically, when you select a model from the dropdown list, the application checks if the corresponding model files exist in the user's directory at ~/xanylabeling_data/models/${model_name}. If they exist, the model is loaded directly. Otherwise, the application automatically downloads the files from the network to the specified directory.
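
For reference, a quick way to check whether a given built-in model has already been downloaded is to look for its folder under that directory. A minimal sketch, assuming ${model_name} corresponds to the name field in the model's configuration file:

from pathlib import Path

model_name = "yolov5s-r20230520"  # the `name` field of the model you want to check
model_dir = Path.home() / "xanylabeling_data" / "models" / model_name
print("cached locally" if model_dir.is_dir() else "will be downloaded on first load")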

Note: All built-in models are hosted by default on GitHub Releases. Therefore, you need a stable internet connection with access to GitHub; otherwise, the download might fail.

For users who fail to load models due to network issues, options include downloading the model offline and loading it manually, or modifying the model download source.

Offline Model Download

  • Open the model_zoo.md file, find the entry for the desired model, and download its model file and configuration file.
  • Edit the configuration file, point model_path to the downloaded model file, and adjust other hyperparameters as needed.
  • In the tool's interface, click Load Custom Model and select the path to the configuration file.

Modify Model Download Source

For details, please refer to section 7.7 Model Download Source Configuration in user_guide.md.

Loading Adapted User Custom Models

Adapted Models refer to models that have already been integrated into X-AnyLabeling, requiring no custom inference code from the user. A list of adapted models can be found in the Model Zoo.

In this tutorial, we will use the YOLOv5s model as an example to detail how to load a custom adapted model.

a. Model Conversion

Suppose you have trained a model locally. First, you can optionally convert the trained PyTorch model to X-AnyLabeling's default ONNX file format. Specifically, execute:

python export.py --weights yolov5s.pt --include onnx

Note: The current version does not support dynamic inputs, so do not set the --dynamic parameter.

Additionally, it is highly recommended to import the exported *.onnx file using the Netron online tool to check the input and output node information, ensuring dimensions and other details are as expected.
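
If you prefer checking from the command line instead of (or in addition to) Netron, the same information can be read from an ONNX Runtime session. A minimal sketch, assuming the exported file is named yolov5s.onnx:

import onnxruntime as ort

# Open the exported model and print its input/output specs to confirm
# the names, static shapes, and dtypes are what you expect.
session = ort.InferenceSession("yolov5s.onnx", providers=["CPUExecutionProvider"])
for inp in session.get_inputs():
    print("input :", inp.name, inp.shape, inp.type)
for out in session.get_outputs():
    print("output:", out.name, out.shape, out.type)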

(Screenshot: the exported ONNX model inspected in Netron)

b. Model Configuration

Once the onnx file is ready, you can browse the Model Zoo file to find and copy the configuration file for the corresponding model.

Taking yolov5s.yaml as an example, let's look at its content:

type: yolov5
name: yolov5s-r20230520
provider: Ultralytics
display_name: YOLOv5s
model_path: https://github.com/CVHub520/X-AnyLabeling/releases/download/v0.1.0/yolov5s.onnx
iou_threshold: 0.45
conf_threshold: 0.25
classes:
  - person
  - bicycle
  - car
  ...

| Field | Description | Modifiable |
| --- | --- | --- |
| type | Model type identifier, cannot be customized. | |
| name | Index name of the model configuration file, keep default. | |
| provider | Model provider, can be modified based on actual situation. | ✔️ |
| display_name | Name shown in the model dropdown list in the UI, customizable. | ✔️ |
| model_path | Model loading path, supports relative and absolute paths. | ✔️ |
| iou_threshold | IoU threshold for Non-Maximum Suppression (NMS). | ✔️ |
| conf_threshold | Confidence threshold for NMS. | ✔️ |
| classes | List of model labels, must match the training labels. | ✔️ |

Note that not all fields apply to every model. Refer to the definition of the specific model.

For example, looking at the implementation of the YOLO base model, it offers additional optional configuration items:

| Field | Description |
| --- | --- |
| filter_classes | Specify the classes used during inference. |
| agnostic | Use class-agnostic NMS. |

Here's a typical example:

type: yolov5
name: yolov5s-r20230520
provider: Ultralytics
display_name: YOLOv5s
model_path: /path/to/your/custom_yolov5s.onnx # Modified path
iou_threshold: 0.60
conf_threshold: 0.25
agnostic: True
filter_classes:
  - person
  - car
classes:
  - person
  - bicycle
  - car
  - ...

Note that only when using older versions of YOLOv5 (v5.0 and below) do you need to specify the anchors and stride fields in the configuration file; otherwise, omit these fields to avoid inference errors. Example:

type: yolov5
...
stride: 32
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

Tip: For segmentation models, you can specify the epsilon_factor parameter to control the smoothness of the output contour points. The default value is 0.005.
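
As a rough illustration of what such a factor typically controls, contour simplification in OpenCV is usually driven by an epsilon proportional to the contour perimeter. The sketch below shows that common pattern; it is not the exact X-AnyLabeling implementation:

import cv2
import numpy as np

def simplify_contour(contour: np.ndarray, epsilon_factor: float = 0.005) -> np.ndarray:
    # Epsilon is taken as a fraction of the contour perimeter, so a larger
    # factor yields a coarser (smoother) polygon with fewer points.
    epsilon = epsilon_factor * cv2.arcLength(contour, True)
    return cv2.approxPolyDP(contour, epsilon, True)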

c. Model Loading

After understanding the above, modify the model_path field in the configuration file and optionally adjust other hyperparameters as needed.

The software currently supports both relative and absolute paths for model loading. When entering the model path, be mindful of escape characters (for example, on Windows use forward slashes or escaped backslashes).

Finally, in the model dropdown list in the top menu bar of the interface, find the ...Load Custom Model option, and then import the prepared configuration file to complete the custom model loading process.

Loading Unadapted User Custom Models

Unadapted Models refer to models that have not yet been integrated into X-AnyLabeling. Users need to follow the implementation steps below for integration.

Here, we use a multi-class semantic segmentation model, U-Net, as an example. Follow these implementation steps:

a. Training and Exporting Model

Export the ONNX model, ensuring the output node dimension is [1, C, H, W], where C is the total number of classes (including the background class).

Friendly Reminder: Exporting to ONNX is optional. You can choose other model formats like PyTorch, OpenVINO, or TensorRT based on your needs. For an example using Segment-Anything-2 for video object tracking, refer to the Installation Guide, the configuration file definition sam2_hiera_base_video.yaml, and the corresponding implementation segment_anything_2_video.py.
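
For the PyTorch-to-ONNX step, a minimal export sketch is shown below. The network here is only a stand-in (replace it with your trained U-Net); the point is that the exported graph takes a fixed-size image and produces a [1, C, H, W] output, where C includes the background class (e.g., C = 3 for cat, dog, and _background_ as in the configuration defined in the next step):

import torch
import torch.nn as nn

# Stand-in for your trained U-Net: a 1x1 conv mapping 3 input channels to
# C = 3 output channels, so the ONNX output stays [1, C, H, W].
model = nn.Conv2d(3, 3, kernel_size=1)
model.eval()

dummy = torch.randn(1, 3, 512, 512)  # fixed input size; no dynamic axes
torch.onnx.export(
    model,
    dummy,
    "best.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=12,
)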

b. Define Configuration File

First, create a new configuration file, e.g., unet.yaml, in the configuration directory:

type: unet
name: unet-r20250101
display_name: U-Net (ResNet34)
provider: xxx
conf_threshold: 0.5
model_path: /path/to/best.onnx
classes:
  - cat
  - dog
  - _background_

Where:

| Field | Description |
| --- | --- |
| type | Specifies the model type. Ensure it is distinct from existing types to keep the identifier unique. |
| name | Defines the model index for internal reference and management. Avoid conflicts with existing indices. |
| display_name | The model name displayed in the UI for easy identification. Ensure uniqueness. |

These three fields are mandatory. Add other fields as needed, such as provider, model path, hyperparameters, etc.

c. Add Configuration File

Next, add the above configuration file to the model management file:

...

- model_name: "unet-r20250101"
  config_file: ":/unet.yaml"
...

d. Configure UI Components

In this step, you can add UI components as needed: simply add the model_type to the corresponding list in the relevant file, as in the sketch below.
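
A purely hypothetical sketch of what such a list looks like (the actual variable name and file are defined by the project; only the idea of appending your model type matters):

# Hypothetical widget list -- the real name and location come from the project source.
# Each list enumerates the model types that should display a given UI component.
_MODELS_WITH_RUN_BUTTON = [
    "yolov5",
    # ...existing entries...
    "unet",  # newly added model type
]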

e. Define Inference Service

A key step in defining the inference service is inheriting the Model base class, which allows you to implement model-specific forward inference logic.

Specifically, create a new file unet.py in the model inference service directory. Here's an example:

import logging
import os

import cv2
import numpy as np
from PyQt5 import QtCore
from PyQt5.QtCore import QCoreApplication

from anylabeling.app_info import __preferred_device__
from anylabeling.views.labeling.shape import Shape
from anylabeling.views.labeling.utils.opencv import qt_img_to_rgb_cv_img
from .model import Model
from .types import AutoLabelingResult
from .engines.build_onnx_engine import OnnxBaseModel


class UNet(Model):
    """Semantic segmentation model using UNet"""

    class Meta:
        required_config_names = [
            "type",
            "name",
            "display_name",
            "model_path",
            "classes",
        ]
        widgets = ["button_run"]
        output_modes = {
            "polygon": QCoreApplication.translate("Model", "Polygon"),
        }
        default_output_mode = "polygon"

    def __init__(self, model_config, on_message) -> None:
        # Run the parent class's init method
        super().__init__(model_config, on_message)
        model_name = self.config["type"]
        model_abs_path = self.get_model_abs_path(self.config, "model_path")
        if not model_abs_path or not os.path.isfile(model_abs_path):
            raise FileNotFoundError(
                QCoreApplication.translate(
                    "Model",
                    f"Could not download or initialize {model_name} model.",
                )
            )
        self.net = OnnxBaseModel(model_abs_path, __preferred_device__)
        self.classes = self.config["classes"]
        self.input_shape = self.net.get_input_shape()[-2:]

    def preprocess(self, input_image):
        input_h, input_w = self.input_shape
        image = cv2.resize(input_image, (input_w, input_h))
        image = np.transpose(image, (2, 0, 1))
        image = image.astype(np.float32) / 255.0
        image = (image - 0.5) / 0.5
        image = np.expand_dims(image, axis=0)
        return image

    def postprocess(self, image, outputs):
        n, c, h, w = outputs.shape
        image_height, image_width = image.shape[:2]
        # Obtain the category index of each pixel
        # target shape: (1, h, w)
        outputs = np.argmax(outputs, axis=1)
        results = []
        for i in range(c):
            # Skip the background label
            if self.classes[i] == '_background_':
                continue
            # Get the category index of each pixel for the first batch by adding [0].
            mask = outputs[0] == i
            # Rescaled to original shape
            mask_resized = cv2.resize(mask.astype(np.uint8), (image_width, image_height))
            # Get the contours
            contours, _ = cv2.findContours(mask_resized, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
            # Append the contours along with their respective class labels
            results.append((self.classes[i], [np.squeeze(contour).tolist() for contour in contours]))
        return results

    def predict_shapes(self, image, image_path=None):
        if image is None:
            return []

        try:
            image = qt_img_to_rgb_cv_img(image, image_path)
        except Exception as e:  # noqa
            logging.warning("Could not inference model")
            logging.warning(e)
            return []

        blob = self.preprocess(image)
        outputs = self.net.get_ort_inference(blob)
        results = self.postprocess(image, outputs)
        shapes = []
        for item in results:
            label, contours = item
            for points in contours:
                # Make sure the polygon is closed by repeating the first point
                points.append(points[0])
                shape = Shape(flags={})
                for point in points:
                    shape.add_point(QtCore.QPointF(point[0], point[1]))
                shape.shape_type = "polygon"
                shape.closed = True
                shape.fill_color = "#000000"
                shape.line_color = "#000000"
                shape.label = label
                shape.selected = False
                shapes.append(shape)

        result = AutoLabelingResult(shapes, replace=True)
        return result

    def unload(self):
        del self.net

Here:

  • In the Meta class:
    • required_config_names: Specifies mandatory fields in the model config file for proper initialization.
    • widgets: Specifies controls (buttons, dropdowns, etc.) to display for this service. See this file for definitions.
    • output_modes: Specifies the output shape types supported (e.g., polygon, rectangle, rotated box).
    • default_output_mode: Specifies the default output shape type.
  • predict_shapes and unload are abstract methods that must be implemented. They define the inference process and resource release logic, respectively.

f. Add to Model Management

After the above steps, open the model configuration file. Add the corresponding model type field (e.g., unet) to the _CUSTOM_MODELS list and, if necessary, add the model name to relevant configuration sections.
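
The registration itself is just one more entry in that list; a minimal sketch (the surrounding contents of the file will differ):

# In the model configuration module:
_CUSTOM_MODELS = [
    "yolov5",
    # ...existing entries...
    "unet",  # must match the `type` field in unet.yaml
]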

Tip: If you don't know how to implement specific widgets, use the search panel, enter relevant keywords, and examine the implementation logic of available widgets.

Finally, go to the Model Manager class file. In the _load_model method, initialize your instance as follows:

...

class ModelManager(QObject):
    """Model manager"""

    def __init__(self):
        ...
    ...
    def _load_model(self, model_id):
        """Load and return model info"""
        if self.loaded_model_config is not None:
            self.loaded_model_config["model"].unload()
            self.loaded_model_config = None
            self.auto_segmentation_model_unselected.emit()

        model_config = copy.deepcopy(self.model_configs[model_id])
        if model_config["type"] == "yolov5":
            ...
        elif model_config["type"] == "unet":
            from .unet import UNet

            try:
                model_config["model"] = UNet(
                    model_config, on_message=self.new_model_status.emit
                )
                self.auto_segmentation_model_unselected.emit()
            except Exception as e:  # noqa
                self.new_model_status.emit(
                    self.tr(
                        "Error in loading model: {error_message}".format(
                            error_message=str(e)
                        )
                    )
                )
                print(
                    "Error in loading model: {error_message}".format(
                        error_message=str(e)
                    )
                )
                return
          ...
    ...

⚠️Note:

  • The model type field must match the type field defined in the configuration file (Step b. Define Configuration File).
  • If the model is based on SAM (Segment Anything Model) interaction patterns, replace self.auto_segmentation_model_unselected.emit() with self.auto_segmentation_model_selected.emit() to trigger the corresponding functionality.

Model Export

This section provides specific examples of converting custom models to the ONNX format for quick integration into X-AnyLabeling.

Classification

InternImage introduces a large-scale Convolutional Neural Network (CNN) model utilizing deformable convolutions as core operators. This achieves large effective receptive fields, adaptive spatial aggregation, and reduced inductive bias, enabling the learning of stronger, more robust patterns from extensive data. It surpasses current CNNs and Vision Transformers on benchmarks.

| Attribute | Value |
| --- | --- |
| Paper Title | InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions |
| Publishing Units | Shanghai AI Laboratory, Tsinghua University, Nanjing University, etc. |
| Publication Date | CVPR'23 |

Please refer to this tutorial.

This tutorial provides users with a method to quickly build lightweight, high-precision, and practical person attribute classification models using PaddleClas PULC (Practical Ultra-Lightweight image Classification). The model can be widely used in pedestrian analysis, tracking scenarios, etc.

| Attribute | Value |
| --- | --- |
| Publishing Units | PaddlePaddle Team (Baidu) |

Please refer to this tutorial.

This tutorial provides users with a method to quickly build lightweight, high-precision, and practical vehicle attribute classification models using PaddleClas PULC. The model is suitable for vehicle recognition, road monitoring, etc.

| Attribute | Value |
| --- | --- |
| Publishing Units | PaddlePaddle Team (Baidu) |

Please refer to this tutorial.

Object Detection

RF-DETR is the first real-time model to exceed 60 AP on the Microsoft COCO benchmark while maintaining competitive performance at base sizes. It also achieves state-of-the-art performance on RF100-VL, an object detection benchmark that measures domain adaptability to real-world problems. RF-DETR is comparable in speed to current real-time object detection models.

Organization: Roboflow

Please refer to this tutorial.

Author: Kaixuan Hu

Please refer to this tutorial.

| Attribute | Value |
| --- | --- |
| Paper Title | YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors |
| Publishing Units | Institute of Information Science, Academia Sinica, Taiwan |

python export.py --weights yolov7.pt --img-size 640 --grid

Note: The --grid parameter must be included when running this command.

| Attribute | Value |
| --- | --- |
| Paper Title | Gold-YOLO: Efficient Object Detector via Gather-and-Distribute Mechanism |
| Publishing Units | Huawei Noah's Ark Lab |
| Publication Date | NeurIPS'23 |

# Clone the repository first
git clone https://github.com/huawei-noah/Efficient-Computing.git
cd Efficient-Computing/Detection/Gold-YOLO
# Run export for desired model weight
python deploy/ONNX/export_onnx.py --weights Gold_n_dist.pt --simplify --ort
# Or other weights: Gold_s_pre_dist.pt, Gold_m_pre_dist.pt, Gold_l_pre_dist.pt

DAMO-YOLO is a fast and accurate object detection method developed by the TinyML team at Alibaba DAMO Academy's Data Analytics and Intelligence Lab. It achieves state-of-the-art performance by incorporating new techniques, including a Neural Architecture Search (NAS) backbone, an efficient re-parameterized Generalized-FPN (RepGFPN), a lightweight head, AlignedOTA label assignment, and distillation enhancement.

| Attribute | Value |
| --- | --- |
| Paper Title | DAMO-YOLO: A Report on Real-Time Object Detection |
| Publishing Units | Alibaba Group |
| Publication Date | Arxiv'22 |

# Clone the repository first
git clone https://github.com/tinyvision/DAMO-YOLO.git
cd DAMO-YOLO
# Run converter for a specific config and checkpoint
python tools/converter.py -f configs/damoyolo_tinynasL25_S.py -c damoyolo_tinynasL25_S.pth --batch_size 1 --img_size 640

Real-Time Detection Transformer (RT-DETR) is the first known real-time end-to-end object detector. RT-DETR-L achieves 53.0% AP on COCO val2017 at 114 FPS on a T4 GPU, while RT-DETR-X achieves 54.8% AP at 74 FPS, surpassing all YOLO detectors of the same scale in speed and accuracy. RT-DETR-R50 achieves 53.1% AP at 108 FPS, outperforming DINO-Deformable-DETR-R50 by 2.2% AP with about 21x faster FPS.

| Attribute | Value |
| --- | --- |
| Paper Title | RT-DETR: DETRs Beat YOLOs on Real-time Object Detection |
| Publishing Units | Baidu Inc. |
| Publication Date | Arxiv'23 (CVPR'24) |

Please refer to external tutorials or the official repository for ONNX export instructions, as direct commands might vary. Example article (Chinese): https://zhuanlan.zhihu.com/p/628660998.

Hyper-YOLO is a novel object detection method that integrates hypergraph computation to capture complex high-order associations between visual features. It introduces a Hypergraph Computation-enhanced Semantic Collection and Scattering (HGC-SCS) framework, transforming visual feature maps into semantic space and constructing hypergraphs for high-order information propagation.

| Attribute | Value |
| --- | --- |
| Paper Title | Hyper-YOLO: When Visual Object Detection Meets Hypergraph Computation |
| Publishing Units | Tsinghua University, Xi'an Jiaotong University |
| Publication Date | TPAMI'25 (preprint available) |

Download the model, install dependencies, then modify the Hyper-YOLO/ultralytics/export.py file (or a similar export script within that project), setting batch=1 and half=False:

# Example modification within Hyper-YOLO/ultralytics/export.py (or a similar export script)
from pathlib import Path
from ultralytics import YOLO

if __name__ == '__main__':
    model_path = 'hyper-yolon-seg.pt' # Or your specific model weight file
    if isinstance(model_path, (str, Path)):
        model = YOLO(model_path)

    # Ensure export arguments are set correctly
    output_filename = model.export(
        imgsz=640,
        batch=1,         # Set batch size to 1
        format='onnx',   # Specify ONNX format
        int8=False,
        half=False,      # Set half to False
        device="0",      # Or "cpu"
        verbose=False
    )
    print(f"Model exported to {output_filename}")

Then run the export script (adjust path as needed):

python3 Hyper-YOLO/ultralytics/export.py

Segment Anything

The Segment Anything Model (SAM) generates high-quality object masks from input prompts like points or boxes. It can produce masks for all objects in an image and was trained on a dataset of 11 million images and 1.1 billion masks. SAM demonstrates strong zero-shot performance on various segmentation tasks.

| Attribute | Value |
| --- | --- |
| Paper Title | Segment Anything |
| Publishing Units | Meta AI Research, FAIR |
| Publication Date | ICCV'23 |

For ONNX export, refer to community exporters like https://github.com/vietanhdev/samexporter#sam-exporter or the official repository for potential tools.

EfficientViT (the backbone behind EfficientViT-SAM) is a family of vision models designed for efficient high-resolution dense prediction. It uses a novel lightweight multi-scale linear attention module as its core building block, achieving global receptive fields and multi-scale learning with hardware-efficient operations. EfficientViT-SAM adapts this backbone for promptable segmentation.

| Attribute | Value |
| --- | --- |
| Paper Title | EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction |
| Publishing Units | MIT HAN Lab |
| Publication Date | ICCV'23 |

For ONNX export, refer to the EfficientViT-SAM export tooling in the MIT HAN Lab EfficientViT project (the CVHub520/efficientvit repository referenced by X-AnyLabeling appears to be a fork of it). Note that the EfficientViT hosted at https://github.com/microsoft/Cream/tree/main/EfficientViT is a different model that shares the name and is not the one used here.

SAM-Med2D is a specialized model developed to address the challenges of applying state-of-the-art image segmentation techniques to medical images.

| Attribute | Value |
| --- | --- |
| Paper Title | SAM-Med2D |
| Publishing Units | OpenGVLab |
| Publication Date | Arxiv'23 |

Refer to the deployment instructions in the official repository: https://github.com/OpenGVLab/SAM-Med2D#%EF%B8%8F-deploy (the CVHub520/SAM-Med2D repository referenced by X-AnyLabeling appears to be a fork of it).

HQ-SAM is an enhanced version of the Segment Anything Model (SAM) designed to improve mask prediction quality, especially for complex structures, while maintaining SAM's efficiency and zero-shot capabilities. It achieves this through an improved decoding process and additional training on a specialized dataset.

| Attribute | Value |
| --- | --- |
| Paper Title | Segment Anything in High Quality |
| Publishing Units | ETH Zurich, HKUST |
| Publication Date | NeurIPS'23 |

Refer to the official HQ-SAM repository or potentially forks like https://github.com/CVHub520/sam-hq for ONNX export tutorials or scripts.

EdgeSAM is an accelerated variant of the Segment Anything Model (SAM), optimized for efficient execution on edge devices with minimal performance compromise. It claims significant speedups over the original SAM and MobileSAM on edge hardware.

| Attribute | Value |
| --- | --- |
| Paper Title | EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM |
| Publishing Units | S-Lab, Nanyang Technological University; Shanghai AI Laboratory |
| Publication Date | Arxiv'23 |

Refer to the official repository's export script: https://github.com/chongzhou96/EdgeSAM/blob/main/scripts/export_onnx_model.py.

Grounding

Grounding DINO is a state-of-the-art (SOTA) zero-shot object detection model excelling at detecting objects not defined during training. Its ability to adapt to new objects and scenes makes it highly versatile for real-world applications. It performs well in Referring Expression Comprehension (REC), identifying and locating specific objects or regions in images based on text descriptions. Grounding DINO simplifies object detection by eliminating hand-designed components like Non-Maximum Suppression (NMS).

| Attribute | Value |
| --- | --- |
| Paper Title | Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection |
| Publishing Units | IDEA-CVR, IDEA-Research |
| Publication Date | Arxiv'23 |

Please refer to this tutorial.

YOLO-World enhances the YOLO series by incorporating vision-language modeling, enabling efficient open-vocabulary object detection that excels in various tasks.

| Attribute | Value |
| --- | --- |
| Paper Title | YOLO-World: Real-Time Open-Vocabulary Object Detection |
| Publishing Units | Tencent AI Lab, ARC Lab, Tencent PCG, Huazhong University of Science and Technology |
| Publication Date | Arxiv'24 |

# Ensure ultralytics package is installed and updated
# pip install -U ultralytics
# Clone the ultralytics repo if needed for specific export scripts, otherwise use the pip package
# git clone https://github.com/ultralytics/ultralytics.git
# cd ultralytics
# Use the yolo command line interface
yolo export model=yolov8s-worldv2.pt format=onnx opset=13 simplify

GeCo is a unified architecture for few-shot counting, achieving high-precision object detection, segmentation, and counting through novel dense queries and a counting loss.

| Attribute | Value |
| --- | --- |
| Paper Title | GeCo: Query-Based Anchors for Fine-Grained Multi-Object Counting, Detection, and Segmentation |
| Publishing Units | University of Ljubljana |
| Publication Date | NeurIPS'24 |

Please refer to this tutorial.

Image Tagging

RAM (Recognize Anything Model) is a robust image tagging model known for its exceptional image recognition capabilities. RAM excels in zero-shot generalization, is cost-effective, reproducible, and relies on open-source, annotation-free datasets. Its flexibility makes it suitable for a wide range of applications.

| Attribute | Value |
| --- | --- |
| Paper Title | Recognize Anything: A Strong Image Tagging Model |
| Publishing Units | OPPO Research Institute, IDEA-Research, AI Robotics |
| Publication Date | Arxiv'23 |

Please refer to this tutorial. (RAM is released in the recognize-anything repository, which also hosts the related Tag2Text model.)