Application Integration Case: Detection Model Training Tutorial

I. Project Overview
- This tutorial is intended for tasks with a small number of targets and simple backgrounds, and is suitable for novices who have never been exposed to model training.
- This tutorial guides you through generating a target detection model by training YOLO detection models in stages. The process is: acquire a large number of images -> select a small subset to train a preliminary YOLOv8 detection model -> use this model to detect the remaining images and manually fine-tune the resulting annotations -> retrain a more accurate model with all of the annotation data.
- Recommended camera models: D435i, D435, D415, D455, L515, T265. This article takes the D435 as an example.
II. Related Downloads
| Type | Name | Download Address |
|---|---|---|
| Installation package | Anaconda | Please click Download to enter Baidu Netdisk for download. |
| Installation package | Labelme | Please click Download to enter Baidu Netdisk for download. |
| Text | requirements.txt | Please click Download to enter GitHub to download the corresponding file. |
| Script | data_collect.py | Please click Download to enter GitHub to download the corresponding script file. |
| Script | final.py | Please click Download to enter GitHub to download the corresponding script file. |
| Script | txt2json.py | Please click Download to enter GitHub to download the corresponding script file. |
| Script | yolov8.py | Please click Download to enter GitHub to download the corresponding script file. |
III. Python Environment Installation
Anaconda provides powerful environment isolation and package management, so this article uses Anaconda to set up the Python environment. The installer version Anaconda3-2024.10-1-Windows-x86_64.exe is used as an example.
- Open the Anaconda installer downloaded in the Related Downloads section and click Next.
- Click I Agree.
- Select the installation type, then click Next.
- Click Browse... to select the installation path, then click Next.
- Check Add Anaconda to the system PATH environment variable and Register Anaconda as the system Python 3.7, then click Install.
TIP
- Add Anaconda to the system PATH environment variable: Anaconda officially recommends leaving this unchecked to avoid interfering with other software, and using Anaconda through Anaconda Navigator or Anaconda Prompt in the Start menu instead (in practice it makes little difference, since PATH can also be changed later). Check it here so that the conda command can be run directly in cmd.
- Register Anaconda as the system Python 3.7: Check this box unless you plan to install and run multiple versions of Anaconda or multiple versions of Python.
- When the content shown in the figure below is displayed, click Skip to skip the VSCode installation.
- Click Finish to complete the installation.
- Open cmd and enter the following command to query the installed Conda version.
conda --version
If the Conda version number is displayed, Anaconda has been installed successfully, as shown in the figure below.
IV. Software Package Installation
Create a Python Virtual Environment
Open the cmd command-line interpreter and enter the following command to create a Python 3.9 virtual environment named detect, as shown in the figure below.
conda create -n detect python=3.9
If a ([y]/n)? prompt appears during creation, enter y to continue the installation. After the installation is complete, execute the following command to check whether the virtual environment detect was created successfully, as shown in the figure below.
conda env list

The above picture shows that the virtual environment was successfully created.
Install the Software Packages Required by the Environment
Execute the following command to activate the virtual environment detect.
conda activate detect
Execute the following commands to install the required software packages.
pip install pyrealsense2==2.55.1.6486
pip install opencv-python==4.10.0.84
pip install numpy==2.0.2
pip install PyYAML==6.0.2
pip install ultralytics==8.3.38
pip install tqdm==4.67.1
pip install pillow==11.0.0
pip install scikit-learn==1.5.2 -i https://pypi.mirrors.ustc.edu.cn/simple/
Alternatively, execute the following command to install the required packages, where requirements.txt is the text file downloaded in the Related Downloads section.
pip install -r requirements.txt
V. Data Acquisition
Acquisition Code
WARNING
While the acquisition code is running, connect the D435 camera to the computer through a docking station to prevent the program from being interrupted by data transmission problems over the cable during acquisition.
data_collect.py is the script file downloaded in the Related Downloads section.
The script file code example is as follows:
"""
Collect image data for training YOLOv8
"""
import pyrealsense2 as rs
import cv2
import os
import time
import numpy as np
import re
# Create a folder to save images
color_output_folder = r'images\images'
os.makedirs(color_output_folder, exist_ok=True)
# Get the maximum sequence number of existing images in the current folder
existing_images = [f for f in os.listdir(color_output_folder) if f.lower().endswith('.png')]
max_number = 0
pattern = re.compile(r'^(\d+)\.png$')
for img_name in existing_images:
match = pattern.match(img_name)
if match:
number = int(match.group(1))
if number > max_number:
max_number = number
start_index = max_number # The next image sequence number starts from max_number + 1
# Camera configuration
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
profile = pipeline.start(config)
# Create an alignment object
align_to = rs.stream.color
align = rs.align(align_to)
print('<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Start Collection >>>>>>>>>>>>>>>>>>>>>>>>>>>>')
start_time = time.time()
frame_count = 0
while True:
frames = pipeline.wait_for_frames()
aligned_frames = align.process(frames)
aligned_depth_frame = aligned_frames.get_depth_frame()
color_frame = aligned_frames.get_color_frame()
depth_image = np.asanyarray(aligned_depth_frame.get_data())
color_image = np.asanyarray(color_frame.get_data())
# Generate file name
frame_number = start_index + frame_count + 1
filename = f'{frame_number:04d}.png'
color_file_path = os.path.join(color_output_folder, filename)
cv2.imshow('Color Image', color_image)
c = cv2.waitKey(1) & 0xFF
if c == ord('s'):
cv2.imwrite(color_file_path, color_image)
frame_count += 1
print(f'frame_count:{frame_count}')
elif c == ord('q'):
break
# Close windows and stop the camera
cv2.destroyAllWindows()
pipeline.stop()
print(f'Total frames captured: {frame_count}')
Precautions for Acquisition
Pay attention to the following aspects when acquiring images:
- Number of images acquired: Ensure that there are 500+ images in each category. If the category is particularly complex or the target changes greatly, it is recommended to increase it to 1,000 or more.
- Acquisition scenario: match the scenarios in which the model will actually be applied.
- Acquire data in different scenes, backgrounds and environments.
- Acquire images of the target object from different angles and distances.
- Acquire data under different lighting conditions.
Acquisition Steps
- Run the image acquisition script data_collect.py in the Python environment; a camera viewfinder pop-up window will appear.
- Point the camera at the object, place the mouse focus on the viewfinder window, and press S on the keyboard to capture the current image; capture as many images as needed.
- After the images have been acquired, press Q on the keyboard to end the acquisition.
VI. Labelme Annotation Image
- Click Labelme.exe to open the interface, then click Open Directory and select the folder where the acquired images were saved.
- Click File > Auto Save so that the annotation file is saved automatically after each image is annotated.
- When annotating the first image, right-click on the image and select Create Rectangle from the context menu.
- After enclosing the object with a rectangle, click the left mouse button and enter a custom label for the object.
The bounding box should enclose the target object as accurately as possible, being neither too tight nor too loose. Pay special attention to accuracy for small targets.
- Click Next picture to annotate the next image.
- After an image is annotated, a json file with the same name is generated in the same path as the image. From all the acquired images, select 100-300 images of each category for annotation (make sure all categories are covered). The json file contains detailed information such as the coordinates and labels of the annotated objects. The json file format is as follows:
{
"version": "5.5.0",
"flags": {},
"shapes": [
{
"label": "bottle",
"points": [
[
224.51612903225805,
142.25806451612902
],
[
316.1290322580645,
399.0322580645161
]
],
"group_id": null,
"description": "",
"shape_type": "rectangle",
"flags": {},
"mask": null
}
],
"imagePath": "0002.png",
"imageData": "iVBORw0KGgoAAAANSUhEUgAAAoAAAAHgCAIAAAC6s0uzAAEAAElEQVR4nLT9S7MtSZYehn3f8oh9zn1kVlZmvbIeXZ1dEIAWSQEDkpIoETCOZISZZjCTmQb6E/wtmmoiDWgmmUw0mWEoARLABggDCHQ3WO",
"imageHeight": 480,
"imageWidth": 640
}
TIP
- shapes: contains the specific annotation information as a list. Each element is a dictionary holding the information of one annotation box: "label" is the customized label, and "points" holds the annotated points. The content of "points" depends on "shape_type"; here "shape_type" is "rectangle", so the rectangle is defined directly by two diagonal points and "points" contains only two points (the coordinates of the top-left and bottom-right corners of the detection box). Each point is given as (x, y), with the origin of the coordinate system at the top-left corner of the image, the positive x-axis pointing right and the positive y-axis pointing down.
- imagePath: the file name of the annotated image.
- shape_type: records the annotation shape selected when annotating.
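To make this structure concrete, the minimal sketch below (not one of the downloaded scripts) reads a Labelme json file and prints the label and corner coordinates of every rectangle; the file name 0002.json comes from the example above.

```python
import json

# Read one Labelme annotation file (the example shown above) and print
# the label and the two corner points of every rectangle it contains.
with open("0002.json", "r", encoding="utf-8") as f:
    ann = json.load(f)

print(ann["imagePath"], ann["imageWidth"], ann["imageHeight"])
for shape in ann["shapes"]:
    if shape["shape_type"] != "rectangle":
        continue
    (x1, y1), (x2, y2) = shape["points"]  # top-left and bottom-right corners
    print(shape["label"], (x1, y1), (x2, y2))
```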
Manually annotating every acquired image is extremely laborious and inefficient, especially when each image contains multiple objects. Instead, we can train a model on the manually labeled subset, use that model to automatically label the remaining images, and then train the final model with all of the images and annotations. This greatly reduces the manual annotation workload.
For example: 
VII. Detecting Model Training (Preliminary Training)
Cut File
Cut (move) the annotated images and their annotation files into a newly created folder named images2. Note that the files generated by annotation are json files, as shown in the figure below.

YOLO Model
The YOLO model uses an annotation format based on TXT files. Each TXT file contains the information of the targets in the corresponding image, including each target's category and its location in the image. The format of a YOLO annotation line is as follows:
<class> <x_center> <y_center> <width> <height>
TIP
<class> is the index of the target category (starting from 0). <x_center> and <y_center> are the coordinates of the center point of the target box, normalized by the image width and height respectively (the coordinate origin is the top-left corner of the image), so their values lie in the range [0, 1]. For example, for a 640x480 image, if the center point lies at 1/4 of the image width horizontally and 1/3 of the image height vertically, then <x_center> is 0.25 and <y_center> is about 0.33. <width> and <height> are the width and height of the target box, normalized by the image width and height, also in the range [0, 1]. For example, if the box width is 1/2 of the image width and its height is 1/4 of the image height, then <width> is 0.5 and <height> is 0.25.

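As an illustration, the minimal sketch below converts the rectangle from the Labelme example above (a 640x480 image) into a YOLO annotation line; this is essentially the calculation that the final.py script in the next section performs for every box.

```python
# Convert the rectangle from the Labelme JSON example (image size 640x480)
# into a YOLO line with normalized center/width/height values.
img_w, img_h = 640, 480
(x1, y1), (x2, y2) = (224.516, 142.258), (316.129, 399.032)  # corners from the example

x_center = (x1 + x2) / 2 / img_w   # ~0.4224
y_center = (y1 + y2) / 2 / img_h   # ~0.5638
width = (x2 - x1) / img_w          # ~0.1431
height = (y2 - y1) / img_h         # ~0.5349

# Class index 0, assuming "bottle" is the first (and only) class
print(f"0 {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}")
```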
Conversion Format
Convert the annotation file (json format) generated by Labelme into the annotation file (txt format) required by YOLO, and divide it into training set, validation set and test set according to training requirements.
The following script converts the Labelme annotation files (json format) into YOLO annotation files (txt format) and splits the dataset.
final.py is the script file downloaded in the Related Downloads section.
The script file code example is as follows:
# -*- coding: utf-8 -*-
import os
import numpy as np
import json
from glob import glob
import cv2
import shutil
import yaml
from sklearn.model_selection import train_test_split
from tqdm import tqdm
from PIL import Image
'''
Unify Image Formats
'''
def change_image_format(label_path, suffix='.png'):
"""
    Unify the format of all images in the current folder, e.g. '.jpg'
:param suffix: Suffix of the image file
:param label_path:Current file path
:return:
"""
externs = ['png', 'jpg', 'JPEG', 'BMP', 'bmp']
files = list()
# Get all images with extensions in externs
for extern in externs:
files.extend(glob(label_path + "\\*." + extern))
# Traverse all images and convert their formats
for index,file in enumerate(tqdm(files)):
name = ''.join(file.split('.')[:-1])
file_suffix = file.split('.')[-1]
if file_suffix != suffix.split('.')[-1]:
            # Build the new file name with the target suffix
new_name = name + suffix
# Read the image
image = Image.open(file)
image = cv2.cvtColor(np.asarray(image), cv2.COLOR_RGB2BGR)
            # Save the image with the target suffix
cv2.imwrite(new_name, image)
# Delete the old image
os.remove(file)
'''
Read All JSON Files to Get All Classes
'''
def get_all_class(file_list, label_path):
"""
Get all classes of the current dataset from JSON files
:param file_list:All file names in the current path
:param label_path:Current file path
:return:
"""
# Initialize the class list
classes = list()
# Traverse all JSON files, read the label values in 'shapes', and add them to classes
for filename in tqdm(file_list):
json_path = os.path.join(label_path, filename + '.json')
json_file = json.load(open(json_path, "r", encoding="utf-8"))
for item in json_file["shapes"]:
label_class = item['label']
if label_class not in classes:
classes.append(label_class)
print('read file done')
return classes
'''
Split into Training Set, Validation Set, and Test Set
'''
def split_dataset(label_path, test_size=0.3, isUseTest=False, useNumpyShuffle=False):
"""
Split files into training set, test set, and validation set
:param useNumpyShuffle: Use numpy method to split the dataset
:param test_size: Proportion of the test set or validation set
:param isUseTest: Whether to use a test set, default is False
:param label_path:Current file path
:return:
"""
# Get all JSON files
files = glob(label_path + "\\*.json")
# Get the names of all JSON files
files = [i.replace("\\", "/").split("/")[-1].split(".json")[0] for i in files]
if useNumpyShuffle:
file_length = len(files)
index = np.arange(file_length)
np.random.seed(32)
np.random.shuffle(index) # Random split
test_files = None
# Whether there is a test set
if isUseTest:
trainval_files, test_files = np.array(files)[index[:int(file_length * (1 - test_size))]], np.array(files)[
index[int(file_length * (1 - test_size)):]]
else:
trainval_files = files
# Split into training set and validation set
train_files, val_files = np.array(trainval_files)[index[:int(len(trainval_files) * (1 - test_size))]], \
np.array(trainval_files)[index[int(len(trainval_files) * (1 - test_size)):]]
else:
test_files = None
if isUseTest:
trainval_files, test_files = train_test_split(files, test_size=test_size, random_state=55)
else:
trainval_files = files
# Randomly allocate JSON file names (without suffix) at a ratio of (1-test_size)/test_size
train_files, val_files = train_test_split(trainval_files, test_size=test_size, random_state=55)
return train_files, val_files, test_files, files
'''
Generate Folders for YOLOv8 Training, Validation, and Test Sets
'''
def create_save_file(ROOT_DIR,isUseTest = False):
print('step6:Generate folders for YOLOv8 training, validation, and test sets')
# Generate training set
train_image = os.path.join(ROOT_DIR, 'images','train')
if not os.path.exists(train_image):
os.makedirs(train_image)
train_label = os.path.join(ROOT_DIR, 'labels','train')
if not os.path.exists(train_label):
os.makedirs(train_label)
# Generate validation set
val_image = os.path.join(ROOT_DIR, 'images', 'val')
if not os.path.exists(val_image):
os.makedirs(val_image)
val_label = os.path.join(ROOT_DIR, 'labels', 'val')
if not os.path.exists(val_label):
os.makedirs(val_label)
# Generate test set
if isUseTest:
test_image = os.path.join(ROOT_DIR, 'images', 'test')
if not os.path.exists(test_image):
os.makedirs(test_image)
test_label = os.path.join(ROOT_DIR, 'labels', 'test')
if not os.path.exists(test_label):
os.makedirs(test_label)
else:
test_image, test_label = None,None
return train_image, train_label, val_image, val_label, test_image, test_label
'''
Conversion: Return the Midpoint, Height, and Width of the Bounding Box Based on Image Size
'''
def convert(img_size, box):
dw = 1. / (img_size[0])
dh = 1. / (img_size[1])
x = (box[0] + box[2]) / 2.0 - 1
y = (box[1] + box[3]) / 2.0 - 1
w = box[2] - box[0]
h = box[3] - box[1]
x = x * dw
w = w * dw
y = y * dh
h = h * dh
return (x, y, w, h)
'''
Move Images and Annotation Files to Specified Training, Validation, and Test Sets
'''
def push_into_file(file, images, labels, ROOT_DIR, suffix='.jpg'):
"""
Finally, all files in the current folder are stored in the folders of the training/validation/test set paths according to images and labels respectively
:param file: List of file names
:param images: Path to store images
:param labels: Path to store labels
:param label_path: Current file path
:param suffix: Suffix of the image file
:return:
"""
# Traverse all files
for filename in tqdm(file):
# Image file
image_file = os.path.join(ROOT_DIR, filename + suffix)
# Annotation file
label_file = os.path.join(ROOT_DIR, filename + '.txt')
# Folder for YOLOv8 to store images
if not os.path.exists(os.path.join(images, filename + suffix)):
try:
shutil.copy(image_file, images)
except OSError:
pass
# Folder for YOLOv8 to store annotations
if not os.path.exists(os.path.join(labels, filename + suffix)):
try:
shutil.move(label_file, labels)
except OSError:
pass
def json2txt(classes, ROOT_DIR=""):
"""
Convert JSON files to TXT files
:param classes: Class names
:param label_path: Current file path
:return:
"""
# 'files' contains all JSON file names
_, _, _, files = split_dataset(ROOT_DIR)
for json_file_ in tqdm(files):
# Path to JSON file
json_filename = os.path.join(ROOT_DIR, json_file_ + ".json")
# Path to the converted TXT label file
out_file = open('%s/%s.txt' % (ROOT_DIR, json_file_), 'w')
# Load the label JSON file
json_file = json.load(open(json_filename, "r", encoding="utf-8"))
img_w = json_file['imageWidth']
img_h = json_file['imageHeight']
'''
Core: Label Conversion (JSON to TXT)
'''
for multi in json_file["shapes"]:
if (multi['shape_type'] == 'rectangle'):
x1 = int(multi['points'][0][0])
y1 = int(multi['points'][0][1])
x2 = int(multi['points'][1][0])
y2 = int(multi['points'][1][1])
label = multi["label"]
cls_id = classes.index(label)
bb = (x1, y1, x2, y2)
bb = convert((img_w, img_h), bb)
out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')
'''
Create YAML File
'''
def create_yaml(classes, ROOT_DIR, isUseTest=False,dataYamlName=""):
print('step5:Create the YAML file required for YOLOv8 training')
classes_dict = {}
for index, item in enumerate(classes):
classes_dict[index] = item
ROOT_DIR = os.path.abspath(ROOT_DIR)
if not isUseTest:
desired_caps = {
'path': ROOT_DIR,
'train': 'images/train',
'val': 'images/val',
'names': classes_dict
}
else:
desired_caps = {
'path': ROOT_DIR,
'train': 'images/train',
'val': 'images/val',
'test': 'images/test',
'names': classes_dict
}
yamlpath = os.path.join(ROOT_DIR, dataYamlName + ".yaml")
# Write to the YAML file
with open(yamlpath, "w+", encoding="utf-8") as f:
for key, val in desired_caps.items():
yaml.dump({key: val}, f, default_flow_style=False)
# First, ensure that all images in the current folder have a unified suffix, such as .jpg. If it's another suffix, change 'suffix' to the corresponding one, such as .png
def ChangeToYoloDet(ROOT_DIR="", suffix='.bmp',classes="", test_size=0.1, isUseTest=False,useNumpyShuffle=False,auto_genClasses = False,dataYamlName=""):
"""
Generate files in the final standard format
:param test_size: Proportion of the test set or validation set
:param label_path: Current file path
:param suffix: File suffix name
:param isUseTest: Whether to use a test set
:return:
"""
# step1: Unify image formats
change_image_format(ROOT_DIR,suffix)
# step2: Split into training, validation, and test sets based on JSON files
train_files, val_files, test_file, files = split_dataset(ROOT_DIR, test_size=test_size, isUseTest=isUseTest)
# step3: Get all classes based on JSON files
classes = classes
# Whether to automatically get the number of classes from the dataset
if auto_genClasses:
classes = get_all_class(files, ROOT_DIR)
'''
step4:(***Core***)Convert JSON files to TXT files and store JSON files in the specified folder
'''
json2txt(classes, ROOT_DIR=ROOT_DIR)
# step5: Create the YAML file required for YOLOv8 training
create_yaml(classes, ROOT_DIR, isUseTest=isUseTest,dataYamlName=dataYamlName)
# step6: Generate folders for YOLOv8 training, validation, and test sets
train_image_dir, train_label_dir, val_image_dir, val_label_dir, test_image_dir, test_label_dir = create_save_file(ROOT_DIR,isUseTest)
# step7: Move all images and annotation files to the corresponding training, validation, and test sets
# Move files to the training set folder
push_into_file(train_files, train_image_dir, train_label_dir,ROOT_DIR=ROOT_DIR, suffix=suffix)
# Move files to the validation set folder
push_into_file(val_files, val_image_dir, val_label_dir,ROOT_DIR=ROOT_DIR, suffix=suffix)
# If the test set exists, move files to the test set folder
if test_file is not None:
push_into_file(test_file, test_image_dir, test_label_dir, ROOT_DIR=ROOT_DIR, suffix=suffix)
print('create dataset done')
if __name__ == "__main__":
'''
1.ROOT_DIR: Path to images and JSON labels
2.suffix:Unified image suffix
3.classes=['dog', 'cat'], # Enter the names in your annotation list (note case sensitivity), used to customize the correspondence between class names and IDs
4.test_size:Proportion of the test set or validation set
5.isUseTest:Whether to enable the test set
6.useNumpyShuffle:Whether to shuffle randomly
7.auto_genClasses:Whether to automatically generate a class list based on JSON labels
8.dataYamlName:Name of the dataset YAML file
'''
ChangeToYoloDet(
ROOT_DIR = r'D:\HuaweiMoveData\Users\z9574\Desktop\yolo_detect_train\images2', # Try to avoid Chinese characters in the dataset path as much as possible.
suffix='.png', # Determine the image suffix for unifying image suffixes
classes=[], # Enter the names in your annotation list (note case sensitivity)
test_size=0.3, # If the test set is set, this is the proportion of the test set to the total dataset; otherwise, it's the proportion of the validation set to the total dataset
isUseTest=False, # Whether to enable the test set
useNumpyShuffle=False, # Whether to shuffle
auto_genClasses = True, # Whether to automatically generate class IDs based on the dataset
dataYamlName= "bottle_data" # Name of the dataset YAML file
    )
The main function of the above program is ChangeToYoloDet; only the following parameters need to be set:
| Parameters | Parameter meaning |
|---|---|
| ROOT_DIR | The absolute path where the images and the Labelme-generated annotation files (json files) are located (the path should not contain Chinese characters, otherwise errors may occasionally occur when generating the yaml file). |
| test_size | If a test set is enabled, the proportion of the test set in the total dataset; otherwise, the proportion of the validation set in the total dataset. |
| isUseTest | Whether to use a test set (if False, only a training set and a validation set are created). |
| dataYamlName | Name of the YOLO training yaml file, used in later training. |
Executing the above script generates the following file directories under the ROOT_DIR path. 
The images and annotation files are split into training and validation sets according to the proportion set in the script.
A yaml file is also generated under the ROOT_DIR path; its name is specified by the dataYamlName parameter in the script. This file is required for subsequent model training and detection.
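For reference, with the example settings above (dataYamlName "bottle_data", a single auto-detected class "bottle", no test set), the generated bottle_data.yaml should look roughly like the following; the path value is the absolute ROOT_DIR on your machine.

```yaml
path: D:\HuaweiMoveData\Users\z9574\Desktop\yolo_detect_train\images2
train: images/train
val: images/val
names:
  0: bottle
```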
Model Training to Obtain a Preliminary Detection Model
- Execute the following command to activate the virtual environment detect.
conda activate detect
- Execute the YOLOv8 training command.
yolo detect train data=images2/bottle_data.yaml model=yolov8n.pt epochs=300 batch=16
The parameters are described as follows:
| Parameters | Value to enter | Definition |
|---|---|---|
| data | images2/bottle_data.yaml | The path to the data set configuration file (for example, coco8.yaml). The file contains parameters specific to the dataset, including paths to training and validation data, class names, and number of classes. This is generated by the script above |
| model | yolov8n.pt | Specify the pretrained model file to start training from (it will be downloaded automatically; a network connection is required). |
| epochs | 300 | Total number of training epochs. Each epoch represents one complete pass through the entire dataset. Adjusting this value will affect training time and model performance. |
| batch | 16 | Batch size |
| patience | 100 (default value) | Training stops early if the validation metrics show no improvement for this many consecutive epochs. |
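If you prefer the Python API over the command line, the same training run can be started with a short script; this is a minimal sketch using the ultralytics package installed earlier, with the paths and hyperparameters mirroring the CLI command above.

```python
from ultralytics import YOLO

# Same settings as the CLI command: the dataset yaml generated by final.py,
# pretrained yolov8n weights, 300 epochs, batch size 16.
model = YOLO("yolov8n.pt")
model.train(data="images2/bottle_data.yaml", epochs=300, batch=16, patience=100)
```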
When the training is complete, a runs folder is generated, and the trained models are stored in the runs\detect\train\weights folder.

In the figure, best.pt and last.pt are both weight files of the trained YOLO model. The difference is:
- best.pt: the weights of the model that performed best on the validation set during training. The validation set is evaluated after each epoch and the best-performing weights are recorded. This file is typically used for inference and deployment, since it gives the best validation performance.
- last.pt: the model weights after the last training epoch. This file is typically used to continue training, because it lets you resume from where the previous run ended.
In short, use last.pt as the starting point when you want to continue training on top of a previous run, and use best.pt when you want to use the trained model for inference and deployment.
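For example, with the ultralytics Python API you could resume an interrupted run from last.pt or load best.pt for inference, as sketched below; the weight paths assume the default runs\detect\train output folder.

```python
from ultralytics import YOLO

# Resume an interrupted training run from the last saved checkpoint.
model = YOLO(r"runs\detect\train\weights\last.pt")
model.train(resume=True)

# Load the best checkpoint for inference / deployment.
best = YOLO(r"runs\detect\train\weights\best.pt")
results = best.predict(source=r"images\images", conf=0.5)
```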
VIII. Automatic Annotation
Generate Labels Using Preliminary Detection Models
Execute the following command to detect the data set to be automatically annotated:
yolo predict model=runs\detect\train\weights\best.pt source=images\images save_txt=True
The parameters are described as follows:
| Parameters | Value to enter | Meaning |
|---|---|---|
| model | runs\detect\train\weights\best.pt | The weights used for detection (the model obtained from the preliminary training in the previous step). |
| source | images\images | The data source used for inference: an image path, video file, directory, URL, or the device ID of a live feed. Multiple formats and sources are supported. |
| save_txt | True | Save the detection results as text files. |
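The same detection can also be run from Python; the minimal sketch below uses the ultralytics API, and save_conf=True can be added if you also want the confidence score written into the label files.

```python
from ultralytics import YOLO

# Run the preliminary model over the unlabeled images and save the annotated
# images plus one YOLO-format .txt label file per image.
model = YOLO(r"runs\detect\train\weights\best.pt")
model.predict(source=r"images\images", save=True, save_txt=True)
```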
The detection images and detection files output by the command are shown in the figures below:


- Output image:
- YOLO draws bounding boxes around the objects detected in the original input images and saves new, annotated images. Each image with detected objects contains one or more bounding boxes, and each box surrounds one instance of a detected object.
- The image is saved in the runs\detect\predict directory under the current path.
- Output text file:
- For each processed image, YOLO also generates a .txt file with the same name as the image. Each line of this text file describes one detected object instance in the format: class ID followed by the normalized bounding box (center x, center y, width, height); if save_conf=True is also set, a confidence score is appended as a sixth value. For example, 0 0.8765 0.4567 0.3210 0.2345 means an object of class 0 (here the 'bottle' category) whose box center lies at (0.8765, 0.4567) in normalized image coordinates, with a normalized width of 0.3210 and height of 0.2345.
- The text files are saved in the runs\detect\predict\labels directory under the current path.
At this point, check the detection images in runs\detect\predict. If any detections are inaccurate or biased, the generated label files need to be corrected.
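As a quick sanity check, one line of such a label file can be converted back into pixel-corner coordinates as in the minimal sketch below (assuming a 640x480 image); the txt2json.py script in the next step does exactly this for every line of every file.

```python
# Convert one YOLO label line back into pixel-corner coordinates
# for a 640x480 image, mirroring the math used in txt2json.py.
line = "0 0.8765 0.4567 0.3210 0.2345"
img_w, img_h = 640, 480

cls_id, xc, yc, w, h = line.split()
xc, yc, w, h = (float(v) for v in (xc, yc, w, h))

x1 = (xc - w / 2) * img_w   # top-left x
y1 = (yc - h / 2) * img_h   # top-left y
x2 = (xc + w / 2) * img_w   # bottom-right x
y2 = (yc + h / 2) * img_h   # bottom-right y
print(int(cls_id), (x1, y1), (x2, y2))
```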
Convert txt File Under Label to json File Used by Labelme
This step requires converting the file (txt format) generated in the previous step into the annotation file (json file) required by Labelme annotation software, so that Labelme can be used to fine-tune the annotation file later.
txt2json.py is the script file downloaded in the Related Downloads section.
The script file code example is as follows:
import os
import cv2
import json
import glob
import numpy as np
def convert_txt_to_labelme_json(txt_path, image_path, output_dir, image_fmt='.png'):
    # Convert YOLO txt annotations to Labelme JSON format
txts = glob.glob(os.path.join(txt_path, "*.txt"))
for txt in txts:
labelme_json = {
'version': '5.5.0',
'flags': {},
'shapes': [],
'imagePath': None,
'imageData': None,
'imageHeight': None,
'imageWidth': None,
}
txt_name = os.path.basename(txt)
image_name = txt_name.split(".")[0] + image_fmt
labelme_json['imagePath'] = image_name
image_name = os.path.join(image_path, image_name)
if not os.path.exists(image_name):
raise Exception('txt file={},Image not found={}'.format(txt, image_name))
image = cv2.imdecode(np.fromfile(image_name, dtype=np.uint8), cv2.IMREAD_COLOR)
h, w = image.shape[:2]
labelme_json['imageHeight'] = h
labelme_json['imageWidth'] = w
with open(txt, 'r') as t:
lines = t.readlines()
for line in lines:
content = line.split(' ')
label = content[0]
object_width = float(content[3])
object_height = float(content[4])
top_left_x = (float(content[1]) - object_width / 2) * w
top_left_y = (float(content[2]) - object_height / 2) * h
bottom_right_x = (float(content[1]) + object_width / 2) * w
bottom_right_y = (float(content[2]) + object_height / 2) * h
try:
shape = {
'label': dict_.get(int(label)),
'score': float(content[5]),
'group_id': None,
'shape_type': 'rectangle',
'flags': {},
'points': [
[float(top_left_x), float(top_left_y)],
[float(bottom_right_x), float(bottom_right_y)]
]
}
except Exception as e:
# print(e)
shape = {
'label': dict_.get(int(label)),
'score': float(0.99),
'group_id': None,
'shape_type': 'rectangle',
'flags': {},
'points': [
[float(top_left_x), float(top_left_y)],
[float(bottom_right_x), float(bottom_right_y)]
]
}
labelme_json['shapes'].append(shape)
json_name = txt_name.split('.')[0] + '.json'
json_name_path = os.path.join(output_dir, json_name)
fd = open(json_name_path, 'w')
json.dump(labelme_json, fd, indent=4)
fd.close()
print("save json={}".format(json_name_path))
if __name__ == "__main__":
dict_ = {0: "bottle"}
in_imgs_dir = r'images\images'
in_label_txt_dir = r'runs\detect\predict\labels'
out_labelme_json_dir = in_imgs_dir
if not os.path.exists(out_labelme_json_dir):
os.mkdir(out_labelme_json_dir)
    convert_txt_to_labelme_json(in_label_txt_dir, in_imgs_dir, out_labelme_json_dir, image_fmt='.png')
Please modify the following parameters in the above script as described below.
| Parameters | Value to enter | Meaning |
|---|---|---|
| dict_ | Category names and IDs used when annotating | Change this to the label names and class IDs you defined when annotating the images. |
| in_imgs_dir | images\images | Path of the images whose annotations are to be fine-tuned. |
| in_label_txt_dir | runs\detect\predict\labels | The path where the label files produced by the model detection in the previous step were saved. |
| out_labelme_json_dir | in_imgs_dir | The save path for the JSON files generated by converting TXT files (set to the image path, so the JSON files are saved together with the images). |
TIP
Among them, in_imgs_dir is also the storage path of the generated json files (required by Labelme) corresponding to the images.
Fine-tuning Labels
Use the Labelme annotation tool to load the generated annotation files and corresponding images, and fine-tune the labels detected by the model.
The purpose of fine-tuning with Labelme is:
- Correction of incorrect annotations: The labels generated by the preliminary detection model may contain errors or inaccurate annotations, which can be corrected through fine-tuning.
- Improve the quality of labeling: The quality of automatically generated labels may not be as high as that of manual ones, and fine-tuning can improve the accuracy of labeling.

IX. Retraining
Move all files (images and JSON files) from images2 and all files (images and JSON files) from images/images into a new folder named images3. Then repeat the steps in Conversion Format and Model Training to Obtain a Preliminary Detection Model to run the model training process again.
After the model training is completed, the final generated model file is stored in the weights folder.

X. Using Model
Connect the camera and computer with a docking station, put best.pt and the detection script yolov8.py in the same path, aim the D435 camera lens at the object to be detected, and use the best.pt model for detection.
yolov8.py is the script file downloaded in the Related Downloads section.
The script file code example is as follows:
import cv2
import pyrealsense2 as rs
import time
import numpy as np
import math
from ultralytics import YOLO
# Load YOLOv8 model
model = YOLO("best.pt")
# # Get camera content, parameter 0 means using the default camera
# cap = cv2.VideoCapture(1)
# Configure RealSense
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 848, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 848, 480, rs.format.bgr8, 30)
# Start camera stream
pipeline.start(config)
align_to = rs.stream.color # Align with color stream
align = rs.align(align_to)
def get_aligned_images():
frames = pipeline.wait_for_frames() # Wait to get image frames
aligned_frames = align.process(frames) # Get aligned frames
aligned_depth_frame = aligned_frames.get_depth_frame() # Get depth frame from aligned frames
color_frame = aligned_frames.get_color_frame() # Get color frame from aligned frames
# Get camera parameters
intr = color_frame.profile.as_video_stream_profile().intrinsics # Get camera intrinsics
depth_intrin = aligned_depth_frame.profile.as_video_stream_profile(
).intrinsics # Get depth parameters (used for converting pixel coordinates to camera coordinates)
'''camera_parameters = {'fx': intr.fx, 'fy': intr.fy,
'ppx': intr.ppx, 'ppy': intr.ppy,
'height': intr.height, 'width': intr.width,
'depth_scale': profile.get_device().first_depth_sensor().get_depth_scale()
}'''
# Save intrinsics locally
# with open('./intrinsics.json', 'w') as fp:
# json.dump(camera_parameters, fp)
#######################################################
depth_image = np.asanyarray(aligned_depth_frame.get_data()) # Depth map (16-bit by default)
depth_image_8bit = cv2.convertScaleAbs(depth_image, alpha=0.03) # 8-bit depth map
depth_image_3d = np.dstack(
(depth_image_8bit, depth_image_8bit, depth_image_8bit)) # 3-channel depth map
color_image = np.asanyarray(color_frame.get_data()) # RGB image
# Return camera intrinsics, depth parameters, color image, depth image, depth frame from aligned frames
return intr, depth_intrin, color_image, depth_image, aligned_depth_frame
def get_3d_camera_coordinate(depth_pixel, aligned_depth_frame, depth_intrin):
x = depth_pixel[0]
y = depth_pixel[1]
dis = aligned_depth_frame.get_distance(x, y) # Get the depth corresponding to this pixel
# print ('depth: ',dis) # Depth unit is meters
camera_coordinate = rs.rs2_deproject_pixel_to_point(depth_intrin, depth_pixel, dis)
# print ('camera_coordinate: ',camera_coordinate)
return dis, camera_coordinate
# Initialize FPS calculation
fps = 0
frame_count = 0
start_time = time.time()
try:
while True:
# Wait to get a pair of consecutive frames: depth and color
intr, depth_intrin, color_image, depth_image, aligned_depth_frame = get_aligned_images()
if not depth_image.any() or not color_image.any():
continue
# Get current time
time1 = time.time()
# Convert image to numpy array
depth_colormap = cv2.applyColorMap(cv2.convertScaleAbs(
depth_image, alpha=0.03), cv2.COLORMAP_JET)
images = np.hstack((color_image, depth_colormap))
# Perform object detection with YOLOv8
results = model.predict(color_image, conf=0.5)
annotated_frame = results[0].plot()
detected_boxes = results[0].boxes.xyxy # Get bounding box coordinates
# print('Bounding box coordinates', detected_boxes)
for i, box in enumerate(detected_boxes):
x1, y1, x2, y2 = map(int, box) # Get bounding box coordinates
# Calculate step size
xrange = max(1, math.ceil(abs((x1 - x2) / 30)))
yrange = max(1, math.ceil(abs((y1 - y2) / 30)))
# xrange = 1
# yrange = 1
point_cloud_data = []
# Get 3D coordinates of points within the range
for x_position in range(x1, x2, xrange):
for y_position in range(y1, y2, yrange):
depth_pixel = [x_position, y_position]
dis, camera_coordinate = get_3d_camera_coordinate(depth_pixel, aligned_depth_frame,
depth_intrin) # Get 3D coordinates of the corresponding pixel
point_cloud_data.append(f"{camera_coordinate} ")
# Write all data at once
with open("point_cloud_data.txt", "a") as file:
file.write(f"\nTime: {time.time()}\n")
file.write(" ".join(point_cloud_data))
# Display center point coordinates
ux = int((x1 + x2) / 2)
uy = int((y1 + y2) / 2)
dis, camera_coordinate = get_3d_camera_coordinate([ux, uy], aligned_depth_frame,
depth_intrin) # Get 3D coordinates of the corresponding pixel
formatted_camera_coordinate = f"({camera_coordinate[0]:.2f}, {camera_coordinate[1]:.2f}, {camera_coordinate[2]:.2f})"
cv2.circle(annotated_frame, (ux, uy), 4, (255, 255, 255), 5) # Mark the center point
cv2.putText(annotated_frame, formatted_camera_coordinate, (ux + 20, uy + 10), 0, 1,
[225, 255, 255], thickness=1, lineType=cv2.LINE_AA) # Mark the coordinates
# Calculate FPS
frame_count += 1
time2 = time.time()
fps = int(1 / (time2 - time1))
# Display FPS
cv2.putText(annotated_frame, f'FPS: {fps:.2f}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2,
cv2.LINE_AA)
# Display results
cv2.imshow('YOLOv8 RealSense', annotated_frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
finally:
# Stop the stream
    pipeline.stop()
If the model has been trained correctly, a box framing the target object will appear as shown in the figure below. The center of the box shows the object's position in the camera coordinate system (x, y and z in meters), and the upper-left corner of the box shows the detected category and confidence score.


