dynsight.vision.VisionInstance

class dynsight.vision.VisionInstance(source, output_path, model='yolo12n.pt', device=None, workers=8)

Class for performing computer vision tasks using YOLO models.

This class supports object detection, Convolutional Neural Network (CNN) training and fine-tuning, as well as the creation and management of training datasets.

Caution

This class is still under development and may not function as intended.

Parameters:

source (str | Path) – The images or videos to be processed. See the sources table for the list of possible sources and the formats table for the list of supported formats.
output_path (Path) – The path to save the output folder.
model (str | Path) – The path to the YOLO model file. Defaults to “yolo12n.pt”. See here for more information.
device (str | None) – The device on which the calculation is performed: “cpu”, a specific GPU ID (“cuda:0” or “0”), or “mps” for macOS users.
workers (int) – Number of worker threads for data loading. Affects the speed of data preprocessing and feeding into the model, which is especially useful in multi-GPU setups. Used only in training sessions.
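
As a minimal sketch of how the constructor and methods documented on this page fit together, the function below builds an instance, runs one prediction session, and exports the result. The function name, the prediction title, and the `detections.xyz` file name are hypothetical; only the signatures listed on this page are used, and since the class is still under development the real behavior may differ.

```python
from pathlib import Path


def run_detection(source: Path, output_path: Path) -> Path:
    """Run one prediction session and export it to an .xyz file."""
    # Imported lazily so the sketch can be loaded without dynsight installed.
    from dynsight.vision import VisionInstance

    vision = VisionInstance(
        source=source,
        output_path=output_path,
        model="yolo12n.pt",  # default value from the class signature
        device="cpu",
        workers=8,
    )
    # "first_pass" is a hypothetical session name.
    vision.predict(prediction_title="first_pass", confidence=0.25, iou=0.7)
    # "detections.xyz" is a hypothetical file name.
    return vision.export_prediction_to_xyz(file_name=Path("detections.xyz"))
```

Calling `run_detection(Path("video.mp4"), Path("output"))` would require dynsight and a YOLO model file to be available; both paths are placeholders.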

Methods

`create_dataset_from_predictions`	Create a YOLO training dataset from `predict` results.
`export_prediction_to_xyz`	Export prediction results into a single `.xyz` file.
`predict`	Detect objects within the source.
`set_training_dataset`	Set the training dataset for the model training.
`train`	Train a custom model using a training dataset.
`tune_hyperparams`	Tune hyperparameters for the model.

create_dataset_from_predictions(dataset_name, train_split=0.8, load_dataset=True)

Create a YOLO training dataset from predict results.

Parameters:

dataset_name (str) – Name of the dataset that will be created.
train_split (float) – Fraction of images used for the training set; the remaining fraction is used for the validation set.
load_dataset (bool) – If True, directly load the created dataset for the next training sessions.

Return type:

None
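
To illustrate what the train_split fraction means, here is a small standalone sketch that divides a list of image names into training and validation subsets. The helper name, the shuffling, and the rounding rule are assumptions made for illustration; the page does not state how dynsight performs the split internally.

```python
import random


def split_train_val(image_names, train_split=0.8, seed=0):
    """Shuffle image names and split them into (train, val) lists."""
    if not 0.0 < train_split < 1.0:
        raise ValueError("train_split must be strictly between 0 and 1")
    shuffled = list(image_names)
    # A fixed seed keeps the split reproducible (an assumption of this sketch).
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_split)
    return shuffled[:n_train], shuffled[n_train:]


names = [f"{i}.jpg" for i in range(10)]
train, val = split_train_val(names, train_split=0.8)
print(len(train), len(val))  # 8 2
```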

export_prediction_to_xyz(file_name, class_filter=None)

Export prediction results into a single .xyz file.

Each frame of the resulting .xyz corresponds to one of the images/frames present in the source and used in the predict method.

Parameters:

file_name (Path) – File name for the .xyz file.
class_filter (list[int] | None) – Limit exported detections to the specified class IDs. If None, all detected objects will be exported.

Returns:

Path to the exported .xyz file.

Return type:

Path
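
The one-frame-per-image layout can be sketched with the generic multi-frame XYZ convention: a count line, a comment line, then one row per entry. The column choice below (class ID as the label, box center as x and y, z fixed to 0) is an assumption for illustration and may not match what dynsight writes; `write_xyz` is a hypothetical helper, not part of the library.

```python
import tempfile
from pathlib import Path


def write_xyz(frames, file_name, class_filter=None):
    """Write one XYZ frame per image; a detection is (class_id, x, y)."""
    lines = []
    for index, detections in enumerate(frames):
        if class_filter is not None:
            detections = [d for d in detections if d[0] in class_filter]
        lines.append(str(len(detections)))  # entry count of this frame
        lines.append(f"frame {index}")  # free-form comment line
        for class_id, x, y in detections:
            # Assumed columns: class ID as label, box center, z fixed to 0.
            lines.append(f"{class_id} {x:.3f} {y:.3f} 0.000")
    path = Path(file_name)
    path.write_text("\n".join(lines) + "\n")
    return path


# Two frames of made-up detections; only class 0 is exported.
frames = [[(0, 10.0, 20.0), (1, 30.5, 40.25)], [(0, 11.0, 21.0)]]
with tempfile.TemporaryDirectory() as tmp:
    xyz_path = write_xyz(frames, Path(tmp) / "detections.xyz", class_filter=[0])
    xyz_text = xyz_path.read_text()
print(xyz_text)
```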

predict(prediction_title, augment=False, agnostic_nms=False, show_labels=False, class_filter=None, confidence=0.25, iou=0.7, imgsz=640, max_det=500)

Detect objects within the source.

Parameters:

prediction_title (str) – The name of the prediction session.
augment (bool) – Enables test-time augmentation (TTA) for predictions, potentially improving detection robustness at the cost of inference speed.
agnostic_nms (bool) – Enables class-agnostic Non-Maximum Suppression (NMS), which merges overlapping boxes of different classes. Useful in multi-class detection scenarios where class overlap is common.
show_labels (bool) – Show class label names in the annotated version of the source.
class_filter (list[int] | None) – Filters predictions to a set of class IDs. Only detections belonging to the specified classes will be returned.
confidence (float) – Sets the minimum confidence threshold for detections. Objects detected with confidence below this threshold will be disregarded.
iou (float) – Intersection over Union (IoU) threshold for Non-Maximum Suppression (NMS). Lower values result in fewer detections by eliminating overlapping boxes, which is useful for reducing duplicates.
imgsz (int | tuple[int, int]) – Defines the image size for inference. Can be a single integer for square resizing or a tuple. Proper sizing can improve detection accuracy and processing speed.
max_det (int) – The maximum number of detections for a single frame / image.

Return type:

None
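
The interplay of confidence, iou, and max_det can be illustrated with a plain-Python greedy NMS. This is a simplified, class-agnostic sketch with made-up boxes, not the Ultralytics implementation; `box_iou` and `filter_detections` are hypothetical helper names.

```python
def box_iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(detections, confidence=0.25, iou=0.7, max_det=500):
    """Greedy NMS over (box, score) pairs, highest score first."""
    candidates = sorted(
        (d for d in detections if d[1] >= confidence),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for box, score in candidates:
        if len(kept) >= max_det:
            break
        # Keep a box only if it does not overlap a kept box above the threshold.
        if all(box_iou(box, other) <= iou for other, _ in kept):
            kept.append((box, score))
    return kept


detections = [
    ((0, 0, 10, 10), 0.9),
    ((1, 1, 11, 11), 0.8),  # overlaps the first box with IoU = 81/119
    ((20, 20, 30, 30), 0.2),  # below the default confidence threshold
    ((50, 50, 60, 60), 0.6),
]
print(len(filter_detections(detections, iou=0.7)))  # 3
print(len(filter_detections(detections, iou=0.5)))  # 2
```

With these boxes, lowering iou from 0.7 to 0.5 removes the second box because its overlap with the first (about 0.68) now exceeds the threshold.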

set_training_dataset(training_data_yaml)

Set the training dataset for the model training.

The training dataset is set through a YAML file that should have the following structure:

path: path/to/dataset/folder
train: path/to/train/images
val: path/to/val/images

nc: number_of_classes
names: [class1, class2, ...]

With a dataset folder structure like this:

dataset/
├── images/
│   ├── train/
│   │   ├── 1.jpg
│   │   ├── 2.jpg
│   │   └── ...
│   └── val/
│       ├── 5.jpg
│       ├── 6.jpg
│       └── ...
└── labels/
    ├── train/
    │   ├── 1.txt
    │   ├── 2.txt
    │   └── ...
    └── val/
        ├── 5.txt
        ├── 6.txt
        └── ...

Parameters:

training_data_yaml (Path) – Path to the training data YAML file.

Return type:

None
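
The folder layout and YAML file described above can be produced with the standard library alone. This is a sketch: `write_dataset_yaml` and the `dataset.yaml` file name are hypothetical, and the train and val entries are written relative to path, which follows the usual Ultralytics convention and is assumed here rather than confirmed by this page.

```python
import tempfile
from pathlib import Path


def write_dataset_yaml(root, class_names):
    """Create the images/labels train/val folders and a dataset YAML file."""
    root = Path(root)
    for kind in ("images", "labels"):
        for split in ("train", "val"):
            (root / kind / split).mkdir(parents=True, exist_ok=True)
    yaml_text = (
        f"path: {root.as_posix()}\n"
        "train: images/train\n"  # assumed to be relative to 'path'
        "val: images/val\n"
        "\n"
        f"nc: {len(class_names)}\n"
        f"names: [{', '.join(class_names)}]\n"
    )
    yaml_path = root / "dataset.yaml"
    yaml_path.write_text(yaml_text)
    return yaml_path


with tempfile.TemporaryDirectory() as tmp:
    yaml_path = write_dataset_yaml(Path(tmp) / "dataset", ["class1", "class2"])
    yaml_text = yaml_path.read_text()
    has_val_labels = (yaml_path.parent / "labels" / "val").is_dir()
print(yaml_text)
```

The returned path is the kind of value that would be passed as training_data_yaml.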

train(title, hyperparams=None, epochs=100, batch_size=16, patience=20, imgsz=640)

Train a custom model using a training dataset.

The dataset must be set with the set_training_dataset method before calling this function.

Parameters:

title (str) – The name of the resulting model.

hyperparams (dict[str, float] | None) –

The dictionary that contains all the hyperparameters for the model training. The following default dict is used if not provided:

# Default hyperparameters dictionary.
default_hyperparams = {
    "lr0": 0.01,
    "lrf": 0.01,
    "momentum": 0.937,
    "weight_decay": 0.0005,
    "warmup_epochs": 3.0,
    "warmup_momentum": 0.8,
    "box": 7.5,
    "cls": 0.5,
    "dfl": 1.5,
    "hsv_h": 0.015,
    "hsv_s": 0.7,
    "hsv_v": 0.4,
    "degrees": 0.0,
    "translate": 0.1,
    "scale": 0.5,
    "shear": 0.0,
    "perspective": 0.0,
    "flipud": 0.0,
    "fliplr": 0.5,
    "bgr": 0.0,
    "mosaic": 1,
    "mixup": 0.0,
    "cutmix": 0.0,
    "copy_paste": 0.0
}

Manually customize this dict to change the training performance or use the tune_hyperparams method to automatically optimize hyperparameters.

epochs (int) – Total number of training epochs. Each epoch represents a full pass over the entire dataset.
batch_size (int) – Batch size. Three modes are available: an integer (batch=16), auto mode for 60% GPU memory utilization (batch=-1), or auto mode with a specified utilization fraction (batch=0.70).
patience (int) – Number of epochs to wait without improvement in validation metrics before stopping the training early. Helps to prevent overfitting.
imgsz (int | tuple[int, int]) – Defines the image size for training. Can be a single integer for square resizing or a tuple. Proper sizing can improve detection accuracy and processing speed.

Return type:

None
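
One way to customize only a few values is to start from a copy of the default dictionary and override selected keys, so that a complete dictionary is always passed. The sketch below assumes nothing about whether train merges partial dictionaries itself; `build_hyperparams` is a hypothetical helper and the override values are examples.

```python
# Copy of the default hyperparameters dictionary listed above.
DEFAULT_HYPERPARAMS = {
    "lr0": 0.01, "lrf": 0.01, "momentum": 0.937, "weight_decay": 0.0005,
    "warmup_epochs": 3.0, "warmup_momentum": 0.8,
    "box": 7.5, "cls": 0.5, "dfl": 1.5,
    "hsv_h": 0.015, "hsv_s": 0.7, "hsv_v": 0.4,
    "degrees": 0.0, "translate": 0.1, "scale": 0.5, "shear": 0.0,
    "perspective": 0.0, "flipud": 0.0, "fliplr": 0.5, "bgr": 0.0,
    "mosaic": 1, "mixup": 0.0, "cutmix": 0.0, "copy_paste": 0.0,
}


def build_hyperparams(overrides=None):
    """Return a full hyperparameter dict with selected values replaced."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULT_HYPERPARAMS)
    if unknown:
        # Reject misspelled keys instead of silently adding them.
        raise KeyError(f"unknown hyperparameters: {sorted(unknown)}")
    return {**DEFAULT_HYPERPARAMS, **overrides}


hyperparams = build_hyperparams({"lr0": 0.005, "fliplr": 0.0})
print(hyperparams["lr0"], hyperparams["momentum"])  # 0.005 0.937
```

The resulting dictionary has the shape expected by the hyperparams parameter.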

tune_hyperparams(iterations=15, epochs=50, imgsz=640, batch_size=16)

Tune hyperparameters for the model.

Optimize the CNN hyperparameters by leveraging the Ultralytics YOLO genetic algorithm. It returns a dictionary of the best hyperparameters, which can be passed directly to the hyperparams parameter of the train method.

Parameters:

iterations (int) – The number of exploration iterations. A higher number yields more accurate results at a higher computational cost.
epochs (int) – The number of epochs to perform for each iteration. Each epoch represents a full pass over the entire dataset.
imgsz (int | tuple[int, int]) – Defines the image size for training. Can be a single integer for square resizing or a tuple. Proper sizing can improve detection accuracy and processing speed.
batch_size (int) – Batch size. Three modes are available: an integer (batch=16), auto mode for 60% GPU memory utilization (batch=-1), or auto mode with a specified utilization fraction (batch=0.70).

Return type:

dict[str, float]
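
The general idea of mutation-based tuning can be shown with a toy loop that perturbs the current best values and keeps a candidate only when it scores higher. This is an illustration of the principle, not the Ultralytics tuner: `tune`, `toy_fitness`, and the `sigma` mutation scale are hypothetical, and the toy fitness stands in for a validation metric that the real method obtains by training for epochs epochs per iteration.

```python
import random


def tune(fitness, start, iterations=15, sigma=0.2, seed=0):
    """Mutate the best parameters and keep improvements only."""
    rng = random.Random(seed)
    best, best_score = dict(start), fitness(start)
    for _ in range(iterations):
        # Multiplicative Gaussian mutation of every parameter.
        candidate = {k: v * (1.0 + rng.gauss(0.0, sigma)) for k, v in best.items()}
        score = fitness(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


def toy_fitness(params):
    """Hypothetical stand-in for a validation metric; peaks at lr0 = 0.02."""
    return -abs(params["lr0"] - 0.02)


start = {"lr0": 0.01}
best, best_score = tune(toy_fitness, start, iterations=15)
print(best, best_score)
```

Because every iteration of the real method runs a training session, the default arguments amount to 15 × 50 = 750 training epochs in total.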