Simon Shi's Site

SLAM mapping

Posted on 2023-09-09 Edited on 2025-08-06 In SLAM

SLAM tutorial

Posted on 2023-09-09 Edited on 2025-08-06 In SLAM

Calling TensorRT from Python

Posted on 2023-09-04 Edited on 2025-08-06 In AI , deploy , TensorRT

from __future__ import print_function

import numpy as np
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
from PIL import ImageDraw

from yolov3_to_onnx import download_file
from data_processing import PreprocessYOLO, PostprocessYOLO, ALL_CATEGORIES

import sys, os
sys.path.insert(1, os.path.join(sys.path[0], ".."))
import common

TRT_LOGGER = trt.Logger()

def draw_bboxes(image_raw, bboxes, confidences, categories, all_categories, bbox_color='blue'):
    """Draw the bounding boxes on the original input image and return it.

    Keyword arguments:
    image_raw -- a raw PIL Image
    bboxes -- NumPy array containing the bounding box coordinates of N objects, with shape (N,4).
    categories -- NumPy array containing the corresponding category for each object,
    with shape (N,)
    confidences -- NumPy array containing the corresponding confidence for each object,
    with shape (N,)
    all_categories -- a list of all categories in the correct order (required for looking up
    the category name)
    bbox_color -- an optional string specifying the color of the bounding boxes (default: 'blue')
    """
    draw = ImageDraw.Draw(image_raw)
    print(bboxes, confidences, categories)
    for box, score, category in zip(bboxes, confidences, categories):
        x_coord, y_coord, width, height = box
        left = max(0, np.floor(x_coord + 0.5).astype(int))
        top = max(0, np.floor(y_coord + 0.5).astype(int))
        right = min(image_raw.width, np.floor(x_coord + width + 0.5).astype(int))
        bottom = min(image_raw.height, np.floor(y_coord + height + 0.5).astype(int))

        draw.rectangle(((left, top), (right, bottom)), outline=bbox_color)
        draw.text((left, top - 12), '{0} {1:.2f}'.format(all_categories[category], score), fill=bbox_color)

    return image_raw

def get_engine(onnx_file_path, engine_file_path=""):
    """Attempts to load a serialized engine if available, otherwise builds a new TensorRT engine and saves it."""
    def build_engine():
        """Takes an ONNX file and creates a TensorRT engine to run inference with"""
        with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
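            # Legacy builder attributes from older TensorRT releases; newer releases set these through a builder config.
            builder.max_workspace_size = 1 << 30  # 1 GiB
            builder.max_batch_size = 1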
            # Parse model file
            if not os.path.exists(onnx_file_path):
                print('ONNX file {} not found, please run yolov3_to_onnx.py first to generate it.'.format(onnx_file_path))
                sys.exit(1)  # a missing ONNX file is an error, so exit with a non-zero status
            print('Loading ONNX file from path {}...'.format(onnx_file_path))
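            with open(onnx_file_path, 'rb') as model:
                print('Beginning ONNX file parsing')
                # parse() returns False on failure; report the parser errors instead of continuing silently
                if not parser.parse(model.read()):
                    print('ERROR: Failed to parse the ONNX file.')
                    for error in range(parser.num_errors):
                        print(parser.get_error(error))
                    return None
            print('Completed parsing of ONNX file')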
            print('Building an engine from file {}; this may take a while...'.format(onnx_file_path))
            engine = builder.build_cuda_engine(network)
            print("Completed creating Engine")
            with open(engine_file_path, "wb") as f:
                f.write(engine.serialize())
            return engine

    if os.path.exists(engine_file_path):
        # If a serialized engine exists, use it instead of building an engine.
        print("Reading engine from file {}".format(engine_file_path))
        with open(engine_file_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
            return runtime.deserialize_cuda_engine(f.read())
    else:
        return build_engine()

def main():
    """Create a TensorRT engine for ONNX-based YOLOv3-608 and run inference."""

    # Try to load a previously generated YOLOv3-608 network graph in ONNX format:
    onnx_file_path = 'yolov3.onnx'
    engine_file_path = "yolov3.trt"
    # Download a dog image and save it to the following file path:
    input_image_path = download_file('dog.jpg',
        'https://github.com/pjreddie/darknet/raw/f86901f6177dfc6116360a13cc06ab680e0c86b0/data/dog.jpg', checksum_reference=None)

    # Two-dimensional tuple with the target network's (spatial) input resolution in HW order
    input_resolution_yolov3_HW = (608, 608)
    # Create a pre-processor object by specifying the required input resolution for YOLOv3
    preprocessor = PreprocessYOLO(input_resolution_yolov3_HW)
    # Load an image from the specified input path, and return it together with a pre-processed version
    image_raw, image = preprocessor.process(input_image_path)
    # Store the size of the original input image in WH order; it is needed later for post-processing
    shape_orig_WH = image_raw.size

    # Output shapes expected by the post-processor
    output_shapes = [(1, 255, 19, 19), (1, 255, 38, 38), (1, 255, 76, 76)]
    # Do inference with TensorRT
    trt_outputs = []
    with get_engine(onnx_file_path, engine_file_path) as engine, engine.create_execution_context() as context:
        inputs, outputs, bindings, stream = common.allocate_buffers(engine)
        # Do inference
        print('Running inference on image {}...'.format(input_image_path))
        # Set host input to the image. The common.do_inference function will copy the input to the GPU before executing.
        inputs[0].host = image
        trt_outputs = common.do_inference(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)

    # Before doing post-processing, we need to reshape the outputs as the common.do_inference will give us flat arrays.
    trt_outputs = [output.reshape(shape) for output, shape in zip(trt_outputs, output_shapes)]

    postprocessor_args = {"yolo_masks": [(6, 7, 8), (3, 4, 5), (0, 1, 2)],                    # A list of 3 three-dimensional tuples for the YOLO masks
                          "yolo_anchors": [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),  # A list of 9 two-dimensional tuples for the YOLO anchors
                                           (59, 119), (116, 90), (156, 198), (373, 326)],
                          "obj_threshold": 0.6,                                               # Threshold for object coverage, float value between 0 and 1
                          "nms_threshold": 0.5,                                               # Threshold for non-max suppression algorithm, float value between 0 and 1
                          "yolo_input_resolution": input_resolution_yolov3_HW}

    postprocessor = PostprocessYOLO(**postprocessor_args)

    # Run the post-processing algorithms on the TensorRT outputs and get the bounding box details of detected objects
    boxes, classes, scores = postprocessor.process(trt_outputs, (shape_orig_WH))
    # Draw the bounding boxes onto the original input image and save it as a PNG file
    obj_detected_img = draw_bboxes(image_raw, boxes, scores, classes, ALL_CATEGORIES)
    output_image_path = 'dog_bboxes.png'
    obj_detected_img.save(output_image_path, 'PNG')
    print('Saved image with bounding boxes of detected objects to {}.'.format(output_image_path))

if __name__ == '__main__':
    main()
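
The `nms_threshold` passed to `PostprocessYOLO` above controls non-max suppression. The implementation of `PostprocessYOLO` lives in `data_processing.py` and is not shown in this post, so the following is only a minimal sketch of how greedy IoU-based NMS typically works. It assumes boxes in the same (x, y, width, height) layout that `draw_bboxes` unpacks and ignores class labels; the helper names `iou_xywh` and `nms` are hypothetical and not part of the sample.

```python
def iou_xywh(a, b):
    """Intersection over union of two boxes given as (x, y, width, height)."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, nms_threshold=0.5):
    """Return the indices of the boxes kept by greedy non-max suppression."""
    # Visit boxes from the highest score to the lowest.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(iou_xywh(boxes[i], boxes[j]) <= nms_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections: the first two boxes overlap heavily, the third is separate.
boxes = [(10, 10, 100, 100), (12, 12, 100, 100), (300, 300, 50, 50)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores, nms_threshold=0.5)
print(kept)
```

A lower `nms_threshold` suppresses more overlapping boxes, while a higher one keeps more near-duplicates.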

SLAM Notes

Posted on 2023-08-15 Edited on 2025-08-06 In SLAM

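Epipolar Constraint

2D-2D: recover the camera motion R, t from matched feature points.

Two steps:

1 From the pixel positions of the matched points, compute E or F (eight-point method)
2 From E or F, compute R and t (SVD decomposition).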
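
The second step can be sketched as follows. This is a minimal illustration using NumPy, not code from the original notes: it builds a synthetic E from a known rotation and unit translation, then recovers the two candidate rotations and the translation direction through SVD. The function names `skew` and `decompose_essential` are hypothetical. In practice four (R, t) combinations result, and the correct one is chosen by checking that triangulated points have positive depth in both cameras, which is omitted here.

```python
import numpy as np


def skew(t):
    """Return the 3x3 skew-symmetric matrix t^ such that t^ @ v equals cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])


def decompose_essential(E):
    """Return two candidate rotations and the translation direction (up to sign)."""
    U, _, Vt = np.linalg.svd(E)
    # E is only defined up to sign, so flip U or Vt to obtain proper rotations.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    # W is a 90-degree rotation about the z-axis.
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
    R1 = U @ W @ Vt
    R2 = U @ W.T @ Vt
    t = U[:, 2]
    return R1, R2, t


# Synthetic ground truth: a 10-degree rotation about z and a unit translation along x.
angle = np.deg2rad(10.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle), np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([1.0, 0.0, 0.0])
E = skew(t_true) @ R_true

R1, R2, t_est = decompose_essential(E)
```

Only the direction of t can be recovered this way; its scale is lost in monocular 2D-2D estimation.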

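Fundamental Matrix F

The fundamental matrix F is used when neither the intrinsics nor the extrinsics are known;
the essential matrix E is used when the intrinsics are known.

Essential Matrix E

A 3x3 matrix: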

$$
E = \hat{t}R
$$
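
As a quick numerical check of this definition, the sketch below builds E from an example R and t (values chosen arbitrarily for illustration) and verifies the epipolar constraint x2^T E x1 = 0 for one 3D point, where x1 and x2 are the point's normalized image coordinates in the two cameras and the second camera sees the point at R P1 + t. The helper names are hypothetical.

```python
import math


def skew(t):
    """Skew-symmetric matrix t^ of a 3-vector t."""
    return [[0.0, -t[2], t[1]],
            [t[2], 0.0, -t[0]],
            [-t[1], t[0], 0.0]]


def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(A, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]


def normalize(p):
    """Project a 3D point onto the normalized image plane (z = 1)."""
    return [p[0] / p[2], p[1] / p[2], 1.0]


# Example motion: a 5-degree rotation about the y-axis plus a small translation.
angle = math.radians(5.0)
R = [[math.cos(angle), 0.0, math.sin(angle)],
     [0.0, 1.0, 0.0],
     [-math.sin(angle), 0.0, math.cos(angle)]]
t = [0.5, 0.1, 0.0]

E = matmul(skew(t), R)

# A point in the first camera frame and the same point in the second camera frame.
P1 = [0.3, -0.2, 4.0]
P2 = [a + b for a, b in zip(matvec(R, P1), t)]
x1 = normalize(P1)
x2 = normalize(P2)

# Epipolar constraint: x2^T E x1 should vanish up to floating-point error.
Ex1 = matvec(E, x1)
residual = sum(x2[i] * Ex1[i] for i in range(3))
print(abs(residual) < 1e-12)
```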

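Typical use: the eight-point method, solving for the camera motion R, t.

Homography Matrix H

A special case of the essential-matrix setup: when the feature points all lie on one plane, only 4 point pairs are needed.

Triangulation

Triangulation recovers the depth of a point observed in two frames, which determines its spatial coordinates (in the world coordinate frame).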
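
A minimal sketch of two-view triangulation, assuming R and t are already known and x1, x2 are the normalized image coordinates of the same point. The relation s2 x2 = s1 R x1 + t is solved for the two depths s1, s2 in the least-squares sense. The first camera frame is taken as the world frame here, which is an assumption of this example, and all function names are hypothetical.

```python
import math


def matvec(A, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def triangulate_depths(x1, x2, R, t):
    """Solve s1 * (R x1) - s2 * x2 = -t for the depths (s1, s2) by least squares."""
    a = matvec(R, x1)
    b = [-v for v in x2]
    # Normal equations of the 3x2 system [a b] [s1 s2]^T = -t.
    aa, ab, bb = dot(a, a), dot(a, b), dot(b, b)
    rhs_a, rhs_b = -dot(a, t), -dot(b, t)
    det = aa * bb - ab * ab
    s1 = (rhs_a * bb - ab * rhs_b) / det
    s2 = (aa * rhs_b - ab * rhs_a) / det
    return s1, s2


# Example motion and a ground-truth point in the first camera frame.
angle = math.radians(5.0)
R = [[math.cos(angle), 0.0, math.sin(angle)],
     [0.0, 1.0, 0.0],
     [-math.sin(angle), 0.0, math.cos(angle)]]
t = [0.5, 0.1, 0.0]
P1 = [0.3, -0.2, 4.0]
P2 = [a + b for a, b in zip(matvec(R, P1), t)]

# Normalized image coordinates observed in each frame.
x1 = [P1[0] / P1[2], P1[1] / P1[2], 1.0]
x2 = [P2[0] / P2[2], P2[1] / P2[2], 1.0]

s1, s2 = triangulate_depths(x1, x2, R, t)
point_cam1 = [s1 * v for v in x1]  # recovered 3D point in the first camera frame
```

With noisy matches the two rays no longer intersect exactly, and the least-squares depths give a compromise between them.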

Reinforcement Learning from Human Feedback (RLHF)

Posted on 2023-08-01 Edited on 2025-08-06

Secrets of RLHF in Large Language Models

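Paper: https://arxiv.org/pdf/2307.04964.pdf

Repository: https://github.com/OpenLMLab/MOSS-RLHF

The authors explore PPO-max, an advanced version of the PPO algorithm that effectively improves the training stability of the policy model. Based on the main experimental results, they also give a comprehensive comparison of RLHF models against SFT models and ChatGPT.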

RLHF go

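Training an AI assistant involves three main stages: supervised fine-tuning (SFT), reward model (RM) training, and proximal policy optimization (PPO) against that reward model.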
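
For the third stage, the sketch below shows the standard PPO clipped surrogate objective. It is the generic textbook form, not the PPO-max variant from the paper, and the function name `ppo_clip_objective` and the sample numbers are made up for illustration. In RLHF the advantages are typically derived from the reward model's scores.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch of sampled actions (tokens)."""
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio between the updated policy and the sampling policy.
        ratio = math.exp(lp_new - lp_old)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Two sampled actions: one became more likely (ratio 1.5), one less likely (ratio 0.5).
logp_old = [math.log(0.5), math.log(0.5)]
logp_new = [math.log(0.75), math.log(0.25)]
advantages = [1.0, -1.0]

objective = ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2)
```

The clipping removes the incentive to move the ratio outside [1 - clip_eps, 1 + clip_eps], which is one of the mechanisms that keeps policy updates small.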

(DanZero)Mastering Guandan Game with Reinforcement Learning

Posted on 2023-08-01 Edited on 2025-08-06 In Game , GuanDan

DanZero: Mastering GuanDan Game with Reinforcement Learning

Game Complexity

Card Types

Sample

Lip-Sync Animation Synthesis System for Games

Posted on 2023-07-15 Edited on 2025-08-06

Lip-Sync Animation Synthesis System for Games

Vowel extraction based on formants
Phoneme extraction based on neural networks

https://www.synthesia.io/

AIGC

Posted on 2023-07-09 Edited on 2025-08-06

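Avatar-Driver 2D

In the AIGC field, virtual digital human technology generally follows two routes: 2D and 3D virtual digital humans. The former involves algorithms such as lip-shape driving, motion driving, TTS, and high-resolution generation. This article describes the overall framework of 2D virtual digital humans, the principle behind each step and the corresponding open-source code, and how to deploy it in practice.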

AIGC-小ç讲车

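ChatGPT: generates the prompt and the script
Stable Diffusion: generates portraits from the prompt
ç: generates the virtual digital human that narrates the script

Loop:

Use ChatGPT to generate the prompt for image generation

Use ChatGPT to generate the narration script

Generate the virtual digital human that narrates the script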

GANs

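GAN-based talking-head driving: an explanation of talking face generation (51CTO blog)

1. Method categories

Current talking face generation methods fall mainly into two categories:

(1) Direct methods: learn the mapping from audio to video frames directly (audio2image);

(2) Indirect methods: use an intermediate representation (typically 2D or 3D facial landmarks) to connect the audio input to the video output, decoupling the model into two parts: audio2landmark and landmark2image.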

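Lip-Shape Driving

Academic work

Traditional lip-sync methods

Audio-driven talking face generation

1) The synthesized video frames should have high fidelity;

2) The synthesized facial expressions should be closely aligned with the driving audio.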

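1. [Zhejiang University] GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis

Paper: https://arxiv.org/abs/2301.13430

Code: https://github.com/yerfor/GeneFace

Zhejiang University and ByteDance, https://redian.news/wxnews/250671

Stage 1: audio -> facial motion (HuBERT audio features -> 3DMM facial landmark representation)
Stage 2: motion domain adaptation
Stage 3: rendering video from the motion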

Related

Understanding Diffusion Models, from the Basics Up

ENV of SLAM

Posted on 2023-07-09 Edited on 2025-08-06 In SLAM

ENV:

Eigen

cmake .. -DEigen3_DIR=/mnt/hgfs/space_1604/in

opencv 4.6

/home/simon/workspace/opencv-3.4.15/build


sudo apt-get install libvtk5-dev

or

sudo apt-get install libvtk6-dev


cmake -DWITH_VTK=ON ..

sudo make

sudo make install

Tablua

Posted on 2023-06-16 Edited on 2025-08-06

,Budget,Income,Expenses,Debt
June,5000,8000,4000,6000
July,3000,1000,4000,3000
Aug,5000,7000,6000,3000
Sep,7000,2000,3000,1000
Oct,6000,5000,4000,2000
Nov,4000,3000,5000,
type: line
title: Monthly Revenue
x.title: Amount
y.title: Month
y.suffix: $