Simon Shi的小站

人工智能,机器学习, 强化学习,大模型,自动驾驶

0%

[TOC]

Uff To TensorRT Engine

This sample uses a UFF ResNet50 Model to create a TensorRT Inference Engine

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
# This sample uses a UFF ResNet50 Model to create a TensorRT Inference Engine
import random
from PIL import Image
import numpy as np

import pycuda.driver as cuda
# This import causes pycuda to automatically manage CUDA context creation and cleanup.
import pycuda.autoinit

import tensorrt as trt

import sys, os
sys.path.insert(1, os.path.join(sys.path[0], ".."))
import common

class ModelData(object):
MODEL_PATH = "resnet50-infer-5.uff"
INPUT_NAME = "input"
INPUT_SHAPE = (3, 224, 224)
OUTPUT_NAME = "GPU_0/tower_0/Softmax"
# We can convert TensorRT data types to numpy types with trt.nptype()
DTYPE = trt.float32

# You can set the logger severity higher to suppress messages (or lower to display more messages).
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Allocate host and device buffers, and create a stream.
def allocate_buffers(engine):
# Determine dimensions and create page-locked memory buffers (i.e. won't be swapped to disk) to hold host inputs/outputs.
h_input = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)), dtype=trt.nptype(ModelData.DTYPE))
h_output = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)), dtype=trt.nptype(ModelData.DTYPE))
# Allocate device memory for inputs and outputs.
d_input = cuda.mem_alloc(h_input.nbytes)
d_output = cuda.mem_alloc(h_output.nbytes)
# Create a stream in which to copy inputs/outputs and run inference.
stream = cuda.Stream()
return h_input, d_input, h_output, d_output, stream

def do_inference(context, h_input, d_input, h_output, d_output, stream):
# Transfer input data to the GPU.
cuda.memcpy_htod_async(d_input, h_input, stream)
# Run inference.
context.execute_async(bindings=[int(d_input), int(d_output)], stream_handle=stream.handle)
# Transfer predictions back from the GPU.
cuda.memcpy_dtoh_async(h_output, d_output, stream)
# Synchronize the stream
stream.synchronize()

# The UFF path is used for TensorFlow models. You can convert a frozen TensorFlow graph to UFF using the included convert-to-uff utility.
def build_engine_uff(model_file):
# You can set the logger severity higher to suppress messages (or lower to display more messages).
with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network, trt.UffParser() as parser:
# Workspace size is the maximum amount of memory available to the builder while building an engine.
# It should generally be set as high as possible.
builder.max_workspace_size = common.GiB(1)
# We need to manually register the input and output nodes for UFF.
parser.register_input(ModelData.INPUT_NAME, ModelData.INPUT_SHAPE)
parser.register_output(ModelData.OUTPUT_NAME)
# Load the UFF model and parse it in order to populate the TensorRT network.
parser.parse(model_file, network)
# Build and return an engine.
return builder.build_cuda_engine(network)

def load_normalized_test_case(test_image, pagelocked_buffer):
# Converts the input image to a CHW Numpy array
def normalize_image(image):
# Resize, antialias and transpose the image to CHW.
c, h, w = ModelData.INPUT_SHAPE
return np.asarray(image.resize((w, h), Image.ANTIALIAS)).transpose([2, 0, 1]).astype(trt.nptype(ModelData.DTYPE)).ravel()

# Normalize the image and copy to pagelocked memory.
np.copyto(pagelocked_buffer, normalize_image(Image.open(test_image)))
return test_image

def main():
# Set the data path to the directory that contains the trained models and test images for inference.
data_path, data_files = common.find_sample_data(description="Runs a ResNet50 network with a TensorRT inference engine.", subfolder="resnet50", find_files=["binoculars.jpeg", "reflex_camera.jpeg", "tabby_tiger_cat.jpg", ModelData.MODEL_PATH, "class_labels.txt"])
# Get test images, models and labels.
test_images = data_files[0:3]
uff_model_file, labels_file = data_files[3:]
labels = open(labels_file, 'r').read().split('\n')

# Build a TensorRT engine.
with build_engine_uff(uff_model_file) as engine:
# Inference is the same regardless of which parser is used to build the engine, since the model architecture is the same.
# Allocate buffers and create a CUDA stream.
h_input, d_input, h_output, d_output, stream = allocate_buffers(engine)
# Contexts are used to perform inference.
with engine.create_execution_context() as context:
# Load a normalized test case into the host input page-locked buffer.
test_image = random.choice(test_images)
test_case = load_normalized_test_case(test_image, h_input)
# Run the engine. The output will be a 1D tensor of length 1000, where each value represents the
# probability that the image corresponds to that label
do_inference(context, h_input, d_input, h_output, d_output, stream)
# We use the highest probability as our prediction. Its index corresponds to the predicted label.
pred = labels[np.argmax(h_output)]
if "_".join(pred.split()) in os.path.splitext(os.path.basename(test_case))[0]:
print("Correctly recognized " + test_case + " as " + pred)
else:
print("Incorrectly recognized " + test_case + " as " + pred)

if __name__ == '__main__':
main()

2021年, Transformer频频跨界视觉领域

先是图像分类上被谷歌ViT突破,后来目标检测和图像分割又被微软Swin Transformer拿下。

ViT

Google

img

为了大规模扩展视觉模型,该研究将 ViT 架构中的一些密集前馈层 (FFN) 替换为独立 FFN 的稀疏混合(称之为专家)。可学习的路由层为每个独立的 token 选择对应的专家。也就是说,来自同一图像的不同 token 可能会被路由到不同的专家。在总共 E 位专家(E 通常为 32)中,每个 token 最多只能路由到 K(通常为 1 或 2)位专家。这允许扩展模型的大小,同时保持每个 token 计算的恒定。下图更详细地显示了 V-MoE 编码器块的结构。

img

https://new.qq.com/omn/20220116/20220116A03WQ600.html

Swin Transformer

微软

ConvNeXt

Facebook与UC伯克利

该研究制定了一系列设计决策,总结为 1) 宏观设计,2) ResNeXt,3) 反转瓶颈,4) 卷积核大小,以及 5) 各种逐层微设计。

Transformer 中一个重要的设计是创建了反转瓶颈,即 MLP 块的隐藏维度比输入维度宽四倍,如下图 4 所示。

img

EfficientNet v2

image-20220117095019328

Ref:

https://mp.weixin.qq.com/s/c6MRbzQE9ErFUWdWKh8PQA

[toc]

目标定位

— 目标定位和目标检测,通常作为一个整体进行建模。

  • VoxelNet

  • Frustum PointNets

detection

定位任务评估方法:Intersection over Union (IoU)

IoU用来衡量模型最终输出的矩形框或者测试过程中找出的候选区域(Region Proposal)与实际的矩形框(Gound Truth)的差异程度,定义为两者交集和并集的比值。通常我们将这个阈值指定为0.5,即只要模型找出来的矩形框和标签的IoU值大于0.5,就认为成功定位到了目标。

IoU

目标定位的两种思路

**看作回归问题。*对于单个目标的定位,比较简单的思想就是直接看作是关于目标矩形框位置的回归问题,也就是把刻画矩形框位置信息的4个参数作为模型的输出进行训练,采用L2损失函数。对于固定的多个目标定位,也采用类似的方法,只不过输出由4个变成4C个,C为需要定位的目标的类别数。这样,完整的识别定位问题的损失函数由两部分组成:第一部分是用于识别的损失,第二部分是用于定位产生的损失。显然这种方法对于目标数量固定的定位问题比较容易,当数量不定时(比如检测任务)就不适用了。

**滑动窗口法。**这种方法的一个典型代表是overFeat模型,它用不同大小的矩形框依次遍历图片中所有区域,然后在当前区域执行分类和定位任务,即每一个滑过的区域都会输出一个关于目标类别和位置信息的标签,最后再把所有输出的矩形框进行合并,得到一个置信度最高的结果。这种方法其实和我们人的思维很相似,但是这种方法需要用不同尺度的滑动框去遍历整幅图像,计算量是可想而知的。

Section1 Introduction

Section2 Notation and terminology

Section3 MCTS detail

Section4 summarises main variations MCTS

Section5 enhancements to the tree policy,

Section6 enhancements to Simulations, Backpropagations

Section7 key applications(which MCTS has been applied)

Section8 Summaries

[toc]

Read more »

[toc]

Poker Algorithm developer history

论文阅读《Regret minimization in games and the development of champion multiplayer computer poker playing agents》(游戏中的遗憾最小化与多人计算机扑克游戏冠军的发展)

Read more »

[toc]

CFR

类似强化学习的算法

遗憾值(regret) : 在一局石头剪刀布中,对手出了布,玩家出了石头,结果是玩家输了-1。这时的遗憾值为{石头:0,布:1,剪刀:2}。也就意味着如果执行其他动作会比执行当前的动作有多少优势。

遗憾值匹配(regret matching) : 遗憾匹配,通过计算出的遗憾值更新策略。最常用的是将遗憾动作值归一化为生成概率。这种方法可以通过自我对局来最小化预期的regret。

Read more »

[toc]

MCTS_Lee

apply:

  1. GO : UCT (MCTS + UCB)
  2. Pluribus : MCTS-CFR
  3. Dota2 : MCTS
  4. StarCraft2 : MCTS
Read more »