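# Sample: build a TensorRT engine from a UFF-converted ResNet50 model and
# classify one of the bundled test images. Requires TensorRT with the UFF
# parser, PyCUDA, Pillow, NumPy, and the `common` helper module shipped with
# the TensorRT Python samples.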
import random

from PIL import Image
import numpy as np

import pycuda.driver as cuda
import pycuda.autoinit
import tensorrt as trt

import sys, os

# Make the shared `common` helper module from the parent samples directory importable.
sys.path.insert(1, os.path.join(sys.path[0], ".."))
import common


class ModelData(object):
    MODEL_PATH = "resnet50-infer-5.uff"
    INPUT_NAME = "input"
    INPUT_SHAPE = (3, 224, 224)
    OUTPUT_NAME = "GPU_0/tower_0/Softmax"
    DTYPE = trt.float32

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
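
# Allocate page-locked host buffers and device buffers for the engine's input
# and output bindings, plus a CUDA stream for asynchronous transfers.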
def allocate_buffers(engine):
    h_input = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)), dtype=trt.nptype(ModelData.DTYPE))
    h_output = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)), dtype=trt.nptype(ModelData.DTYPE))
    d_input = cuda.mem_alloc(h_input.nbytes)
    d_output = cuda.mem_alloc(h_output.nbytes)
    stream = cuda.Stream()
    return h_input, d_input, h_output, d_output, stream

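# Copy the input to the GPU, run inference asynchronously on the stream,
# copy the result back to the host, and wait for the stream to finish.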
def do_inference(context, h_input, d_input, h_output, d_output, stream):
    cuda.memcpy_htod_async(d_input, h_input, stream)
    context.execute_async(bindings=[int(d_input), int(d_output)], stream_handle=stream.handle)
    cuda.memcpy_dtoh_async(h_output, d_output, stream)
    stream.synchronize()

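# Parse the UFF model and build a TensorRT engine with a 1 GiB workspace.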
def build_engine_uff(model_file):
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network, trt.UffParser() as parser:
        builder.max_workspace_size = common.GiB(1)
        parser.register_input(ModelData.INPUT_NAME, ModelData.INPUT_SHAPE)
        parser.register_output(ModelData.OUTPUT_NAME)
        parser.parse(model_file, network)
        return builder.build_cuda_engine(network)

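# Resize the test image to the network input resolution, convert HWC -> CHW,
# flatten it, and copy it into the page-locked input buffer.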
def load_normalized_test_case(test_image, pagelocked_buffer):
    def normalize_image(image):
        c, h, w = ModelData.INPUT_SHAPE
        return np.asarray(image.resize((w, h), Image.ANTIALIAS)).transpose([2, 0, 1]).astype(trt.nptype(ModelData.DTYPE)).ravel()

    np.copyto(pagelocked_buffer, normalize_image(Image.open(test_image)))
    return test_image

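# Locate the sample data, build the engine, run inference on a randomly chosen
# test image, and check the prediction against the image's file name.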
def main():
    data_path, data_files = common.find_sample_data(
        description="Runs a ResNet50 network with a TensorRT inference engine.",
        subfolder="resnet50",
        find_files=["binoculars.jpeg", "reflex_camera.jpeg", "tabby_tiger_cat.jpg", ModelData.MODEL_PATH, "class_labels.txt"],
    )
    test_images = data_files[0:3]
    uff_model_file, labels_file = data_files[3:]
    labels = open(labels_file, 'r').read().split('\n')

    with build_engine_uff(uff_model_file) as engine:
        h_input, d_input, h_output, d_output, stream = allocate_buffers(engine)
        with engine.create_execution_context() as context:
            test_image = random.choice(test_images)
            test_case = load_normalized_test_case(test_image, h_input)
            do_inference(context, h_input, d_input, h_output, d_output, stream)
            # The label with the highest softmax score is the prediction.
            pred = labels[np.argmax(h_output)]
            if "_".join(pred.split()) in os.path.splitext(os.path.basename(test_case))[0]:
                print("Correctly recognized " + test_case + " as " + pred)
            else:
                print("Incorrectly recognized " + test_case + " as " + pred)


if __name__ == '__main__':
    main()