Simon Shi's Site

Artificial intelligence, machine learning, reinforcement learning, large models, autonomous driving

libtorch_URLs

libtorch 1.0.0

# cpu
https://download.pytorch.org/libtorch/cpu/libtorch-shared-with-deps-1.0.0.zip
# cuda
https://download.pytorch.org/libtorch/cu80/libtorch-shared-with-deps-1.0.0.zip
https://download.pytorch.org/libtorch/cu90/libtorch-shared-with-deps-1.0.0.zip


libtorch 1.0.1

# cpu
https://download.pytorch.org/libtorch/cpu/libtorch-shared-with-deps-1.0.1.zip
# cuda
https://download.pytorch.org/libtorch/cu80/libtorch-shared-with-deps-1.0.1.zip
https://download.pytorch.org/libtorch/cu90/libtorch-shared-with-deps-1.0.1.zip
https://download.pytorch.org/libtorch/cu100/libtorch-shared-with-deps-1.0.1.zip



libtorch 1.1.0

# cpu
https://download.pytorch.org/libtorch/cpu/libtorch-shared-with-deps-1.1.0.zip
# cuda
https://download.pytorch.org/libtorch/cu90/libtorch-shared-with-deps-1.1.0.zip
https://download.pytorch.org/libtorch/cu100/libtorch-shared-with-deps-1.1.0.zip


libtorch 1.2.0

# cpu
https://download.pytorch.org/libtorch/cpu/libtorch-shared-with-deps-1.2.0.zip
# cuda
https://download.pytorch.org/libtorch/cu92/libtorch-shared-with-deps-1.2.0.zip
https://download.pytorch.org/libtorch/cu100/libtorch-shared-with-deps-1.2.0.zip



libtorch 1.3.0

Starting with this release, the official downloads come in two variants: Pre-cxx11 ABI and cxx11 ABI.
Pre-cxx11 ABI

# cpu
https://download.pytorch.org/libtorch/cpu/libtorch-shared-with-deps-1.3.0%2Bcpu.zip
# cuda
https://download.pytorch.org/libtorch/cu92/libtorch-shared-with-deps-1.3.0%2Bcu92.zip
https://download.pytorch.org/libtorch/cu100/libtorch-shared-with-deps-1.3.0.zip
https://download.pytorch.org/libtorch/cu101/libtorch-shared-with-deps-1.3.0.zip


cxx11 ABI

# cpu
https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-1.3.0%2Bcpu.zip
# cuda
https://download.pytorch.org/libtorch/cu92/libtorch-cxx11-abi-shared-with-deps-1.3.0%2Bcu92.zip
https://download.pytorch.org/libtorch/cu100/libtorch-cxx11-abi-shared-with-deps-1.3.0.zip
https://download.pytorch.org/libtorch/cu101/libtorch-cxx11-abi-shared-with-deps-1.3.0.zip


libtorch 1.3.1
Pre-cxx11 ABI

# cpu
https://download.pytorch.org/libtorch/cpu/libtorch-shared-with-deps-1.3.1%2Bcpu.zip
# cuda
https://download.pytorch.org/libtorch/cu92/libtorch-shared-with-deps-1.3.1%2Bcu92.zip
https://download.pytorch.org/libtorch/cu100/libtorch-shared-with-deps-1.3.1%2Bcu100.zip
https://download.pytorch.org/libtorch/cu101/libtorch-shared-with-deps-1.3.1.zip

cxx11 ABI

# cpu
https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-1.3.1%2Bcpu.zip
# cuda
https://download.pytorch.org/libtorch/cu92/libtorch-cxx11-abi-shared-with-deps-1.3.1%2Bcu92.zip
https://download.pytorch.org/libtorch/cu100/libtorch-cxx11-abi-shared-with-deps-1.3.1%2Bcu100.zip
https://download.pytorch.org/libtorch/cu101/libtorch-cxx11-abi-shared-with-deps-1.3.1.zip


libtorch 1.4.0
Pre-cxx11 ABI

# cpu
https://download.pytorch.org/libtorch/cpu/libtorch-shared-with-deps-1.4.0%2Bcpu.zip
# cuda
https://download.pytorch.org/libtorch/cu92/libtorch-shared-with-deps-1.4.0%2Bcu92.zip
https://download.pytorch.org/libtorch/cu100/libtorch-shared-with-deps-1.4.0%2Bcu100.zip
https://download.pytorch.org/libtorch/cu101/libtorch-shared-with-deps-1.4.0.zip


cxx11 ABI

# cpu
https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-1.4.0%2Bcpu.zip
# cuda
https://download.pytorch.org/libtorch/cu92/libtorch-cxx11-abi-shared-with-deps-1.4.0%2Bcu92.zip
https://download.pytorch.org/libtorch/cu100/libtorch-cxx11-abi-shared-with-deps-1.4.0%2Bcu100.zip
https://download.pytorch.org/libtorch/cu101/libtorch-cxx11-abi-shared-with-deps-1.4.0.zip

libtorch 1.5.0
Pre-cxx11 ABI

# cpu
https://download.pytorch.org/libtorch/cpu/libtorch-shared-with-deps-1.5.0%2Bcpu.zip
# cuda
https://download.pytorch.org/libtorch/cu92/libtorch-shared-with-deps-1.5.0%2Bcu92.zip
https://download.pytorch.org/libtorch/cu101/libtorch-shared-with-deps-1.5.0.zip
https://download.pytorch.org/libtorch/cu102/libtorch-shared-with-deps-1.5.0.zip

cxx11 ABI

# cpu
https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-1.5.0%2Bcpu.zip
# cuda
https://download.pytorch.org/libtorch/cu92/libtorch-cxx11-abi-shared-with-deps-1.5.0%2Bcu92.zip
https://download.pytorch.org/libtorch/cu101/libtorch-cxx11-abi-shared-with-deps-1.5.0.zip
https://download.pytorch.org/libtorch/cu102/libtorch-cxx11-abi-shared-with-deps-1.5.0.zip


libtorch 1.5.1
cxx11 ABI
https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-1.5.1%2Bcpu.zip
https://download.pytorch.org/libtorch/cu92/libtorch-cxx11-abi-shared-with-deps-1.5.1%2Bcu92.zip
https://download.pytorch.org/libtorch/cu101/libtorch-cxx11-abi-shared-with-deps-1.5.1%2Bcu101.zip
https://download.pytorch.org/libtorch/cu102/libtorch-cxx11-abi-shared-with-deps-1.5.1.zip

libtorch 1.6.0
Pre-cxx11 ABI

# cpu
https://download.pytorch.org/libtorch/cpu/libtorch-shared-with-deps-1.6.0%2Bcpu.zip
# cuda
https://download.pytorch.org/libtorch/cu92/libtorch-shared-with-deps-1.6.0%2Bcu92.zip
https://download.pytorch.org/libtorch/cu101/libtorch-shared-with-deps-1.6.0%2Bcu101.zip
https://download.pytorch.org/libtorch/cu102/libtorch-shared-with-deps-1.6.0.zip


cxx11 ABI

# cpu
https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-1.6.0%2Bcpu.zip
# cuda
https://download.pytorch.org/libtorch/cu92/libtorch-cxx11-abi-shared-with-deps-1.6.0%2Bcu92.zip
https://download.pytorch.org/libtorch/cu101/libtorch-cxx11-abi-shared-with-deps-1.6.0%2Bcu101.zip
https://download.pytorch.org/libtorch/cu102/libtorch-cxx11-abi-shared-with-deps-1.6.0.zip


libtorch 1.7.0
Pre-cxx11 ABI

# cpu
https://download.pytorch.org/libtorch/cpu/libtorch-shared-with-deps-1.7.0%2Bcpu.zip
# cuda
https://download.pytorch.org/libtorch/cu92/libtorch-shared-with-deps-1.7.0%2Bcu92.zip
https://download.pytorch.org/libtorch/cu101/libtorch-shared-with-deps-1.7.0%2Bcu101.zip
https://download.pytorch.org/libtorch/cu102/libtorch-shared-with-deps-1.7.0.zip
https://download.pytorch.org/libtorch/cu110/libtorch-shared-with-deps-1.7.0%2Bcu110.zip


cxx11 ABI

# cpu
https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-1.7.0%2Bcpu.zip
# cuda
https://download.pytorch.org/libtorch/cu92/libtorch-cxx11-abi-shared-with-deps-1.7.0%2Bcu92.zip
https://download.pytorch.org/libtorch/cu101/libtorch-cxx11-abi-shared-with-deps-1.7.0%2Bcu101.zip
https://download.pytorch.org/libtorch/cu102/libtorch-cxx11-abi-shared-with-deps-1.7.0.zip
https://download.pytorch.org/libtorch/cu110/libtorch-cxx11-abi-shared-with-deps-1.7.0%2Bcu110.zip


libtorch 1.7.1
Pre-cxx11 ABI

# cpu
https://download.pytorch.org/libtorch/cpu/libtorch-shared-with-deps-1.7.1%2Bcpu.zip
# cuda
https://download.pytorch.org/libtorch/cu92/libtorch-shared-with-deps-1.7.1%2Bcu92.zip
https://download.pytorch.org/libtorch/cu101/libtorch-shared-with-deps-1.7.1%2Bcu101.zip
https://download.pytorch.org/libtorch/cu102/libtorch-shared-with-deps-1.7.1.zip
https://download.pytorch.org/libtorch/cu110/libtorch-shared-with-deps-1.7.1%2Bcu110.zip

cxx11 ABI

# cpu
https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-1.7.1%2Bcpu.zip
# cuda
https://download.pytorch.org/libtorch/cu92/libtorch-cxx11-abi-shared-with-deps-1.7.1%2Bcu92.zip
https://download.pytorch.org/libtorch/cu101/libtorch-cxx11-abi-shared-with-deps-1.7.1%2Bcu101.zip
https://download.pytorch.org/libtorch/cu102/libtorch-cxx11-abi-shared-with-deps-1.7.1.zip
https://download.pytorch.org/libtorch/cu110/libtorch-cxx11-abi-shared-with-deps-1.7.1%2Bcu110.zip


libtorch 1.8.0
Pre-cxx11 ABI
# cpu
https://download.pytorch.org/libtorch/cpu/libtorch-shared-with-deps-1.8.0%2Bcpu.zip
# cuda
https://download.pytorch.org/libtorch/cu102/libtorch-shared-with-deps-1.8.0.zip
https://download.pytorch.org/libtorch/cu111/libtorch-shared-with-deps-1.8.0%2Bcu111.zip

cxx11 ABI
# cpu
https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-1.8.0%2Bcpu.zip
# cuda
https://download.pytorch.org/libtorch/cu102/libtorch-cxx11-abi-shared-with-deps-1.8.0.zip
https://download.pytorch.org/libtorch/cu111/libtorch-cxx11-abi-shared-with-deps-1.8.0%2Bcu111.zip


libtorch 1.8.1 (LTS)
Pre-cxx11 ABI
# cpu
https://download.pytorch.org/libtorch/cpu/libtorch-shared-with-deps-1.8.1%2Bcpu.zip
# cuda
https://download.pytorch.org/libtorch/cu102/libtorch-shared-with-deps-1.8.1%2Bcu102.zip
https://download.pytorch.org/libtorch/cu111/libtorch-shared-with-deps-1.8.1%2Bcu111.zip

cxx11 ABI

# cpu
https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-1.8.1%2Bcpu.zip
# cuda
https://download.pytorch.org/libtorch/cu102/libtorch-cxx11-abi-shared-with-deps-1.8.1%2Bcu102.zip
https://download.pytorch.org/libtorch/cu111/libtorch-cxx11-abi-shared-with-deps-1.8.1%2Bcu111.zip


libtorch 1.9.0
Pre-cxx11 ABI
# cpu
https://download.pytorch.org/libtorch/cpu/libtorch-shared-with-deps-1.9.0%2Bcpu.zip
# cuda
https://download.pytorch.org/libtorch/cu102/libtorch-shared-with-deps-1.9.0%2Bcu102.zip
https://download.pytorch.org/libtorch/cu111/libtorch-shared-with-deps-1.9.0%2Bcu111.zip
cxx11 ABI
# cpu
https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-1.9.0%2Bcpu.zip
# cuda
https://download.pytorch.org/libtorch/cu102/libtorch-cxx11-abi-shared-with-deps-1.9.0%2Bcu102.zip
https://download.pytorch.org/libtorch/cu111/libtorch-cxx11-abi-shared-with-deps-1.9.0%2Bcu111.zip
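The URLs above follow a fairly regular naming pattern, so they can be assembled from a version, a platform tag, and an ABI flag. The sketch below is only an illustration of that pattern as inferred from this list: `libtorch_url` and its `tagged` parameter are hypothetical names, and because some entries above omit the `%2B<platform>` suffix, a generated URL is not guaranteed to exist on the server.

```python
from urllib.parse import quote

BASE = "https://download.pytorch.org/libtorch"


def libtorch_url(version, platform="cpu", cxx11_abi=False, tagged=True):
    """Assemble a download URL following the pattern seen in the list above.

    platform  -- "cpu" or a CUDA tag such as "cu102"
    cxx11_abi -- True for the cxx11 ABI build (available from 1.3.0 in the list)
    tagged    -- whether the file name carries the "+<platform>" suffix;
                 older releases and some CUDA builds in the list omit it
    """
    abi = "cxx11-abi-" if cxx11_abi else ""
    # "+" must be percent-encoded as %2B inside the file name
    suffix = quote(f"+{platform}", safe="") if tagged else ""
    return f"{BASE}/{platform}/libtorch-{abi}shared-with-deps-{version}{suffix}.zip"


print(libtorch_url("1.9.0", "cu111", cxx11_abi=True))
print(libtorch_url("1.0.0", "cpu", tagged=False))
```

Checking the generated URL against the list before downloading is advisable, since the suffix convention changed between releases.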

Source Build

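git clone --recursive https://github.com/pytorch/pytorch  # --recursive also fetches the submodules; some of them are hard to download, so you can clone from a mirror first: git clone https://github.com.cnpmjs.org/pytorch/pytorch
cd pytorch

The clone checks out the latest code. Run git tag to list the release tags, then git checkout <tag_name> to switch to the version you want: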

git checkout v1.7.1-rc3

# if you are updating an existing checkout
git submodule sync
git submodule update --init --recursive

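Note: if the download is slow or fails, open the .gitmodules file in the pytorch directory after switching versions, replace the URLs in it with a GitHub mirror (accelerator) address, and then run git submodule sync and the commands that follow it.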

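# Remove the global proxy settings
git config --global --unset http.proxy
git config --global --unset https.proxy

# Set a proxy for this repository only
git config --local http.proxy 127.0.0.1:11000
git config --local https.proxy 127.0.0.1:11000

# Remove the repository-level proxy again
git config --local --unset http.proxy
git config --local --unset https.proxy

# Clear the proxy environment variable of the current shell
unset https_proxy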

Official website

GitHub official

https://stackoverflow.com/questions/65379070/how-to-use-onnx-model-in-c-code-on-linux

pytorch to onnx to tensorRT

mx2onnx

MXNet model to the ONNX model format

onnx2mx

pytorch2onnx

pb2onnx

import tf2onnx
from tf2onnx import tf_loader


def convert_ckpt(checkpoint, inputs, outputs, out_path):
    # Load and freeze the graph referenced by the checkpoint's .meta file
    graph_def, inputs, outputs = tf_loader.from_checkpoint(checkpoint, inputs, outputs)

    # Convert the frozen graph to ONNX and write it to out_path
    model_proto, external_tensor_storage = tf2onnx.convert.from_graph_def(
        graph_def,
        input_names=inputs,
        output_names=outputs,
        output_path=out_path,
    )

    print('---1----', model_proto)
    print('---2----', external_tensor_storage)


def demo1():
    inputs = ['X_in:0']
    outputs = ['softmax:0', 'out_argmax:0', 'out_put_k_indices:0']
    ck = r'model.ckpt-100000.meta'
    out_path = r'test.onnx'
    convert_ckpt(ck, inputs, outputs, out_path)

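Calling models from C++

Common model conversion paths:

  • Pytorch -> ONNX -> TensorRT
  • Pytorch -> ONNX -> TVM
  • Pytorch -> conversion tool -> caffe
  • Pytorch -> torchscript (the C++ version of Torch) [the approach used here]
  • pytorch -> JIT -> TensorRT

https://pytorch.org/cppdocs/api/library_root.html

https://pytorch.org/tutorials/advanced/cpp_frontend.html

A GUI project at my institute recently needed to call a PyTorch model. I had done similar work before, but mostly through the Python interface. This project has real-time requirements, so I built a C++ interface and am recording the configuration steps here.

Preparation

Download and install the support library

First, download the LibTorch support library (LibTorch is the recommended choice).

After downloading, simply unzip it:

wget https://download.pytorch.org/libtorch/nightly/cpu/libtorch-shared-with-deps-latest.zip
unzip libtorch-shared-with-deps-latest.zip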

Starting from a trained Torch model

Tracing the original model

To load a PyTorch model in C++, the model first has to be converted to TorchScript with the JIT library. Take resnet18 as a simple example: create it from the torchvision model library, generate a dummy input, and let torch.jit.trace run the model once on that input to record the trace.

import torch
import torchvision
# Instantiate the model
model = torchvision.models.resnet18()
# Dummy input
example = torch.rand(1, 3, 224, 224)
# Run the model once under the JIT to record the trace
traced_script_module = torch.jit.trace(model, example)

Some models have control flow that depends on the input data, for example:

import torch

class MyModule(torch.nn.Module):
    def __init__(self, N, M):
        super(MyModule, self).__init__()
        self.weight = torch.nn.Parameter(torch.rand(N, M))

    def forward(self, input):
        if input.sum() > 0:
            output = self.weight.mv(input)
        else:
            output = self.weight + input
        return output

Here the forward pass depends on the input values, so call torch.jit.script to convert the module directly:

my_module = MyModule(10, 20)
traced_script_module2 = torch.jit.script(my_module)

The difference is that the second approach can be applied directly to the model being trained and keeps both branches, whereas tracing records only the branch taken for the example input. The resulting traced_script_module is a serializable TorchScript module; call its save method to write it to disk:

# Save the traced TorchScript model
traced_script_module.save("traced_resnet_model.pt")

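Loading the Torch model

With the saved .pt model, calling it from C++ comes down to working with the LibTorch library. The official example is used here.

Create a new C++ project. The CMakeLists.txt can look like this:

cmake_minimum_required(VERSION 3.16)
project(torchcpp)
set(CMAKE_CXX_STANDARD 14) # C++ standard; set before add_executable so the target picks it up
set(Torch_DIR ./libtorch/share/cmake/Torch) # location of the Torch CMake package

find_package(Torch REQUIRED) # locate the LibTorch library
add_executable(torchcpp main.cpp) # program entry point
target_link_libraries(torchcpp "${TORCH_LIBRARIES}") # link the shared libraries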

The corresponding C++ code for loading the model is:

#include <torch/script.h> // One-stop header.
#include <iostream>
#include <memory>
#include <vector>

// Run one forward pass and return the output tensor
at::Tensor baseModel(std::vector<torch::jit::IValue> inputs, torch::jit::script::Module module) {
    at::Tensor output = module.forward(inputs).toTensor();
    return output;
}

int main(int argc, const char *argv[]) {
    if (argc != 2) {
        std::cerr << "usage: example-app <path-to-exported-script-module>\n";
        return -1;
    }
    torch::jit::script::Module module;
    try {
        // Deserialize the saved model with torch::jit::load()
        module = torch::jit::load(argv[1]);
    }
    catch (const c10::Error &e) {
        std::cerr << "error loading the model\n";
        return -1;
    }
    std::cout << "model loads ok\n";
    // Build a dummy input for testing
    std::vector<torch::jit::IValue> inputs;
    inputs.push_back(torch::ones({1, 3, 224, 224}));
    at::Tensor output = baseModel(inputs, module);
    // Print the first five output values
    std::cout << output.slice(1, 0, 5) << "\n";
    return 0;
}

Also create a build folder to hold the build files.

The project layout is now roughly:

├── build
├── CMakeLists.txt
└── main.cpp

Enter the build folder and run:

(base) ➜  cd build
(base) ➜ cmake ..
(base) ➜ cmake --build . --config Release

You should see output similar to:

(base) ➜  build cmake ..
-- The C compiler identification is GNU 9.3.0
-- The CXX compiler identification is GNU 9.3.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found CUDA: /usr/local/cuda (found version "10.2")
-- Caffe2: CUDA detected: 10.2
-- Caffe2: CUDA nvcc is: /usr/local/cuda/bin/nvcc
-- Caffe2: CUDA toolkit directory: /usr/local/cuda
-- Caffe2: Header version is: 10.2
-- Found CUDNN: /usr/local/cuda/lib64/libcudnn.so
-- Found cuDNN: v8.0.4 (include: /usr/local/cuda/include, library: /usr/local/cuda/lib64/libcudnn.so)
-- Autodetected CUDA architecture(s): 7.5
-- Added CUDA NVCC flags for: -gencode;arch=compute_75,code=sm_75
-- Found Torch: /media/hao/Data/Code/DL/torchcppsample/libtorch/lib/libtorch.so
-- Configuring done
-- Generating done
-- Build files have been written to: /media/hao/Data/Code/DL/torchcppsample/build
(base) ➜ build cmake --build . --config Release
Scanning dependencies of target torchcpp
[ 50%] Building CXX object CMakeFiles/torchcpp.dir/main.cpp.o
[100%] Linking CXX executable torchcpp
[100%] Built target torchcpp

Then go back to the parent folder and run the compiled program:

(base) ➜  cd ..
(base) ➜ torchcppsample build/torchcpp Python/traced_resnet_model.pt
model loads ok
0.1439 -0.8914 -0.0475 0.2474 0.3108
[ CPUFloatType{1,5} ]

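With an IDE such as CLion, building is easier to manage and you do not need to run the build by hand.

Notes

When loading the model, make sure the model and its input tensors are on the same device.

Training models with the C++ frontend

The C++ frontend does provide an interface for training models, but it is not easy to use. Compared with training in Python and exporting to TorchScript for inference, this approach is somewhat more complex. The official tutorial is "Using the PyTorch C++ Frontend"; I will cover it in a later update.

References:

Official Doc Pytorch cpp_export

zhuhu_ How to call a PyTorch model from C++

2019-07 Cnblog Calling and deploying a pytorch model from C++

2020-07 CSDN Calling a trained pytorch model from C++ on Ubuntu with libtorch

⭐2019-05 Cnblog Calling a pytorch model from C++ (Linux)

⭐2020-10 Calling a PyTorch model from C++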

Dynamic environment detection in autonomous driving: 2D detection

2D detection

Idea:

  • Location: first find all the ROIs
    • Sliding Window / Selective Search / … | CNN (RPN …)
  • Class: classify each ROI to get its category
    • HOG/DPM/SIFT/LBP/… | CNN (conv pooling)
    • SVM / Adaboost / … | CNN (softmax …)
  • Location refinement: Bounding Box Regression
    • Linear Regression / … | CNN (regression …)

How to Generate ROI

How To Classify ROI

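4.1 two-step (image-based detection methods)

  • RCNN, SPPnet, Fast-RCNN, Faster-RCNN

Before CNN

  • Location: sliding window / region proposal (candidate boxes)

    • Hand-crafted features + classifier
    • Location refinement

RCNN

  • Location: Selective Search extracts the candidate boxes
  • Class: CNN feature extraction + SVM classification
    • Every candidate region goes through the convolutions separately, so there is a lot of repeated computation
  • Location refinement: Linear Regression

SPPnet

  • Location: Selective Search extracts the candidate boxes
  • Class: CNN feature extraction + SVM classification
    • Shared convolutions, which greatly reduce the computation
    • SPP layer: features of different scales --> fixed-size features (followed by fully connected layers)
      • Map each box region of the original image to a box on the feature map produced by the CNN
      • Use pyramid pooling to turn boxes of different sizes into fixed-size features
      • Feed them to the FC layers
  • Location refinement: Linear Regression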
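The pyramid pooling step can be illustrated on a single-channel feature map stored as nested lists. This is a minimal sketch, not SPPnet's implementation: the function names and the pyramid levels (1, 2, 4) are assumptions chosen for illustration. It only shows that regions of different sizes end up with the same feature length.

```python
import math


def adaptive_max_pool(feature, bins):
    """Max-pool a 2D feature map (list of rows) into a bins x bins grid."""
    h, w = len(feature), len(feature[0])
    pooled = []
    for i in range(bins):
        # Row window of bin i; floor/ceil keeps every window non-empty
        r0, r1 = (i * h) // bins, math.ceil((i + 1) * h / bins)
        for j in range(bins):
            c0, c1 = (j * w) // bins, math.ceil((j + 1) * w / bins)
            pooled.append(max(feature[r][c]
                              for r in range(r0, r1)
                              for c in range(c0, c1)))
    return pooled


def spp(feature, levels=(1, 2, 4)):
    """Concatenate the pooled grids of every pyramid level."""
    out = []
    for bins in levels:
        out.extend(adaptive_max_pool(feature, bins))
    return out


small = [[r * 5 + c for c in range(5)] for r in range(6)]      # 6 x 5 region
large = [[(r * c) % 7 for c in range(13)] for r in range(9)]   # 9 x 13 region
print(len(spp(small)), len(spp(large)))  # both have 1 + 4 + 16 = 21 values
```

Because the output length depends only on the pyramid levels, the fully connected layers that follow can accept boxes of any size.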

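Fast-RCNN

  • Location: Selective Search extracts the candidate boxes
  • Class: CNN feature extraction + CNN classification
    • Classification and regression are both done by the CNN, and both losses are back-propagated so the two tasks are tuned jointly (semi end-to-end)
    • SPP layer replaced by ROI pooling: faster computation (may lose some accuracy)
  • Location refinement: CNN regression

Faster-RCNN

  • Location: the CNN extracts the candidate boxes
    • RPN: Region Proposal Net
      • At the receptive-field box in the original image that corresponds to each feature point, generate anchor boxes with different ratios/scales
      • Binary classification + regression for each anchor box
        • 2k scores: object or not
        • 4k coordinates: regression offsets that refine the position ($\delta{A}$)
  • Class: CNN feature extraction + CNN classification
  • Location refinement: CNN regression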
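The anchor mechanism can be sketched in a few lines. The scales (128, 256, 512) and ratios (0.5, 1, 2) are the values used in the Faster R-CNN paper. The function names and the center/size delta parameterization are assumptions made for illustration, and may not match exactly what the notes mean by $\delta{A}$.

```python
import math


def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return k = len(scales) * len(ratios) boxes (x1, y1, x2, y2) centred at (cx, cy)."""
    anchors = []
    for s in scales:
        for r in ratios:            # r = height / width, the area stays s * s
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def apply_deltas(anchor, deltas):
    """Refine one anchor with regression offsets (dx, dy, dw, dh)."""
    x1, y1, x2, y2 = anchor
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + w / 2, y1 + h / 2
    dx, dy, dw, dh = deltas
    cx, cy = cx + dx * w, cy + dy * h            # shift the centre
    w, h = w * math.exp(dw), h * math.exp(dh)    # rescale width and height
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


anchors = make_anchors(0, 0)
k = len(anchors)
print(k, 2 * k, 4 * k)   # anchors per location, objectness scores, box offsets
```

With k = 9 anchors per feature point, the RPN head predicts 18 scores and 36 offsets at every location.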

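4.2 one-step

  • YOLO
  • SSD
  • YOLOv2

YOLO

  • Location:
    • Faster-RCNN
    • YOLO
      • The whole image is divided into a 7x7 grid, and each cell has 2 default boxes
      • There are no candidate boxes: classification over all classes + regression is done directly on the default boxes (the box center x, y is normalized to 0-1 relative to its grid cell; w, h are normalized to 0-1 by the image width and height)
      • FC1 -> FC2 {1470x1} -> reshape -> {7x7x30}; each cell is {1x1x30}
  • Class: CNN feature extraction + CNN classification
  • Advantage: real-time speed
  • Disadvantages:
    • Lower accuracy than faster-rcnn; poor localization (the anchor boxes are not varied enough and are regressed only once)
    • Weak on small objects: the anchors and scales are not varied enough.
    • Weak on irregular objects: the anchor ratios are not varied enough.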
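The normalization described above (center relative to the grid cell, size relative to the image) can be written out as a small helper. This is a sketch under the stated 7x7 grid; `encode_box` is a hypothetical name, and the 448x448 image in the example is only an illustrative size.

```python
def encode_box(cx, cy, w, h, img_w, img_h, grid=7):
    """Encode a pixel-space box (centre cx, cy; size w, h) in the YOLO style.

    Returns the (row, col) of the responsible grid cell and the four
    normalized values (x, y, w, h), each in the range 0-1.
    """
    cell_w, cell_h = img_w / grid, img_h / grid
    # Cell containing the box centre (clamped for centres on the far edge)
    col = min(int(cx // cell_w), grid - 1)
    row = min(int(cy // cell_h), grid - 1)
    # Centre offset inside that cell
    x = cx / cell_w - col
    y = cy / cell_h - row
    # Width and height relative to the whole image
    return row, col, (x, y, w / img_w, h / img_h)


print(encode_box(224, 112, 112, 224, img_w=448, img_h=448))
```

In this example the cell size is 64 pixels, so the center (224, 112) falls in row 1, column 3, halfway across the cell horizontally and three quarters of the way down.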

Meaning of 1x1x30:

The predictions of the two default boxes:

4 xywh (coordinate prediction), 1, 4 xywh (coordinate prediction), 1, 20 (predictions for the 20 classes)
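The layout of the 1470-value output can be unpacked as follows. This is a sketch that assumes a row-major reshape with the 30 values of a cell stored contiguously, in the order listed above (box 1, box 2, then the 20 class scores); the function names are illustrative.

```python
GRID, BOXES, CLASSES = 7, 2, 20
DEPTH = BOXES * 5 + CLASSES          # 2 * (x, y, w, h, confidence) + 20 = 30


def reshape_output(flat):
    """Reshape the flat FC output (1470 values) into a 7 x 7 grid of 30-value cells."""
    if len(flat) != GRID * GRID * DEPTH:
        raise ValueError("expected %d values" % (GRID * GRID * DEPTH))
    return [[flat[(r * GRID + c) * DEPTH:(r * GRID + c + 1) * DEPTH]
             for c in range(GRID)]
            for r in range(GRID)]


def split_cell(cell):
    """Split one 30-value cell into its two boxes and its class scores."""
    boxes = [tuple(cell[i * 5:(i + 1) * 5]) for i in range(BOXES)]
    class_scores = cell[BOXES * 5:]
    return boxes, class_scores


flat = list(range(GRID * GRID * DEPTH))   # stand-in for the network output
cells = reshape_output(flat)
boxes, class_scores = split_cell(cells[0][1])
print(len(flat), boxes, len(class_scores))
```

Each box tuple holds (x, y, w, h, confidence), and the 20 class scores are shared by both boxes of the cell.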

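SSD

  • Location:
    • Borrows the anchor box mechanism of RPN: at the receptive-field box in the original image that corresponds to each feature point, generate default boxes with different ratios/scales
    • There are no candidate boxes! Classification over all classes + regression is done directly on the default boxes
  • Class: CNN feature extraction + CNN classification
    • Outputs from feature layers with multiple receptive fields: earlier layers have small receptive fields and suit small objects; later layers have large receptive fields and suit large objects.

YOLOv2

  • Richer default boxes
    • The default boxes are derived from dataset statistics (k-means); as k grows, the IOU increases (higher recall) but so does the complexity; k=5 was chosen in the end
  • More flexible class prediction
    • Class prediction is decoupled from the spatial position (cell): each default box predicts both class and coordinates, which handles overlapping objects effectively.

YOLOv3

  • A better backbone network
    • darknet-19 replaced by darknet-53
  • Multiple sizes are considered
    • Multiple scales
    • Outputs from feature layers with multiple receptive fields
    • More default boxes: K=9, split evenly across the 3 outputs; each output has 3*(5+80)=255 channels
    • 3 boxes, 5 (x, y, w, h, confidence), 80 (COCO classes)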
Hands-on

https://github.com/andylei77/object-detector

Environment setup (VS Code)

c_cpp_properties.json

{
"configurations": [
{
"name": "windows-gcc-x64",
"includePath": [
"${workspaceFolder}/**",
"C:/Users/Simon/.conda/envs/torch_gpu/include/",
"C:/Users/Simon/.conda/envs/torch_gpu/Lib/site-packages/numpy/core/include"
],
"compilerPath": "D:/Tools/Mingw/mingw64/bin/gcc.exe",
"cStandard": "${default}",
"cppStandard": "c++11",
"intelliSenseMode": "windows-gcc-x64",
"compilerArgs": []
}
],
"version": 4
}

tasks.json

{
"version": "2.0.0",
"tasks": [
{
"type": "shell",
"label": "C/C++: g++编译前清理",
"command": "rm",
"args": [
"${workspaceRoot}/*.exe"
]
},
{
"label": "build c++Callpython",
"type": "shell",
"command": "g++",
"args": [
"-g", "c++_python.cpp",
"-I", "C:/Users/Simon/.conda/envs/torch_gpu/include/",
"-I", "C:/Users/Simon/.conda/envs/torch_gpu/Lib/site-packages/numpy/core/include",
"-L", "C:/Users/Simon/.conda/envs/torch_gpu/libs/*",
"-o", "c++_python.exe"
],
"group": {
"kind": "build",
"isDefault": true
},
"problemMatcher": [],
"dependsOn":[
"C/C++: g++编译前清理",
]
}
]
}

// g++ -g c++_python.cpp -L ./libs/* -o c++_python.exe
// -fdiagnostics-color=always

0. Test Run

#include <iostream>

#include "numpy/arrayobject.h"
#include "Python.h"

using namespace std;

// ref: https://wenku.baidu.com/view/01fab1346f175f0e7cd184254b35eefdc8d315cd.html

int demo0(){
    // Point the embedded interpreter at the conda environment
    Py_SetPythonHome(L"C:/Users/Simon/.conda/envs/torch_gpu");
    Py_Initialize();
    PyRun_SimpleString("print('hello')");
    Py_Finalize();
    return 0;
}

1. Passing arguments from C++

Call a Python function, pass arguments, and get the return value.

There are two common ways to build the argument tuple:

// Option 1: create the tuple with PyTuple_New and fill it with PyTuple_SetItem
PyObject* args = PyTuple_New(3);
PyObject* arg1 = Py_BuildValue("i", 100); // integer argument
PyObject* arg2 = Py_BuildValue("f", 3.14); // floating-point argument
PyObject* arg3 = Py_BuildValue("s", "hello"); // string argument
PyTuple_SetItem(args, 0, arg1);
PyTuple_SetItem(args, 1, arg2);
PyTuple_SetItem(args, 2, arg3);

// Option 2: build the tuple directly with Py_BuildValue
PyObject* args = Py_BuildValue("(ifs)", 100, 3.14, "hello");
PyObject* args = Py_BuildValue("()"); // function with no arguments

Source: https://blog.csdn.net/tobacco5648/article/details/50890106

int demo1()
{
    Py_SetPythonHome(L"C:/Users/Simon/.conda/envs/torch_gpu");
    Py_Initialize();

    PyObject *module = NULL;
    PyObject *pFunc = NULL;
    PyObject *pArg = NULL;
    PyObject *value = NULL;
    // Import the Python file as a module
    module = PyImport_ImportModule("pthonnx_ru");

    pFunc = PyObject_GetAttrString(module, "demo1"); // look up the function
    pArg = Py_BuildValue("(s)", "my is c++ test");   // "s" builds a str from a C string
    value = PyObject_CallObject(pFunc, pArg);        // call the function
    float val;
    PyArg_Parse(value, "f", &val);                   // convert the return value to float
    // PyArg_ParseTuple(value, "f", &val);
    cout << "--val1--" << val << endl;

    pFunc = PyObject_GetAttrString(module, "Add");
    pArg = Py_BuildValue("(ii)", 21, 23);            // tuple of two ints
    value = PyObject_CallObject(pFunc, pArg);
    PyArg_Parse(value, "f", &val);
    cout << "--val2--" << val << endl;

    PyRun_SimpleString("print('hello')");
    Py_Finalize();

    return 0;
}

2. Passing a list argument

int demo2()
{

    cout << "Hello World" << endl;

    Py_SetPythonHome(L"C:/Users/Simon/.conda/envs/torch_gpu");
    Py_Initialize();

    PyObject *module = NULL;
    PyObject *pFunc = NULL;
    PyObject *pArg = NULL;
    // Import the Python file as a module
    module = PyImport_ImportModule("pthonnx_ru");
    // Look up the function
    pFunc = PyObject_GetAttrString(module, "demo2");
    // Build a Python list [5, 3] and wrap it in a one-element argument tuple
    PyObject *pyParams = PyList_New(0);
    PyList_Append(pyParams, Py_BuildValue("i", 5));
    PyList_Append(pyParams, Py_BuildValue("i", 3));
    PyObject *args = PyTuple_New(1);
    PyTuple_SetItem(args, 0, pyParams);

    // Call the function
    float val;
    PyObject *value = PyObject_CallObject(pFunc, args);
    PyArg_Parse(value, "f", &val);
    // PyArg_ParseTuple(value, "f", &val);
    cout << "--val--" << val << endl;

    PyRun_SimpleString("print('hello')");
    Py_Finalize();

    return 0;
}

3. Python classes: class attributes and member functions (Todo)

int demo3()
{
    cout << "Hello World" << endl;

    Py_SetPythonHome(L"C:/Users/Simon/.conda/envs/torch_gpu");
    Py_Initialize();

    PyObject *module = NULL;
    PyObject *pFunc = NULL;

    module = PyImport_ImportModule("pthonnx_ru");    // import the Python file
    pFunc = PyObject_GetAttrString(module, "demo2"); // look up the function

    // Create the argument tuple
    PyObject *pArgs = PyTuple_New(1);
    // Set the argument value
    int InParm = 1;
    PyTuple_SetItem(pArgs, 0, PyLong_FromLong(InParm));

    // Call the function
    float val;
    PyObject *value = PyObject_CallObject(pFunc, pArgs);
    PyArg_Parse(value, "f", &val);
    // PyArg_ParseTuple(value, "f", &val);
    cout << "--val--" << val << endl;

    Py_Finalize();
    return 0;
}

4. Passing a C++ array to Python (as a NumPy array)

int demo4(){
    Py_SetPythonHome(L"C:/Users/Simon/.conda/envs/torch_gpu");
    Py_Initialize();
    // #include "numpy/arrayobject.h"
    // api doc : https://numpy.org/doc/1.17/reference/c-api.array.html#importing-the-api
    import_array1(-1); // initialize the NumPy C API; returns -1 from demo4 on failure

    PyObject *module = NULL;
    PyObject *pFunc = NULL;
    PyObject *pArgs = NULL;

    module = PyImport_ImportModule("pthonnx_ru");
    pFunc = PyObject_GetAttrString(module, "demo4");

    float buf[2][3];
    buf[0][0] = 0;
    buf[0][1] = 1.1230;
    buf[0][2] = 2.340;
    buf[1][0] = 4.540;
    buf[1][1] = 6.900;
    buf[1][2] = 8.090;

    pArgs = PyTuple_New(1);
    npy_intp dims[2] = {2, 3}; // shape of the array
    int ND = 2;                // number of dimensions
    // Wrap buf without copying: number of dims, shape, element type, data pointer
    PyObject * pPyArray = PyArray_SimpleNewFromData(ND, dims, NPY_FLOAT, buf);
    PyTuple_SetItem(pArgs, 0, pPyArray); // store the array as the first argument
    PyObject_CallObject(pFunc, pArgs);

    Py_Finalize();
    return 0;
}

References:

https://docs.python.org/2/extending/embedding.html

https://wenku.baidu.com/view/01fab1346f175f0e7cd184254b35eefdc8d315cd.html

https://numpy.org/doc/1.17/reference/c-api.array.html

https://numpy.org/doc/1.17/reference/c-api.array.html#importing-the-api

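Autonomous driving perception basics: lane detection

3.1 Overview of the autonomous driving perception system

Perception

Content

  • Hands-on: lane detection with traditional methods
  • Survey of image segmentation algorithms
  • Hands-on: survey of deep-learning-based image segmentation algorithms

3.2 Hands-on: lane detection with traditional methods

Static environment perception and segmentation algorithms

https://github.com/andylei77/lane-detector

Algorithm for comparison:

Zhihu: Algorithm Collection (8) | Autonomous Driving | Practical Lane Detection Algorithms

# This algorithm uses the OpenCV library and material from the Udacity self-driving car dataset.
Camera calibration, to remove the effect of lens distortion
Image preprocessing, to identify the lane lines
Perspective transform of the road view
Lane line detection
Vehicle localization and lane radius calculation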
3.2.2 Canny edge detection

import cv2 as cv
import numpy as np

def do_canny(frame):
    # Converts frame to grayscale because we only need the luminance channel for detecting edges - less computationally expensive
    gray = cv.cvtColor(frame, cv.COLOR_RGB2GRAY)
    # Applies a 5x5 gaussian blur (sigma derived from the kernel size) to reduce noise before edge detection
    blur = cv.GaussianBlur(gray, (5, 5), 0)
    # Applies Canny edge detector with minVal of 50 and maxVal of 150
    canny = cv.Canny(blur, 50, 150)
    return canny

3.2.3 Manually segmenting the road region

OpenCV coordinate system

  • polygons = [] # manually specify the three points of the triangle
  • mask = zeros_like() creates the mask
  • fillPoly(mask, polygons, 255): fills the inside of the polygon with 255; the area outside keeps its original value (0)

def do_segment(frame):
    # Since an image is a multi-dimensional array containing the relative intensities of each pixel in the image, we can use frame.shape to return a tuple: [number of rows, number of columns, number of channels] of the dimensions of the frame
    # frame.shape[0] give us the number of rows of pixels the frame has. Since height begins from 0 at the top, the y-coordinate of the bottom of the frame is its height
    height = frame.shape[0]
    # Creates a triangular polygon for the mask defined by three (x, y) coordinates
    polygons = np.array([
        [(0, height), (800, height), (380, 290)]
    ])
    # Creates an image filled with zero intensities with the same dimensions as the frame
    mask = np.zeros_like(frame)
    # Fills the triangle in the mask with 255; the other areas stay 0
    cv.fillPoly(mask, polygons, 255)
    # A bitwise and operation between the mask and frame keeps only the triangular area of the frame
    segment = cv.bitwise_and(frame, mask)
    return segment

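3.2.4 Finding the lane lines with the Hough transform

  • Hough transform

    • Parameters and variables swap roles
  • A line in the image becomes a point in Hough space

  • A point in the image (through which many lines pass) corresponds to a line in Hough space

cartesian: the Cartesian coordinate system

  • A set of points in the Cartesian coordinate system that lie on one line maps to lines in Hough space that all pass through a single point

Another Hough space (polar coordinates)

  • Lines are represented in polar form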
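The polar form can be tried on a handful of points. In this form a line is described by $\rho = x\cos\theta + y\sin\theta$, so every point on the same line yields the same $(\theta, \rho)$ pair. The voting loop below is a simplified sketch of the accumulator idea, not OpenCV's implementation; the function names, the 1-degree bins, and the sample points are chosen for illustration.

```python
import math


def rho(x, y, theta_deg):
    """Distance from the origin to the line through (x, y) with normal angle theta."""
    t = math.radians(theta_deg)
    return x * math.cos(t) + y * math.sin(t)


def hough_vote(points, rho_step=1.0):
    """Let every point vote for all (theta, rho) cells it could lie on.

    Returns the cell with the most votes and its vote count.
    """
    votes = {}
    for x, y in points:
        for theta in range(180):                       # 1-degree theta bins
            cell = (theta, round(rho(x, y, theta) / rho_step))
            votes[cell] = votes.get(cell, 0) + 1
    return max(votes.items(), key=lambda kv: kv[1])


points = [(0, 4), (1, 3), (2, 2), (3, 1), (4, 0)]      # all on x + y = 4
cell, count = hough_vote(points)
print(cell, count)
print([round(rho(x, y, 45), 3) for x, y in points])    # identical at theta = 45
```

With coarse bins, several neighbouring theta values can collect all five votes, so the winning cell lies near theta = 45 degrees rather than exactly on it.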

HoughLinesP adds a P, for Probabilistic, to the name HoughLines: it uses the progressive probabilistic Hough transform (PPHT) to find the line segments in a binary image.

hough = cv.HoughLinesP(segment, 2, np.pi / 180, 100, np.array([]), minLineLength = 100, maxLineGap = 50)

3.2.5 Computing the lane lines and overlaying them on the original image

  • Combine all the detected lines and compute the average slope and intercept of the two lane lines

def calculate_lines(frame, lines):
    # ... (body omitted in the source)
    return [left_line, right_line]

def calculate_coordinates(frame, parameters):
    slope, intercept = parameters
    # Sets initial y-coordinate as height from top down (bottom of the frame)
    y1 = frame.shape[0]
    # Sets final y-coordinate as 150 above the bottom of the frame
    y2 = int(y1 - 150)
    # Sets initial x-coordinate as (y1 - b) / m since y1 = mx1 + b
    x1 = int((y1 - intercept) / slope)
    # Sets final x-coordinate as (y2 - b) / m since y2 = mx2 + b
    x2 = int((y2 - intercept) / slope)
    return np.array([x1, y1, x2, y2])
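The averaging step described in the bullet above can be sketched without OpenCV. This is not the body of `calculate_lines` from the source, which is omitted there. It is an illustration that assumes image coordinates with y pointing down, so that segments with negative slope belong to the left lane and segments with positive slope to the right lane. `fit_segment` and `average_lanes` are hypothetical names.

```python
from statistics import mean


def fit_segment(x1, y1, x2, y2):
    """Slope and intercept of the line y = m * x + b through two points."""
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


def average_lanes(segments):
    """Average the (slope, intercept) pairs of the left and right segments."""
    left, right = [], []
    for x1, y1, x2, y2 in segments:
        if x1 == x2:
            continue                      # skip vertical segments (undefined slope)
        slope, intercept = fit_segment(x1, y1, x2, y2)
        (left if slope < 0 else right).append((slope, intercept))

    def avg(params):
        if not params:
            return None                   # no segment found on this side
        return mean(p[0] for p in params), mean(p[1] for p in params)

    return avg(left), avg(right)


segments = [(100, 400, 200, 300), (120, 420, 220, 320), (500, 300, 600, 400)]
left_line, right_line = average_lanes(segments)
print(left_line, right_line)
```

Each returned (slope, intercept) pair has the form that calculate_coordinates expects as its parameters argument.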

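3.3 Hands-on: survey of deep-learning-based image segmentation algorithms

3.3.1 Representative algorithms

Fully connected layers

Downsampling and upsampling, pooling

Fusion: each upsampled feature map is fused with the parallel downsampled one

Semantic map: for example, with a 3-channel RGB input and 6 output classes, the output is 6 maps

3.3.2 Lane detection based on image segmentation

  • Semantic segmentation: binary classification to get the lane pixels
  • Separating the lanes: a clustering method; pixels of the same lane lie close together in the n-dim embedding space

Source code

Github lanenet-lane-detection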

# https://github.com/andylei77/lanenet-lane-detection/blob/master/tools/test_lanenet.py

def test_lanenet(image_path, weights_path, use_gpu):
    """
    :param image_path:
    :param weights_path:
    :param use_gpu:
    :return:
    """
    assert ops.exists(image_path), '{:s} not exist'.format(image_path)

    log.info('Start reading and preprocessing the image')
    t_start = time.time()
    image = cv2.imread(image_path, cv2.IMREAD_COLOR)
    image_vis = image
    image = cv2.resize(image, (512, 256), interpolation=cv2.INTER_LINEAR)
    image = image - VGG_MEAN
    log.info('Image loaded, cost time: {:.5f}s'.format(time.time() - t_start))

    input_tensor = tf.placeholder(dtype=tf.float32, shape=[1, 256, 512, 3], name='input_tensor')
    phase_tensor = tf.constant('test', tf.string)

    net = lanenet_merge_model.LaneNet(phase=phase_tensor, net_flag='vgg')
    binary_seg_ret, instance_seg_ret = net.inference(input_tensor=input_tensor, name='lanenet_model')

    cluster = lanenet_cluster.LaneNetCluster()
    postprocessor = lanenet_postprocess.LaneNetPoseProcessor()

    saver = tf.train.Saver()

    # Set sess configuration
    if use_gpu:
        sess_config = tf.ConfigProto(device_count={'GPU': 1})
    else:
        sess_config = tf.ConfigProto(device_count={'CPU': 0})
    sess_config.gpu_options.per_process_gpu_memory_fraction = CFG.TEST.GPU_MEMORY_FRACTION
    sess_config.gpu_options.allow_growth = CFG.TRAIN.TF_ALLOW_GROWTH
    sess_config.gpu_options.allocator_type = 'BFC'

    sess = tf.Session(config=sess_config)

    with sess.as_default():

        saver.restore(sess=sess, save_path=weights_path)

        t_start = time.time()
        binary_seg_image, instance_seg_image = sess.run([binary_seg_ret, instance_seg_ret],
                                                        feed_dict={input_tensor: [image]})
        t_cost = time.time() - t_start
        log.info('Lane prediction for a single image took: {:.5f}s'.format(t_cost))

        # Clean up the binary mask, then cluster the embeddings into lane instances
        binary_seg_image[0] = postprocessor.postprocess(binary_seg_image[0])
        mask_image = cluster.get_lane_mask(binary_seg_ret=binary_seg_image[0],
                                           instance_seg_ret=instance_seg_image[0])

        # Scale each of the 4 embedding channels for visualization
        for i in range(4):
            instance_seg_image[0][:, :, i] = minmax_scale(instance_seg_image[0][:, :, i])
        embedding_image = np.array(instance_seg_image[0], np.uint8)

        plt.figure('mask_image')
        plt.imshow(mask_image[:, :, (2, 1, 0)])
        plt.figure('src_image')
        plt.imshow(image_vis[:, :, (2, 1, 0)])
        plt.figure('instance_image')
        plt.imshow(embedding_image[:, :, (2, 1, 0)])
        plt.figure('binary_image')
        plt.imshow(binary_seg_image[0] * 255, cmap='gray')
        plt.show()

    sess.close()

    return

Related algorithm roundups

Zhihu: A roundup of lane detection algorithms for autonomous driving

[TOC]

1.1 Industry overview

1.2 Technology roadmap

L2 autonomous driving

L3 autonomous driving

L4 autonomous driving

V2X

1.3 Technical overview

Hardware overview

Software overview

Operating system (OS)

HD MAP

Software overview

  • Localization
  • Perception
  • Decision-making
  • Control

Localization

Perception

Decision-making

Control

[TOC]

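Reposted from: A Survey of Object Tracking Algorithms

Part 1: A quick tour of object tracking

Let's first get acquainted with a few SOTA trackers to get a rough picture of what the object tracking field covers. It all starts with a benchmark from 2013. If you ask around for the strongest tracking algorithms of recent years, most people will point you to Wu Yi's papers, OTB50 and OTB100 (here OTB50 means OTB-2013 and OTB100 means OTB-2015; 50 and 100 are the numbers of videos, which makes them easy to remember):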

Wu Y, Lim J, Yang M H. Online object tracking: A benchmark [C]// CVPR, 2013.

Wu Y, Lim J, Yang M H. Object tracking benchmark [J]. TPAMI, 2015.

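Going from a top conference to a top journal is elite treatment, and with more than 1480 and 320 citations respectively, the influence of these papers speaks for itself. They have become the benchmarks that every tracking paper must run. The test code and the sequences can both be downloaded from the Visual Tracker Benchmark site. OTB50 contains 50 sequences, all manually annotated.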

Object tracking

  • Vision-based and LiDAR-based approaches

SORT

https://github.com/abewley/sort

DeepSORT

ADAS

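Converting a boot-relative (dmesg) time to a timestamp on Ubuntu

root@ubuntu:/home/user# cat ../../tools_data.sh
#!/bin/bash
# Convert a dmesg timestamp (seconds since boot) to wall-clock time.
if [ $# -ne 1 ]; then
    echo "input a dmesg time"
    exit 1
fi

# boot time = now - uptime; adding the dmesg offset gives the Unix time of the event
unix_time=$(echo "$(date +%s) - $(cut -f 1 -d' ' /proc/uptime) + ${1}" | bc)
echo "${unix_time}"
date -d "@${unix_time}" '+%Y-%m-%d %H:%M:%S'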
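The same arithmetic can be sketched in Python. This is a minimal illustration, not part of the original script: the function names are made up for the example, and reading `/proc/uptime` only works on Linux.

```python
import datetime
import time


def dmesg_to_unix(dmesg_seconds, now_unix, uptime_seconds):
    """Boot time is (now - uptime); a dmesg stamp is an offset from boot."""
    return now_unix - uptime_seconds + dmesg_seconds


def read_uptime(path="/proc/uptime"):
    # The first field of /proc/uptime is the number of seconds since boot.
    with open(path) as f:
        return float(f.read().split()[0])


def format_local(unix_time):
    # Same output layout as: date -d "@..." '+%Y-%m-%d %H:%M:%S'
    return datetime.datetime.fromtimestamp(unix_time).strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    # 1234.5 is a hypothetical dmesg timestamp used only for illustration.
    print(format_local(dmesg_to_unix(1234.5, time.time(), read_uptime())))
```

Keeping the arithmetic in a pure function makes it easy to check without touching the system clock.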

dmesg

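# print the kernel ring buffer; timestamps are seconds since boot
dmesg
# print the same messages with human-readable timestamps
dmesg -T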

What the common log files and directories contain

=> /var/log/messages: general system messages
=> /var/log/boot: system boot log
=> /var/log/debug: debug messages
=> /var/log/auth.log: user login and authentication log
=> /var/log/daemon.log: messages from running daemons such as squid and ntpd
=> /var/log/dmesg: Linux kernel ring buffer log
=> /var/log/dpkg.log: package management log (package installs and other dpkg activity)
=> /var/log/faillog: failed user login attempts
=> /var/log/kern.log: kernel log
=> /var/log/lpr.log: printer log
=> /var/log/mail.*: mail server logs
=> /var/log/mysql.*: MySQL server logs
=> /var/log/user.log: all user-level logs
=> /var/log/xorg.0.log: X.org log
=> /var/log/apache2/*: Apache web server log directory
=> /var/log/lighttpd/*: Lighttpd web server log directory
=> /var/log/fsck/*: fsck command logs
=> /var/log/apport.log: application crash reports
=> /var/log/syslog: system log
=> /var/log/ufw: ufw firewall log
=> /var/log/gufw: gufw firewall log

[TOC]

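Open-source algorithm libraries

OpenSpiel: framework (DeepMind)
SpriteWorld & Bsuite: framework (DeepMind)
Acme: distributed reinforcement learning framework (DeepMind)
PPO: algorithm (OpenAI)
gym: environment toolkit (OpenAI)
Baselines: framework and demos (OpenAI)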

Game platforms

RLCard

Atari

RL algorithms

Value-Based Policy Gradient AC
TD3
DQN Y
AC
A2C Y
A3C Y
REINFORCE Y
DDPG Y Y
TRPO Y
PPO on-policy Y
SAC off-policy
IMPALA Y

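Distributed reinforcement learning

  1. Distributed reinforcement learning: distributed algorithms such as IMPALA (Importance Weighted Actor-Learner Architecture) and R2D2 (Recurrent Replay Distributed DQN) are an important development of recent years. They allow large-scale distributed training and data parallelism, which improves learning efficiency and scalability.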

DQN

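  • Two neural networks: an online network whose parameters are updated at every training step, and a target network whose weights are updated with a delay.

The DQN update formula also shows why DQN cannot be used for continuous control: the max in max Q(s', a') can only be taken over a discrete set of actions. So what can be done?

DQN uses a "magic function", i.e. a neural network, to solve the continuous state space problem that Q-learning cannot handle. In the same spirit, DDPG uses another magic function to solve the continuous control problem that DQN cannot handle.

In other words, a magic function directly replaces max Q(s', a'): given a state s, it returns the action value that maximizes Q. That is exactly the role of the Actor in DDPG.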
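To make the role of the two networks and of the max over actions concrete, here is a minimal sketch of the DQN target for a single transition. The Q values are passed in as plain lists, and the function names and the dictionary-based "network parameters" are assumptions made for this illustration only.

```python
def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """y = r + gamma * max_a' Q_target(s', a'); no bootstrapping at terminal states.

    next_q_target holds the target network's Q values for every discrete
    action in the next state, which is why the max needs a discrete action set.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def hard_update(online_params, target_params):
    """Delayed update: copy the online network's parameters into the target network."""
    target_params.update(online_params)


# The online network is trained toward y at every step, while the target
# network is refreshed only occasionally with hard_update.
y = dqn_target(reward=1.0, next_q_target=[0.5, 2.0, 1.0], gamma=0.9)
```

With a continuous action there is no finite list to take the max over, which is the gap the DDPG actor fills.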

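[Theory] How to understand the Q-learning algorithm intuitively? (Zhihu)

DQN (Double / Dueling / D3QN) (bilibili)

Deep reinforcement learning: principles of the DQN algorithm (CSDN blog)

Deep Q network (DQN): principles and implementation, by 缙云山车神 (cnblogs)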

Noisy DQN

# three fully connected layers with ReLU, followed by a noisy linear output layer
fc1 = relu(fc(X))
fc2 = relu(fc(fc1))
fc3 = relu(fc(fc2))
y = noisyLinear(fc3)

Double DQN

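Q-learning builds its target with a greedy max over estimated action values, which causes maximization bias (overestimation). Double DQN addresses this with the same idea as double Q-learning: the action is selected with one set of estimates and evaluated with another. The pseudocode of double Q-learning is a useful reference.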
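A minimal sketch of the Double DQN target for one transition, under the usual formulation in which the online network picks the action and the target network evaluates it. The function name and the list-based Q values are assumptions for illustration.

```python
def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """y = r + gamma * Q_target(s', argmax_a' Q_online(s', a'))."""
    if done:
        return reward
    # Select the action with the online network ...
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # ... but evaluate it with the target network.
    return reward + gamma * next_q_target[best_action]


online = [1.0, 3.0, 2.0]   # online network prefers action 1
target = [5.0, 0.5, 4.0]   # a plain max over these values would give 5.0
y = double_dqn_target(0.0, online, target, gamma=1.0)
```

In this example the plain DQN target would bootstrap from 5.0, while the double estimate uses 0.5, the target network's value for the action the online network actually prefers.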

Dueling DQN

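Dueling network: the Q network is split into a state-value stream V(s) and an advantage stream A(s, a), which are then combined into Q(s, a).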
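The way a dueling head combines its two streams can be sketched as follows. This uses the common mean-subtracted aggregation; the function name is an assumption and the inputs stand in for the outputs of the two network streams.

```python
def dueling_q(value, advantages):
    """Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).

    Subtracting the mean advantage keeps V and A identifiable: adding a
    constant to every advantage no longer changes the resulting Q values.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


q_values = dueling_q(2.0, [1.0, 0.0, -1.0])
```

Here the advantages already average to zero, so the Q values are simply the state value shifted by each advantage.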

D3QN

D3QN(Dueling Double Deep Q Network)

/todo

Deep reinforcement learning: D3QN principles and code (CSDN blog)

Rainbow

  • Double Q-learning
  • Prioritized replay
  • Dueling networks
  • Multi-step learning
  • Distributional RL
  • Noisy Nets

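Rainbow brings together the six most effective DQN variants that preceded it and combines their training techniques into a single agent.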

Policy Gradient

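It has two drawbacks: high variance, and learning only after an episode has finished (updates are computed from complete trajectories rather than step by step).

Reinforcement Learning from Zero to RLHF (5): Actor-Critic, A2C, A3C (Zhihu)

Reinforcement Learning Basics VIII: Vanilla Policy Gradient, principles and practice (Zhihu)

How to understand the Policy Gradient algorithm (with code and explanation) (Zhihu)

REINFORCE (MC-PG)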

AC

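(Addresses the high-variance problem.)

In policy gradient, $G_t$ is the actual cumulative return that follows time t on the sampled trajectory. Actor-critic replaces it with the expected cumulative return of taking action a at time t, which is the action-value function $Q(s_t, a_t)$.

Two sets of trainable parameters, $\theta$ and $w$, have to be maintained:

  • Actor: $\theta$ parameterizes the policy.

  • Critic: $w$ evaluates the action and outputs the Q value used to compute the policy gradient.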
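For reference, the Monte Carlo return $G_t$ that the critic's estimate replaces can be computed from one episode's rewards as in the sketch below. The function name is an assumption; the recursion is the standard discounted return.

```python
def discounted_returns(rewards, gamma=0.99):
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


returns = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
```

Each $G_t$ depends on every later reward in the sampled trajectory, which is the source of the high variance that a learned critic is meant to reduce.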

What is the key to understanding Actor-Critic? (with code and code analysis) (Zhihu)

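A2C (Advantage Actor-Critic, which introduces the advantage function)

Learning can be stabilized further by using the advantage function as the critic. In practice, A2C is the more commonly used form of the actor-critic architecture.

The idea is that the advantage function measures how much better an action is than the other actions available in a state, i.e. how it compares with the average value of that state. It subtracts the expected value of the state from the value of the state-action pair: $A(s,a) = Q(s,a) - V(s)$.

In other words, the function measures the extra reward obtained by taking this action in this state, compared with the reward expected from the state.

The extra reward is whatever exceeds the expected value of the state.

The actor loss, in its standard form, is $L(\theta) = -\log \pi_\theta(a \mid s)\, A(s,a)$:

  • If A(s,a) > 0: the gradient is pushed in that direction.
  • If A(s,a) < 0: the gradient is pushed in the opposite direction.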
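As a rough sketch of how the advantage enters the update, the snippet below uses the one-step TD estimate of the advantage, a common choice that needs only a state-value critic, and the standard per-sample actor loss. Function names and numbers are assumptions for illustration.

```python
import math


def td_advantage(reward, value_s, value_next, gamma=0.99, done=False):
    """One-step estimate: A(s, a) ~ r + gamma * V(s') - V(s)."""
    bootstrap = 0.0 if done else gamma * value_next
    return reward + bootstrap - value_s


def actor_loss(log_prob, advantage):
    """Minimizing -log pi(a|s) * A raises log pi(a|s) when A > 0 and lowers it when A < 0."""
    return -log_prob * advantage


adv = td_advantage(reward=1.0, value_s=2.0, value_next=4.0, gamma=0.5)
loss = actor_loss(math.log(0.5), adv)
```

The sign of the advantage decides whether gradient descent on this loss makes the sampled action more or less likely.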

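A3C (Zhihu)

A3C stands for Asynchronous Advantage Actor-Critic.
As mentioned earlier, training a neural network requires data that are independent and identically distributed, and DQN and similar methods use experience replay to break the correlation between samples. Experience replay needs a lot of memory, however, and it is not the only way to break the correlation. Another option is the asynchronous approach, in which the data are not all generated at the same time. A3C is an asynchronous reinforcement learning algorithm that performs very well.
In the A3C model, each worker pulls the parameters directly from the global network and interacts with the environment on its own to produce actions. The gradients of each worker are then used to update the parameters of the global network. Each worker is an A2C agent.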

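SAC (Soft Actor-Critic)

SAC, short for Soft Actor-Critic, is an off-policy deep reinforcement learning algorithm based on the maximum-entropy framework. It was proposed by researchers at UC Berkeley and Google Brain.

The network structure of SAC resembles that of TD3: both have one actor network and two critic networks, but SAC keeps target networks only for the two critics and has no target actor network.

SAC was designed for continuous action spaces (discrete-action variants also exist). It learns a stochastic policy, has achieved leading results in many standard environments, and is a very sample-efficient algorithm.

Whenever SAC uses the critic networks, it takes the smaller of the two values, which mitigates value overestimation and improves the stability and convergence speed of the algorithm.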
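A minimal sketch of the soft target that SAC's critics are commonly trained toward, combining the minimum of the two target critics with the entropy bonus. The function name, the temperature `alpha`, and the scalar inputs are assumptions for illustration.

```python
def sac_target(reward, q1_next, q2_next, log_prob_next,
               alpha=0.2, gamma=0.99, done=False):
    """y = r + gamma * (min(Q1', Q2') - alpha * log pi(a'|s')).

    q1_next and q2_next come from the two target critics, evaluated at an
    action a' sampled from the current policy in the next state.
    """
    if done:
        return reward
    soft_value = min(q1_next, q2_next) - alpha * log_prob_next
    return reward + gamma * soft_value


y = sac_target(reward=1.0, q1_next=3.0, q2_next=2.0,
               log_prob_next=-1.0, alpha=0.5, gamma=1.0)
```

The `- alpha * log_prob_next` term is the maximum-entropy part: actions that are unlikely under the policy add to the target, which rewards keeping the policy stochastic.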

DDPG

Deep Deterministic Policy Gradient.

  • PPO outputs a policy, i.e. a probability distribution over actions, whereas DDPG directly outputs an action.

[Zhihu] DDPG explained clearly in one article (with code and explanation)

Deep Deterministic Policy Gradient (DDPG) | Mofan Python

TF DDPG_update2.py

Implementing DDPG in PyTorch (CSDN blog)

TD3

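TD3 mainly addresses the overestimation problem of DDPG. On top of DDPG, it introduces three key techniques:

  1. Twin critics (clipped double Q-learning): two critic networks are used, and the target value takes the smaller of the two, which suppresses overestimation.
  2. Target policy smoothing regularization: when computing the target value, noise is added to the action at the next state, which makes the value estimate more accurate.
  3. Delayed updates: the actor network is updated only after the critic networks have been updated several times, which keeps actor training more stable.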
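The three techniques can be sketched together in a few lines. This is an illustration under stated assumptions rather than a full implementation: the target actor and critics are passed in as plain Python callables, actions are scalars, and all names and default hyperparameters are chosen for the example.

```python
import random


def clip(x, lo, hi):
    return max(lo, min(hi, x))


def td3_target(reward, next_state, target_actor, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0,
               done=False, rng=random):
    """Target value using policy smoothing and the minimum of the twin critics."""
    if done:
        return reward
    # Technique 2: add clipped Gaussian noise to the target action.
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    next_action = clip(target_actor(next_state) + noise, -max_action, max_action)
    # Technique 1: take the smaller of the two target critic estimates.
    q_min = min(target_q1(next_state, next_action),
                target_q2(next_state, next_action))
    return reward + gamma * q_min


def should_update_actor(step, policy_delay=2):
    # Technique 3: update the actor once every policy_delay critic updates.
    return step % policy_delay == 0
```

In a training loop the critics would be regressed toward `td3_target` at every step, while the actor and the target networks would be updated only on steps where `should_update_actor` returns True.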

TD3 performs well on many continuous control tasks.

TD3 explained in plain language (with code) (Zhihu)

The TD3 algorithm (Twin Delayed Deep Deterministic policy gradient) (CSDN blog)

TRPO (Trust Region Policy Optimization)

Reinforcement learning: TRPO, PPO, DPPO

PPO(Proximal Policy Optimization)

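PPO improves on the optimization efficiency of TRPO: it modifies TRPO so that the trust-region update can be carried out with SGD, and it uses clipping to limit overly large policy updates, which keeps the optimization inside the trust region.

PPO uses the ratio between the new policy and the old policy to limit how far the new policy can move in one update.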
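The clipped surrogate objective can be sketched for a single sample as follows. The function name is an assumption; `clip_eps=0.2` is a commonly used value, not one taken from this post.

```python
import math


def ppo_clip_objective(new_log_prob, old_log_prob, advantage, clip_eps=0.2):
    """min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A), to be maximized."""
    ratio = math.exp(new_log_prob - old_log_prob)  # pi_new(a|s) / pi_old(a|s)
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


# The new policy makes the action twice as likely as the old one did.
gain = ppo_clip_objective(math.log(2.0), 0.0, advantage=1.0)
```

With a positive advantage the objective stops growing once the ratio exceeds 1 + eps, so there is nothing to gain from moving further from the old policy. With a negative advantage the unclipped term is kept, so a bad move is still penalized in full.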

PPO-Max

https://blog.csdn.net/jinzhuojun/article/details/80417179

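PPO (Proximal Policy Optimization) is a policy optimization algorithm for reinforcement learning.

It builds on policy gradient methods and constrains the optimization so that a single update cannot change the policy too much, which improves stability and convergence speed.

PPO has two variants that differ in the objective used to update the policy: PPO-Penalty and PPO-Clip. PPO-Penalty is similar to TRPO: it adds the KL divergence to the objective as a penalty term and adjusts the penalty coefficient automatically so that it adapts to the scale of the data. PPO-Clip has neither a KL term nor a constraint; it uses a special clipping technique in the objective that removes the incentive for the new policy to move far from the old one.

PPO is also commonly combined with Generalized Advantage Estimation (GAE) to estimate the advantage function, which improves performance and convergence speed.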
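A minimal sketch of GAE over a single trajectory segment, assuming no episode boundary inside the segment. The function name and argument layout are assumptions for illustration; `values` are the critic's estimates V(s_t) and `last_value` is V of the state after the final step.

```python
def gae_advantages(rewards, values, last_value, gamma=0.99, lam=0.95):
    """A_t = sum_k (gamma * lam)^k * delta_{t+k}, with delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)."""
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages


full = gae_advantages([1.0, 1.0], [0.0, 0.0], 0.0, gamma=1.0, lam=1.0)
one_step = gae_advantages([1.0, 1.0], [0.0, 0.0], 0.0, gamma=1.0, lam=0.0)
```

Setting `lam=1.0` recovers the Monte Carlo return minus the value baseline, while `lam=0.0` reduces to the one-step TD error, so `lam` trades variance against bias.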

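PPO is very widely applicable and is used in reinforcement learning tasks such as robot control, game playing, and natural language processing. OpenAI has used PPO to train agents on Atari games and continuous-control benchmarks, and as the training algorithm behind OpenAI Five (Dota 2), with strong results.

Overall, PPO is a stable and efficient reinforcement learning algorithm with broad practical value.

Implementing PPO for the gym continuous-action task Pendulum (PyTorch)

Python reinforcement learning practice: PPO in PyTorch for Lunar Lander (score observation)

10 key tricks that affect PPO performance (with a concise PyTorch implementation of PPO)

The 37 implementation details of PPO

[Deep Reinforcement Learning] (6) PPO explained, with complete PyTorch code [verified to run]

Coding PPO from Scratch with PyTorch (Part 1/4) [professional and detailed]

A source code walkthrough of the deep reinforcement learning algorithm PPO (Proximal Policy Optimization) (CSDN blog)

Reinforcement Learning (9): TRPO, PPO, and DPPO (CSDN blog)

Reinforcement Learning Notes (1): The past and present of PPO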

DPPO (Distributed PPO, i.e. multi-process PPO)

Rainbow

The six variants that make up Rainbow are:

  • Double Q-learning
  • Prioritized replay
  • Dueling networks
  • Multi-step learning
  • Distributional RL
  • Noisy Nets

Apex

Soft actor-critic

Zhihu (Tsinghua PhD): [Reinforcement Learning Algorithms 11] SAC

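Inverse reinforcement learning (IRL)

Imitation learning

In classic reinforcement learning, the agent learns a policy by interacting with the environment and maximizing the expected reward.

In imitation learning there is no explicit reward, so the agent can only learn from expert demonstrations.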

GAIL

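Imitation learning: the GAIL framework and a PyTorch implementation

**Core idea of GAIL:** a policy generator G and a discriminator D play against each other, round after round.

Policy generator: a policy network that takes the state as input and outputs an action.

Discriminator: a binary classifier that treats the (s, a) pairs generated by the policy network as negative samples and the expert's (s, a) pairs as positive samples.

Learning the discriminator D:

Given G, interact with the environment using G to generate complete or partial episodes (G must stay fixed during this process) as negative samples, and use the expert samples as positive samples to train D.

Learning the generator G:

Given D, learn the policy network with a standard reinforcement learning algorithm such as PPO, where the reward is produced by D and reflects how similar the sample is to the expert samples.

G and D are trained alternately. Through this adversarial process, the policy produced by G earns larger and larger rewards when interacting with the environment, while D becomes better and better at spotting fakes.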
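The two alternating objectives can be sketched numerically as follows. It assumes, as described above, that D(s, a) is the probability that a pair comes from the expert. The reward form -log(1 - D) is one common choice among several, and the function names are made up for this example.

```python
import math


def discriminator_loss(d_expert, d_policy, eps=1e-8):
    """Binary cross-entropy: expert pairs are labelled 1, policy pairs 0."""
    expert_term = -sum(math.log(d + eps) for d in d_expert) / len(d_expert)
    policy_term = -sum(math.log(1.0 - d + eps) for d in d_policy) / len(d_policy)
    return expert_term + policy_term


def gail_reward(d_value, eps=1e-8):
    """Reward for the policy: larger when D believes the pair looks like the expert's."""
    return -math.log(1.0 - d_value + eps)


# d_expert / d_policy are hypothetical discriminator outputs on a small batch.
loss = discriminator_loss(d_expert=[0.9, 0.8], d_policy=[0.2, 0.1])
reward = gail_reward(0.7)
```

The rewards computed this way are then fed to the RL algorithm (for example PPO) in place of an environment reward, while D is refit on fresh policy samples.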

AIRL

Learning Robust Rewards with Adversarial Inverse Reinforcement learning

RL Apply

info
DouZero
DanZero Distribute Q-learning
MuZero

DeepMind

AlphaZero

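A method that combines heuristic search (MCTS), reinforcement learning, and self-play.

MuZero (Model-Based Series, Part 3: the MuZero family)

MuZero's contribution is to add model learning on top of AlphaZero's powerful search and policy iteration algorithm, which lets it reach the state of the art of its time without knowing the state transition rules.

The MuZero model has three parts:

  • representation: the encoder, which encodes the history of observations into a hidden state in latent space

  • dynamics: the dynamics model, which is the classic Dynamic Model of MBRL

  • prediction: the value model, which takes the hidden state as input and outputs the policy and the value function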
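How the three parts fit together when the model is unrolled over a sequence of actions can be sketched as below. The three functions are passed in as plain callables, and the toy lambdas at the end are stand-ins invented for this example, not learned networks.

```python
def unroll(observations, actions, representation, dynamics, prediction):
    """Encode the observation history once, then step the learned model in latent space."""
    state = representation(observations)          # observations -> hidden state
    policy, value = prediction(state)             # hidden state -> (policy, value)
    outputs = [(policy, value, 0.0)]              # no reward is predicted for the root
    for action in actions:
        state, reward = dynamics(state, action)   # (hidden state, action) -> (next state, reward)
        policy, value = prediction(state)
        outputs.append((policy, value, reward))
    return outputs


# Toy stand-ins: the "hidden state" is just a number.
toy_outputs = unroll(
    observations=[1, 2],
    actions=[1, 0],
    representation=lambda obs: sum(obs),
    dynamics=lambda s, a: (s + a, float(a)),
    prediction=lambda s: ([0.5, 0.5], float(s)),
)
```

After the initial encoding, the environment is never queried again: both the search and the training targets work on hidden states produced by the dynamics function.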

MuZero and an analysis of its core pseudocode

EfficientZero detail

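Next is EfficientZero from NeurIPS 2021. The paper focuses on sample efficiency with limited data: with only two hours of real-time game experience, it reaches 190.4% mean human performance and 116% median human performance on the Atari 100K benchmark, and on the DMC Control 100K benchmark it outperforms state SAC (the oracle). Its performance is close to that of DQN at 200 million frames while consuming 500 times less data.

EfficientZero builds on MuZero with the following three improvements:

(1) It learns a temporally consistent environment model in a self-supervised way.

(2) It learns the value prefix end to end, predicting the sum of rewards over a time window, which reduces the error caused by inaccurate reward prediction.

(3) It changes how the multi-step reward target is computed, using an adaptive unroll length to correct the off-policy target.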

SpriteWorld & Bsuite (DeepMind)

https://blog.csdn.net/weixin_31351409/article/details/101189820

https://github.com/deepmind/spriteworld

https://github.com/deepmind/bsuite

Acme

https://www.sohu.com/a/400058213_473283

https://github.com/deepmind/acme

https://arxiv.org/pdf/2006.00979v1.pdf

https://www.deepmind.com/research?tag=Reinforcement+learning

OpenSpiel

https://zhuanlan.zhihu.com/p/80526746

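Minimax (alpha-beta pruning) search, Monte Carlo tree search, sequence-form linear programming, counterfactual regret minimization (CFR), exploitability
External-sampling Monte Carlo CFR, outcome-sampling Monte Carlo CFR, Q-learning, value iteration, Advantage Actor-Critic (A2C), Deep Q-Networks (DQN)
Ephemeral Value Adjustments (EVA), Deep CFR, Exploitability Descent (ED), (extensive-form) fictitious play (XFP), Neural Fictitious Self-Play (NFSP), Neural Replicator Dynamics (NeuRD)
Regret policy gradients (RPG, RMPG), Policy-Space Response Oracles (PSRO), Q-based all-actions policy gradient (QPG), regression CFR (RCFR), PSROrN, α-Rank, replicator/evolutionary dynamics.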

OpenAI

https://blog.csdn.net/kittyzc/article/details/83006403

Baselines

https://github.com/openai/baselines

  • A2C
  • ACER
  • ACKTR
  • DDPG
  • DQN
  • GAIL
  • HER
  • PPO1
  • PPO2
  • TRPO

Spinning Up

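Spinning Up is a very good resource for deep reinforcement learning.

https://spinningup.openai.com/en/latest/

According to the official documentation, the algorithms implemented in Spinning Up include: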

Vanilla Policy Gradient (VPG)
Trust Region Policy Optimization (TRPO)
Proximal Policy Optimization (PPO)
Deep Deterministic Policy Gradient (DDPG)
Twin Delayed DDPG (TD3)
Soft Actor-Critic (SAC)

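Learning path

⭐⭐⭐ Reinforcement Learning, 2nd edition (Chinese)

Yu Yong's team from the SJTU ACM Class releases an introductory guide to reinforcement learning, with an author interview

**Zhang Weinan:** I teach reinforcement learning at Shanghai Jiao Tong University to students of the Zhiyuan College ACM Class and the AI pilot class of the School of Electronic Information and Electrical Engineering. Because the students' majors fit the course content closely, they pay most attention to the principles of reinforcement learning. The feedback from summer camp students is more about how to apply reinforcement learning well in all kinds of domains, although quite a few students from this major also know the research on reinforcement learning itself very well. For reinforcement learning beginners who join our APEX lab, the learning path I recommend is:

1. First take David Silver's reinforcement learning course at UCL: https://www.davidsilver.uk/teaching/

It covers the fundamentals of reinforcement learning. It contains little deep reinforcement learning, but it is very important for understanding deep reinforcement learning in depth later.

2. Then take UC Berkeley's deep reinforcement learning course: http://rail.eecs.berkeley.edu/deeprlcourse/

3. Finally, pick and choose from the OpenAI bootcamp material: https://sites.google.com/view/deep-rl-bootcamp/lectures

If you prefer courses in Chinese, I recommend:

1. My own reinforcement learning course at Shanghai Jiao Tong University: https://www.boyuai.com/rl

2. Zhou Bolei's reinforcement learning course: https://www.bilibili.com/video/BV1LE411G7Xj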

Ref

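Zhihu: Actor-Critic and A3C

Trust region policy optimization: TRPO

Reinforcement Learning 6: DDPG

Deep Reinforcement Learning Series (15): TRPO principles and a TensorFlow implementation (CSDN blog)

DQN in PyTorch for a maze game (a PyTorch version of the DQN chapter of Mofan's reinforcement learning tutorial) (CSDN blog)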