Simon Shi's Site

AI, machine learning, reinforcement learning, large models, autonomous driving

C++ experience

BUG Analysis

undefined reference to

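A link error at build time. Common causes:

  • A newly added .cpp file was not added to the makefile.

  • The required library (.o/.a/.so) was not specified.

  • The library arguments are in the wrong order. By default, the more fundamental a library passed with -l is, the later it must appear on the command line, for static and shared libraries alike.

  • gcc/ld version mismatch. This is a gcc/ld compatibility problem: gcc 2 and gcc 3 are not compatible across the major version (gcc 3.2 to 3.4 have the same problem to some degree). Using libraries built with a lower version on a machine with a higher version triggers this error, most often in 32-bit environments. Another variant is accidentally using a 64-bit library in a 32-bit environment, or a 32-bit library in a 64-bit environment.

  • C and C++ depending on and linking against each other. When mixing gcc and g++ build outputs, the shared interface must be declared extern "C" so that both sides can use it. In our 64-bit environment, linking a g++-built library with gcc also requires -lstdc++; see the earlier notes on mixed compilation.

  • Errors at run time. These are almost always caused by the program loading a .so with dlopen while the .so itself was not linked against all the libraries it needs; see the earlier notes on mixing static and shared libraries.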
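For the C/C++ item above, a minimal sketch of an interface that both gcc-compiled and g++-compiled code can use. The function name `demo_sum` is hypothetical and only for illustration; in a real project the declaration part would sit in a header included from both sides.

```cpp
#include <cassert>
#include <stddef.h>

// Hypothetical shared interface. The __cplusplus guard lets the same
// declarations be included from C sources (gcc) and C++ sources (g++).
#ifdef __cplusplus
extern "C" {
#endif

// C linkage: the symbol is exported as plain `demo_sum`, without C++ name mangling.
int demo_sum(const int* values, size_t count);

#ifdef __cplusplus
}
#endif

// C++ implementation of the C-linkage function declared above.
int demo_sum(const int* values, size_t count) {
    assert(values != nullptr || count == 0);
    int total = 0;
    for (size_t i = 0; i < count; ++i) {
        total += values[i];
    }
    return total;
}
```

Without the extern "C" block, g++ would export a mangled name and a C caller would hit "undefined reference to `demo_sum'" at link time.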

Linux process analysis

https://blog.csdn.net/ktigerhero3/article/details/80004315

https://cloud.tencent.com/developer/article/1701569

Manually freeing Linux memory: https://www.cnblogs.com/jackhub/p/3736877.html

https://blog.csdn.net/wwd0501/article/details/100041808

https://blog.csdn.net/shuihupo/article/details/80905641

crontab scheduled tasks

https://www.cnblogs.com/aminxu/p/5993769.html

coredump

SIGNAL

man 7 signal

Linux supports the standard signals listed below. Several signal numbers are architecture-dependent, as indicated in the “Value” column. (Where three values are given, the first one is usually valid for alpha and sparc, the
middle one for x86, arm, and most other architectures, and the last one for mips. (Values for parisc are not shown; see the Linux kernel source for signal numbering on that architecture.) A dash (-) denotes that a signal is
absent on the corresponding architecture.

   First the signals described in the original POSIX.1-1990 standard.

   Signal     Value     Action   Comment
   ──────────────────────────────────────────────────────────────────────
   SIGHUP        1       Term    Hangup detected on controlling terminal
                                 or death of controlling process
   SIGINT        2       Term    Interrupt from keyboard
   SIGQUIT       3       Core    Quit from keyboard
   SIGILL        4       Core    Illegal Instruction
   SIGABRT       6       Core    Abort signal from abort(3)
   SIGFPE        8       Core    Floating-point exception
   SIGKILL       9       Term    Kill signal
   SIGSEGV      11       Core    Invalid memory reference
   SIGPIPE      13       Term    Broken pipe: write to pipe with no
                                 readers; see pipe(7)
   SIGALRM      14       Term    Timer signal from alarm(2)
   SIGTERM      15       Term    Termination signal
   SIGUSR1   30,10,16    Term    User-defined signal 1
   SIGUSR2   31,12,17    Term    User-defined signal 2
   SIGCHLD   20,17,18    Ign     Child stopped or terminated
   SIGCONT   19,18,25    Cont    Continue if stopped
   SIGSTOP   17,19,23    Stop    Stop process
   SIGTSTP   18,20,24    Stop    Stop typed at terminal
   SIGTTIN   21,21,26    Stop    Terminal input for background process
   SIGTTOU   22,22,27    Stop    Terminal output for background process

   The signals SIGKILL and SIGSTOP cannot be caught, blocked, or ignored.

ref: https://blog.csdn.net/wanxuexiang/article/details/88382733
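The statement that SIGKILL and SIGSTOP cannot be caught can be checked from code. A small sketch, assuming a POSIX system; the helper name `can_install_handler` is made up for this note.

```cpp
#include <signal.h>

// Empty handler, used only to probe whether a handler can be installed.
static void noop_handler(int) {}

// Returns true if a handler for `signo` can be installed with sigaction(2).
// The previous disposition is restored before returning.
bool can_install_handler(int signo) {
    struct sigaction probe = {};
    struct sigaction previous = {};
    probe.sa_handler = noop_handler;
    sigemptyset(&probe.sa_mask);
    if (sigaction(signo, &probe, &previous) != 0) {
        return false;  // for SIGKILL and SIGSTOP the call fails with EINVAL
    }
    sigaction(signo, &previous, nullptr);
    return true;
}
```

`can_install_handler(SIGTERM)` succeeds, while the same call for SIGKILL or SIGSTOP is rejected by the kernel.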

dmesg

dmesg
dmesg -T

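Debugging with gdb

r        # run the program
bt       # print the backtrace
l        # list source code
watch    # set a watchpoint; takes an expression as argument

bt
where    # same as bt

f 1      # select frame 1
disassemble

shell echo free@plt |c++filt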
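c++filt demangles C++ symbol names. The same thing can be done inside a program; a sketch using `abi::__cxa_demangle` from `<cxxabi.h>`, which is available with GCC and Clang (an assumption about the toolchain). The wrapper name `demangle` is my own.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Demangles an Itanium-ABI C++ symbol name (GCC/Clang only).
// Returns the input unchanged when it is not a valid mangled name.
std::string demangle(const std::string& symbol) {
    int status = 0;
    char* text = abi::__cxa_demangle(symbol.c_str(), nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) {
        return symbol;
    }
    std::string result(text);
    std::free(text);  // the returned buffer is allocated with malloc
    return result;
}
```

For example, `demangle("_Z3foov")` gives `foo()`.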

LOGS:

Problem: torch cudnn Destroy, dl-fini.c:138, Backtrace stopped: frame did not save the PC

(gdb) bt
#0 0x00007f8891fed9fe in ?? () from /usr/local/cuda-10.0/lib64/libcudart.so.10.0
#1 0x00007f8891ff296b in ?? () from /usr/local/cuda-10.0/lib64/libcudart.so.10.0
#2 0x00007f889201f8e0 in cudaStreamDestroy () from /usr/local/cuda-10.0/lib64/libcudart.so.10.0
#3 0x00007f88a24f563d in cudnnDestroy () from /data/include/libtorch/lib/libtorch.so
#4 0x00007f8898d8fa15 in at::cuda::(anonymous namespace)::DeviceThreadHandlePool<cudnnContext*, &at::native::(anonymous namespace)::createCuDNNHandle, &at::native::(anonymous namespace)::destroyCuDNNHandle>::~Devi
from /data/include/libtorch/lib/libtorch.so
#5 0x00007f8891681735 in __cxa_finalize (d=0x7f88d7684000) at cxa_finalize.c:83
#6 0x00007f8893de4d43 in __do_global_dtors_aux () from /data/include/libtorch/lib/libtorch.so
#7 0x00007ffed7d4bbb0 in ?? ()
#8 0x00007f88dc13bd13 in _dl_fini () at dl-fini.c:138
Backtrace stopped: frame did not save the PC

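Solution:

  • Link (ld) libtorch.so into the .so library.

  • The main program must not link it again; otherwise the error above occurs.

Problem: [C++] symbol lookup error: undefined reference to (symbol not found)

Solution:

  • 1. The included *.h file does not declare the function.

  • 2. The parameters in the declaration do not match the .cpp implementation, for example an extra const qualifier on a parameter (compilation still succeeds).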
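For item 2, a sketch of why a const mismatch compiles but fails to resolve: a const on the pointed-to type makes a different function, so the declared one never gets a definition. The name `which_overload` is hypothetical.

```cpp
#include <type_traits>

// const on the pointed-to type is part of the signature: these are two
// distinct function types, and they get distinct mangled names.
static_assert(!std::is_same<void(int*), void(const int*)>::value,
              "int* and const int* parameters give different function types");

// Top-level const on a by-value parameter is NOT part of the signature.
static_assert(std::is_same<void(int), void(const int)>::value,
              "top-level const does not change the function type");

// Two separate overloads. If a header declares only the first and the .cpp
// defines only the second, each file compiles on its own, but callers of
// the first have nothing to resolve against.
int which_overload(const int*) { return 1; }
int which_overload(int*) { return 2; }
```

Running `nm -C` on the object file and comparing the listed signature with the header is a quick way to spot this kind of mismatch.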

Debug

Generating a core dump
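These notes do not record how the core file was produced. One common approach, assumed here, is to raise the core-file size limit before the crash, either with `ulimit -c unlimited` in the shell or from inside the process as sketched below. The helper name `enable_core_dumps` is made up.

```cpp
#include <sys/resource.h>

// Raises the soft core-file size limit to the hard limit for this process
// (the in-process counterpart of `ulimit -c`). Returns true on success.
bool enable_core_dumps() {
    struct rlimit limit;
    if (getrlimit(RLIMIT_CORE, &limit) != 0) {
        return false;
    }
    limit.rlim_cur = limit.rlim_max;  // the soft limit may be raised up to the hard limit
    return setrlimit(RLIMIT_CORE, &limit) == 0;
}
```

If the hard limit is 0, no core file is written even after this call. On Linux, where the file ends up depends on /proc/sys/kernel/core_pattern.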

1. Debugging

gdb exe_file core-file

(gdb) l
46 in ../sysdeps/unix/sysv/linux/raise.c

2. View the backtrace with bt

(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007f056317e801 in __GI_abort () at abort.c:79
#2 0x00007f05631c7897 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f05632f4b9a "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3 0x00007f05631ce90a in malloc_printerr (str=str@entry=0x7f05632f2d88 "free(): invalid pointer") at malloc.c:5350
#4 0x00007f05631d5e1c in _int_free (have_lock=0, p=0x7f0400000013, av=0x7f0563529c40 <main_arena>) at malloc.c:4157
#5 __GI___libc_free (mem=0x7f0400000023) at malloc.c:3124
#6 0x00007f0562c055a0 in __gnu_cxx::new_allocator<int>::deallocate (this=0x7f0454019258, __p=0x7f0400000023) at /usr/include/c++/7/ext/new_allocator.h:125
#7 0x00007f0562c0531a in std::allocator_traits<std::allocator<int> >::deallocate (__a=..., __p=0x7f0400000023, __n=18446709159920402432) at /usr/include/c++/7/bits/alloc_traits.h:462
#8 0x00007f0562c04f08 in std::_Vector_base<int, std::allocator<int> >::_M_deallocate (this=0x7f0454019258, __p=0x7f0400000023, __n=18446709159920402432)
at /usr/include/c++/7/bits/stl_vector.h:180
#9 0x00007f0562c06cd5 in std::_Vector_base<int, std::allocator<int> >::~_Vector_base (this=0x7f0454019258, __in_chrg=<optimized out>) at /usr/include/c++/7/bits/stl_vector.h:162
#10 0x00007f0562c06b15 in std::vector<int, std::allocator<int> >::~vector (this=0x7f0454019258, __in_chrg=<optimized out>) at /usr/include/c++/7/bits/stl_vector.h:435
#11 0x00007f0562c4cf02 in __gnu_cxx::new_allocator<std::vector<int, std::allocator<int> > >::destroy<std::vector<int, std::allocator<int> > > (this=0x7f0493fd45b0, __p=0x7f0454019258)
at /usr/include/c++/7/ext/new_allocator.h:140
#12 0x00007f0562c4b29e in std::allocator_traits<std::allocator<std::vector<int, std::allocator<int> > > >::destroy<std::vector<int, std::allocator<int> > > (__a=..., __p=0x7f0454019258)
at /usr/include/c++/7/bits/alloc_traits.h:487
#13 0x00007f0562c497bb in std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > >::pop_back (this=0x7f0493fd45b0)
at /usr/include/c++/7/bits/stl_vector.h:979
#14 0x00007f0562c53d82 in ddzmove_sg::MoveGener::gen_type_1_single (this=0x7f0493fd4710, cards=0x7f0493fd4b00, result=std::vector of length -1, capacity 1 = {...}, is_start=true)
at /data/shixx/games/DDZ-QS-Server/ai_server_sanguo_call/src/move_utils_sg.cpp:532
#15 0x00007f0562c537cf in ddzmove_sg::MoveGener::gen_moves (this=0x7f0493fd4710, cards=0x7f0493fd4b00, outed_3dai_num=2, horse=std::vector of length 1, capacity 1 = {...},
result=std::vector of length -1, capacity 1 = {...}) at /data/shixx/games/DDZ-QS-Server/ai_server_sanguo_call/src/move_utils_sg.cpp:465
#16 0x00007f0562c5879e in ddzmove_sg::get_legal_card_play_actions[abi:cxx11](ddzmove_sg::MoveGener, int*, int*, int, int, std::vector<int, std::allocator<int> >, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > >&, bool, int) (mg=..., rival_move_cards=0x7f0493fd4b50, next_hands=0x7f0493fd4b00, hero_id=106,
outed_3dai_num=2, horse=std::vector of length 1, capacity 1 = {...}, avail_moves=std::vector of length -1, capacity 1 = {...}, ignorePass=false, round_id=973329)
at /data/shixx/games/DDZ-QS-Server/ai_server_sanguo_call/src/move_utils_sg.cpp:1217
#17 0x00007f0562c5f613 in TorchServer::getPTOut (this=0x2f25ec0, result=std::vector of length 0, capacity 0, heroes=0x7f045401ff50, skill=0x7f0454006590, horse=0x7f0454019290,
all_cards=0x7f0493fd68c0, bottom=0x7f0493fd6870, mingpai=0x7f0454002b20, remain_num=0x7f04540016f0, out_history=0x7f0493fd69a0, cur_turn=21, my_seat=1, dz_seat=1, aitype=0, ailevel=5,
round_id=973329, topk=3, ignorePass=false) at /data/shixx/games/DDZ-QS-Server/ai_server_sanguo_call/src/torch_server.cpp:323
#18 0x00007f0562cbbfdc in getPTOut (action=0x7f0493fda410, action_num=0x7f0493fda2e0, cards=0x7f047c3332a0, remain_num=0x7f04540016f0, bottom=0x7f045400f460, bottom_num=3,
mingpai=0x7f0454002b20, out_history=0x7f0454033990, heroes=0x7f045401ff50, skill=0x7f0454006590, horse=0x7f0454019290, call_history=0x7f045400d1c0, cur_turn=21, my_seat=1, dz_seat=1,
first_bid_seat=-1, aitype=0, ailevel=5, round_id=973329) at /data/shixx/games/DDZ-QS-Server/ai_server_sanguo_call/src/torch_so.cpp:114

3. Inspect the faulting stack frame

Select the frame with frame (f), then run info args:

(gdb) frame
(gdb) f 14
#14 0x00007f0562c53d82 in ddzmove_sg::MoveGener::gen_type_1_single (this=0x7f0493fd4710, cards=0x7f0493fd4b00, result=std::vector of length -1, capacity 1 = {...}, is_start=true)
at /data/shixx/games/DDZ-QS-Server/ai_server_sanguo_call/src/move_utils_sg.cpp:532
532 /data/shixx/games/DDZ-QS-Server/ai_server_sanguo_call/src/move_utils_sg.cpp: No such file or directory.
(gdb) info args
this = 0x7f0493fd4710
cards = 0x7f0493fd4b00
result = std::vector of length -1, capacity 1 = {std::vector of length 1, capacity 1 = {1409472400}, std::vector of length -6, capacity -1073741833 = {
<error reading variable result (Cannot access memory at address 0x25)>
is_start = true
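`result` shows up as `std::vector of length -1`, and the abort happens inside `pop_back` (frame #13). One possible explanation, not confirmed by these notes, is a `pop_back` on an already empty vector, which is undefined behavior. A hypothetical guard helper for that case:

```cpp
#include <vector>

// Removes the last element only when the vector is non-empty.
// Returns false (and leaves the vector untouched) when there is nothing to
// pop; calling std::vector::pop_back on an empty vector is undefined behavior.
template <typename T>
bool safe_pop_back(std::vector<T>& items) {
    if (items.empty()) {
        return false;
    }
    items.pop_back();
    return true;
}
```

Memory corruption from elsewhere (for example an out-of-bounds write) can produce the same picture, so the guard only rules out one candidate.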

4. Inspect the argument data

cards is known to be an array (17 elements).

(gdb) print cards
$1 = (int *) 0x7f0493fd4b00
(gdb) print cards[0]
$2 = 0
(gdb) print cards[1]
$3 = 0
...
(gdb) print cards[17]
$18 = 2