A Debugging Experience I Will Never Forget

Posted on 2019-03-08 | Edited on 2019-03-10

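Last week I ran into an extremely nasty bug. I spent a whole week on it, and by yesterday I had given up hope. Then, out of nowhere, everything suddenly clicked, and today I finally cracked it. I'd say there is a 50% chance this will be the most memorable debugging session of my life.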

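This bug had the following characteristics:

- It was a bug in a third-party library. So I had to read the library's source code, figure out what someone else's code was doing, then hack it and compile it myself.
- It was a Higgs-Bugson. It existed only in the logs and could be reproduced only in the production environment. This had two consequences, both very bad for debugging:
  - It ruled out breakpoint-based debugging, because there was no way to know in advance when the bug would occur.
  - The developers of the third-party library could not take part in the debugging in any effective way.
- Most important of all: the plot of the debugging was full of twists and turns, and the ending was unexpected yet made perfect sense.
The debugging process taught me a memory scheduling algorithm, gave me a few debugging techniques, and deepened my understanding of programming language features.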

Pure Python Might be Faster than You Think

Posted on 2019-03-03 | Edited on 2019-04-17

Recently I was trying to improve the performance of some of my Python code by writing C/C++ extensions. Before I started I thought “well, I should be able to get a 10x to 100x performance boost without much effort”. But in the end I could only speed my program up by 2 to 3 times, and with much effort. The gain in performance is too small to compensate for the pain in development. At first I thought the poor performance was due to some mistake in my C/C++ extension, but then I realized that it is because my code consists mostly of calls to library functions. What is that supposed to mean? Compare the following C++ and Python code for set intersection, a simplified version of some code that I hoped would perform much better as a C/C++ extension.

C++

auto s1 = unordered_set<int>();
auto s2 = unordered_set<int>();
auto s3 = unordered_set<int>();
// s1 and s2 initialized...
for (auto j: s1){
    if(s2.find(j) != s2.end()){
        s3.insert(j);
    }
}

Python

s1 = set()
s2 = set()
# s1 and s2 initialized...
s3 = s1.intersection(s2)

The two snippets do essentially the same thing, and at their core both call library functions. For C++, it’s find and insert. For Python, it’s intersection. This post will show why any attempt to speed up this Python code is almost certainly in vain. The reasons are twofold. For code like this:

- The time cost of the Python code is mostly at the C level (for CPython) rather than at the Python level.
- The C-level code behind Python is very well constructed.
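
To get a feel for the first point, here is a minimal sketch that times the built-in intersection against a Python-level loop written like the C++ snippet. The set sizes and repeat count are arbitrary assumptions, and loop_intersection is a name made up for this illustration; the actual timings depend on the machine and the interpreter.

```python
import timeit


def loop_intersection(s1, s2):
    """Python-level loop mirroring the C++ snippet: one lookup per element."""
    s3 = set()
    for j in s1:
        if j in s2:
            s3.add(j)
    return s3


# Arbitrary test data: multiples of 2 and multiples of 3 below 200000.
s1 = set(range(0, 200_000, 2))
s2 = set(range(0, 200_000, 3))

# Both approaches must produce the same set.
assert loop_intersection(s1, s2) == s1.intersection(s2)

# set.intersection does its looping inside CPython's C code;
# loop_intersection pays the interpreter overhead on every iteration.
builtin_time = timeit.timeit(lambda: s1.intersection(s2), number=20)
loop_time = timeit.timeit(lambda: loop_intersection(s1, s2), number=20)
print(f"set.intersection: {builtin_time:.4f}s, Python loop: {loop_time:.4f}s")
```

The comparison only shows where the time goes on the Python side; it says nothing by itself about how a C++ unordered_set would fare.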

Beware the “Memory Leak” Trap in NumPy Slice Views

Posted on 2019-02-28 | Edited on 2019-09-01

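NumPy's view mechanism greatly speeds up slicing and reshaping arrays while also saving memory. But views hide a trap that is extremely hard to spot and that can cause memory leaks in many common use cases.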
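
A minimal sketch of how such a leak can arise, assuming the trap meant here is that a tiny view keeps its entire base array alive. The array size and the names big, view, and owned are chosen only for illustration.

```python
import numpy as np

big = np.zeros(1_000_000)   # about 8 MB of float64 data
view = big[:10]             # slicing returns a view: no data is copied
owned = big[:10].copy()     # an explicit copy owns its own 80 bytes

# The view holds a reference to the whole base array, so even after
# `del big` the 8 MB buffer could not be freed while `view` is alive.
print(view.base is big)
print(view.base.nbytes, owned.nbytes)

# The copy has no base and does not pin the large buffer.
print(owned.base is None)
```

Calling .copy() on a small slice that needs to outlive the large array trades a little copying for releasing the big buffer.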

Value Your Life, Use numpy.seterr

Posted on 2019-01-25 | Edited on 2019-02-15

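inf and NaN are the two most notorious values in numerical computing. They behave strangely and often cause bugs that are extremely hard to track down:

Generally, a properly configured computation never produces inf or NaN; when one appears, it usually means something went wrong somewhere in the computation. But by the definition of floating point, inf and NaN are valid floating-point numbers, so the bug causes no visible side effect at first. An error is raised only when a function that checks for inf or NaN finally runs, or never, in which case the value is displayed all the way to the front end.

Fortunately, if the program uses NumPy as the basis of its numerical computation, numpy.seterr can be used as a global setting that makes NumPy immediately raise an error, issue a warning, or call a callback when it hits this kind of numerical error (a floating-point overflow, for example).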
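
A minimal sketch of that global setting, using an overflow as the trigger. The function overflowing_product and the values in it are made up for this illustration; the previous settings are saved and restored so the example does not change NumPy's behavior for the rest of the program.

```python
import numpy as np


def overflowing_product():
    """Multiply a huge float64 by 10 so that the result overflows."""
    return np.array([1e308]) * 10.0


# np.seterr returns the previous settings so they can be restored later.
old_settings = np.seterr(all="raise")
try:
    overflowing_product()
    raised = False
except FloatingPointError as exc:
    # With all="raise", the overflow is reported at the point where it
    # happens instead of silently producing inf.
    raised = True
    print("caught:", exc)
finally:
    np.seterr(**old_settings)
```

For a setting that applies only to one block of code rather than globally, the np.errstate context manager accepts the same keyword arguments.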

Mergesort is faster than Quicksort

Posted on 2018-11-20 | Edited on 2019-03-05

Intro

Quicksort is quick, and it’s well known that it can be defeated under certain circumstances, from the simplest case of already sorted data to the killer adversary. But if someone claimed to have found another algorithm that outperforms Quicksort on random arrays, without strong constraints on the data to be sorted, you’d probably dismiss the claim and assume he must have gotten something wrong. At least, that’s what I would have done before my recent discovery: Mergesort is faster than Quicksort, if you use GCC and your CPU is somewhat “new”.

Ridiculous, right? If you don’t believe me, have a look at this Google Colab notebook:

Debugging Travis-CI Errors Locally

Posted on 2018-10-31

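Sometimes an error reported by Travis-CI cannot be reproduced locally. In that case you can debug it by running Travis-CI's Docker image on your own machine. The steps are described in detail in the Travis-CI documentation. Travis-CI publishes a series of environments on Docker Hub, and you can pick one according to your project. Once inside the Docker environment, follow the failing Travis-CI log step by step to reproduce what it shows.
After reproducing the error, the next step is debugging. If you merely set up Docker as the official documentation describes, some debugging tools will refuse to work; GDB, for example, reports Error disabling address space randomization: Operation not permitted. This is because Docker blocks many system calls by default (see Seccomp security profiles for Docker). These restrictions generally do not affect normal applications, but for debugging tools they pull the rug out from under them. Following an answer on stackoverflow, adding the run options --cap-add=SYS_PTRACE --security-opt seccomp=unconfined to docker gets the debugging tools working again.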
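
As a rough sketch, the two options could be assembled into a docker run command like this. The helper docker_debug_command, the image name, and the shell path are placeholders invented for this example, not names taken from the Travis-CI documentation.

```python
import shlex


def docker_debug_command(image, shell="/bin/bash"):
    """Build a `docker run` command line that lets GDB and friends work."""
    return [
        "docker", "run", "-it",
        "--cap-add=SYS_PTRACE",                  # allow ptrace, which debuggers need
        "--security-opt", "seccomp=unconfined",  # lift the default syscall filter
        image, shell,
    ]


# "example/ci-image:latest" is a hypothetical image name.
cmd = docker_debug_command("example/ci-image:latest")
print(shlex.join(cmd))
```

The printed line can be pasted into a terminal, or the list can be handed to subprocess.run to start the container directly.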

Reading the CPython Source: Timsort

Posted on 2018-10-27 | Edited on 2019-02-21

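For an introduction to Timsort, see the Wiki or Tim's own short write-up. This post focuses on the implementation in CPython (version 3.7) and does not go into the underlying principles.
The Timsort code lives in Objects/listobject.c, roughly from line 1000 to line 2000, which is about a third of the whole list implementation. This post pastes in nearly all of the Timsort source, so it will not be short either.
The code is actually fun to read: it is commented in detail, in a style that is steady yet playful, and it makes for fine leisure reading. Some people say this part of CPython is hard to understand; I really cannot agree.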

Reading the NumPy Source: Merge Sort

Posted on 2018-10-25 | Edited on 2018-11-20

np.sort(a, kind='mergesort')

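Sorting is one of the most important topics in computer science, and NumPy, as a renowned numerical computing library, supports it well. Yet if you get curious and want a peek at NumPy's sorting source to see whether it hides some earth-shattering optimization, that turns out not to be so easy. NumPy's sorting is implemented entirely in C, so the binaries you get from pip install numpy do not let you read that part of the source. To find out what lies behind the line of sorting code above, you have to go to the official repo on GitHub.
This post reads the merge sort source in NumPy (version 1.15) from two angles:

- The algorithm angle: how NumPy performs merge sort.
- The engineering angle: how this sorting algorithm becomes an easy-to-use interface in Python.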

[Repost] [Translation] Debugging Python Like a Boss: The Ultimate Guide to Python Debugging

Posted on 2018-10-18 | Edited on 2018-12-13

Original: Debugging Python Like a Boss, by Brian Cooksey.

def make_pie(self, ingredients):
    print('******WHAT IS GOING ON HERE******')
    print(ingredients)
    self.oven.preheat()
    print(self.oven.temperature)

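Does the code above look a lot like your tried-and-true debugging method? Yes, I used to do this all the time too. To be fair, the approach is not bad: you just add some print calls to the code, run it, and see what happened. Of course, you often have to wade through a tangle of everything else written to STDOUT, but what you want is in there, if you know what you want. The trouble is that you often don't. If you knew what to check, you probably wouldn't have needed the prints in the first place. Instead, you throw a few prints around the place where you sense a problem, then iterate on their positions to close in on the buggy code. Hooray for binary search!

Thankfully, there is a better way. Ever since the first segfault in C, tools known as debuggers have appeared alongside every language, and Python is no exception. Besides the built-in debugger, the Python community has built plenty of cool tools, and the rest of this post introduces the most popular ones.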
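
As a small taste of the built-in debugger, here is a sketch in which the prints in make_pie are swapped for a single breakpoint() call (Python 3.7+), which drops into pdb. The Oven and Bakery classes, the temperature values, and the debug flag are hypothetical stand-ins added so that the example can run on its own.

```python
class Oven:
    """Hypothetical stand-in for the oven object used by make_pie."""

    def __init__(self):
        self.temperature = 20

    def preheat(self):
        self.temperature = 180


class Bakery:
    def __init__(self):
        self.oven = Oven()

    def make_pie(self, ingredients, debug=False):
        if debug:
            # Drops into pdb at this line: inspect `ingredients` or
            # `self.oven.temperature` with `p`, step with `n`, resume with `c`.
            breakpoint()
        self.oven.preheat()
        return {"ingredients": list(ingredients), "baked_at": self.oven.temperature}


# debug defaults to False, so this call runs straight through.
pie = Bakery().make_pie(["apple", "sugar"])
print(pie)
```

Calling make_pie(..., debug=True) from a terminal pauses execution inside the function, where any variable can be inspected without deciding in advance what to print.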

Specifying the Compiler for CMake

Posted on 2018-10-14 | Edited on 2019-03-03

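Specifying some environment for some program immediately brings PATH to mind. However, CMake does not support selecting the compiler by putting gcc or g++ on PATH. For simple projects, especially personal ones, you can specify the compiler CMake uses by setting variables in the CMakeLists.txt file, for example: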

set(CMAKE_C_COMPILER "/usr/bin/mygcc")
set(CMAKE_CXX_COMPILER "/usr/bin/myg++")

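For complex projects, running CMake may also involve commands such as include(GNUInstallDirs) and include(CheckCXXCompilerFlag), and setting the variables inside the file becomes a stretch. In that case there are two options: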

Set environment variables

export CC=`which mygcc`
export CXX=`which myg++`

Use CMake arguments

cmake -D CMAKE_C_COMPILER=`which mygcc` -D CMAKE_CXX_COMPILER=`which myg++` ..

Note that the “parameter” names differ between these two cases. See the cmake wiki.