CuriousY A world with wonder

Reading <Up & Going>


Up & Going is the first book in the You Don't Know JS series. It gives a rough introduction to the language features of ES6, and makes a fine read for getting from beginner level to mastery (or to giving up, as the joke goes). Below I have excerpted some passages that are well put or that I found interesting, and after some of them added the naive thoughts I had while reading.

The JavaScript engine actually compiles the program on the fly and then immediately runs the compiled code.


implicit coercion(隐式类型转换) is confusing and harms programs with unexpected bugs, and should thus be avoided. It’s even sometimes called a flaw in the design of the language.


There are lots of opinions on what makes well-commented code; we can’t really define absolute universal rules. But some observations and guidelines are quite useful:

  • Code without comments is suboptimal.
  • Too many comments (one per line, for example) is probably a sign of poorly written code.
  • Comments should explain why, not what. They can optionally explain how if that’s particularly confusing.

In some programming languages, you declare a variable (container) to hold a specific type of value, such as number or string. Static typing, otherwise known as type enforcement, is typically cited as a benefit for program correctness by preventing unintended value conversions.

Other languages emphasize types for values instead of variables. Weak typing, otherwise known as dynamic typing, allows a variable to hold any type of value at any time. It’s typically cited as a benefit for program flexibility by allowing a single variable to represent a value no matter what type form that value may take at any given moment in the program’s logic flow.

JavaScript uses the latter approach, dynamic typing, meaning variables can hold values of any type without any type enforcement.

This passage still left me a bit dizzy. What it is really saying is that JavaScript is a dynamically typed language (and, though not mentioned here, it is also weakly typed; Python, by contrast, is strongly typed). As for the differences between dynamic/static and strong/weak typing, I think 轮子哥 (vczh) puts it rather clearly:

Strong typing: tends not to tolerate implicit type conversions. For example, Haskell's int cannot become a double.

Weak typing: tends to tolerate implicit type conversions. For example, C's int can become a double.

Static typing: the type of every variable is known at compile time, and things you cannot do because of a type error are compile-time (syntax) errors.

Dynamic typing: the type of a variable is not known at compile time, and things you cannot do because of a type error are runtime errors. For example, you cannot take a number a and use it as an array with a[10].
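The strong/weak distinction above is easy to demonstrate in Python itself, which is dynamically typed but comparatively strongly typed; a minimal sketch:

```python
# Python refuses the implicit str/int conversion that weakly typed
# languages such as C or JavaScript would allow.
try:
    1 + '1'                 # no implicit coercion between str and int
    mixed_add_ok = True
except TypeError:
    mixed_add_ok = False

print(mixed_add_ok)  # False
print(1 + 1.0)       # 2.0, numeric promotion is a tolerated implicit conversion
```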


JavaScript has typed values, not typed variables. The following built-in types are available:

  • string
  • number
  • boolean
  • null and undefined
  • object
  • symbol (new to ES6)

typeof null is an interesting case, because it errantly returns "object", when you’d expect it to return "null". This is a long-standing bug in JS, but one that is likely never going to be fixed. Too much code on the Web relies on the bug and thus fixing it would cause a lot more bugs!


Properties can either be accessed with dot notation (i.e., obj.a) or bracket notation (i.e., obj["a"]). Dot notation is shorter and generally easier to read, and is thus preferred when possible.

Bracket notation is useful if you have a property name that has special characters in it, like obj["hello world!"]. Bracket notation is also useful if you want to access a property/key but the name is stored in another variable.


You’ve probably heard sentiments like “coercion is evil” drawn from the fact that there are clearly places where coercion can produce some surprising results. Perhaps nothing evokes frustration from developers more than when the language surprises them.

Coercion is not evil, nor does it have to be surprising. In fact, the majority of cases you can construct with type coercion are quite sensible and understandable, and can even be used to improve the readability of your code.

On "coercion is not evil": explicit conversion poses no problem, but for implicit conversion the only cases I can think of accepting are the well-known, commonly used ones, such as console.log() (Python's string.format() and print() are similar; strictly speaking Python is not coercing there but calling the __str__ method, though the effect is close to coercion), plus not having to convert everything to a boolean in conditionals. In every other case I still very much agree that "(implicit) coercion is evil".
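To illustrate the Python side of that remark: print() and str.format() render an object by calling its __str__ method, so the effect resembles implicit to-string coercion even though it is an explicit protocol. A small sketch (the Point class is invented for illustration):

```python
class Point(object):
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __str__(self):
        # print() and str.format() both end up calling this method.
        return 'Point(%d, %d)' % (self.x, self.y)

p = Point(1, 2)
print(p)                  # Point(1, 2)
print('{}'.format(p))     # Point(1, 2)
```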


About npm

  1. npm stands for "node package manager", which was presumably the tool's original intent (for Node.js). It has since grown well beyond that, and the official slogan has been changed to "javascript package manager".

  2. npm is to JavaScript what pip is to Python. Both are package managers; the main differences I have seen so far:
    • The default installation scope differs: under npm, each project gets an independent environment, and newly installed packages go into the project's own directory by default (roughly pip + virtualenv); pip installs everything into one shared directory (roughly npm -g).
    • The package layout differs: npm organizes packages as a tree, where each dependency is a node and that dependency's own dependencies live under its node, so a fairly low-level package can appear many times in the tree. pip in this respect resembles bower (another JavaScript package manager): its layout is a flat list (though conceptually still a tree) that does not explicitly nest transitive dependencies; at install time it simply installs every package in the tree.
    • An npm project's dependencies can be written into a file named package.json, in JSON format (see https://docs.npmjs.com/files/package.json for the specifics); pip records dependencies in requirements.txt or in setup.py.
  3. Because npm and pip differ in design philosophy (the granularity of isolation), a new JavaScript project starts by creating the package.json file and running npm install to pull in the necessary dependencies before any project code is written. A new Python project, by contrast, can start without first creating requirements.txt and running pip install (unless a required package's version conflicts with the current environment, in which case you would create a fresh environment with virtualenv and install the dependencies there); once the code is done, we write the dependency information into requirements.txt or setup.py. Of course, a better habit is to isolate a new environment and install the dependencies before coding every time you start a Python project (by that measure, npm's philosophy does look better).

  4. Some commonly used npm commands:

    • Use sudo npm install npm -g to upgrade npm itself to the latest version.

    • Use npm init to generate a basic package.json.

    • Use npm run <command> to execute entries defined in the scripts field of package.json, or executables under a dependency's bin directory (it is essentially a thin wrapper over bash with a few extra rules). For example, given the following in package.json:

        "scripts": {
          "start": "npm run lite",
          "lite": "lite-server"
        },
        "devDependencies": {
          "lite-server": "^2.2.2"
        }
      

      Then, from the project root, I can run npm run start as well as npm run lite (both are defined in the scripts field; since start here maps to npm run lite, the two commands have the same effect). Beyond that, I can also run npm run lite-server (which is what the two commands above ultimately resolve to); it amounts to executing the file ./node_modules/lite-server/bin/lite-server. (The mechanism: npm temporarily adds the bin directories of all dependencies to the PATH environment variable; running the lite-server command then searches every directory on PATH for an executable of that name, runs it if found, and errors out otherwise.)

    • npm start lets you omit the run in npm run start; npm test, npm stop, and npm restart are the same kind of shorthand.

Encoding issues in Python 2


In Python 2, mixing unicode and str strings often produces errors like UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 0: ordinal not in range(128). For example:

print u'ü'        # It's OK
print 'ü'         # It's OK
print 'ü' + u'ü'  # Raises UnicodeDecodeError

The cause of the error above: when Python adds a unicode object to a str object, it first converts the str object to unicode and then performs the addition, and the default codec for that conversion is ascii; the conversion therefore fails and raises the exception.

By explicitly specifying the codec for the conversion, we get the correct result:

print 'ü'.decode('utf8') + u'ü'  # It's OK
print 'ü' + u'ü'.encode('utf8')  # It's OK

Q: do the two additions above produce the same kind of object? (This could pass for an interview question.)

A: No. The former yields a unicode object, while the latter yields a str object. Verification:

print type('ü'.decode('utf8')) # <type 'unicode'>
print type(u'ü'.encode('utf8')) # <type 'str'>
print type('a' + u'a') # <type 'unicode'>

Q: what exactly happens when a string literal is prefixed with u? Is u'ü' equivalent to unicode('ü')?

A: u'ü' is equivalent to unicode('ü', 'utf-8'), not to unicode('ü') (unicode defaults to the ascii codec). Verification:

repr(unicode('ü', 'utf-8')) == repr(u'ü') # True
repr(unicode('ü')) == repr(u'ü') # Raise UnicodeDecodeError

Q: where does that 'utf-8' come from? Why 'utf-8' rather than 'utf-16' or 'ascii'?

A: Python uses the codec recorded in sys.stdout.encoding, and sys.stdout.encoding is read from the shell environment variable LC_CTYPE; when the code above was run, that variable was "UTF-8". Verification code:

  • Set LC_CTYPE to UTF-8 and run the code below (results are in the trailing comments):

    import sys
    print sys.stdout.encoding # UTF-8
    print u'ü' # ü
    
  • Set LC_CTYPE to US-ASCII and run the code below:

    import sys
    print sys.stdout.encoding # US-ASCII
    print u'ü' # raise UnicodeEncodeError
    

Q: what is the difference between encode and decode? Why do some places use the encode method and others the decode method?

A: it is just naming, and not worth agonizing over: if going from A to B is called encode, then going from B to A is naturally called decode. In Python, A is unicode and B is str.
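For what it's worth, Python 3 makes the A/B pairing explicit in the type names: text (str) versus bytes. A sketch in Python 3 terms:

```python
# Python 3 renames the pair: text (str) <-> bytes.
text = 'ü'
data = text.encode('utf-8')    # A -> B: encode, text to bytes
print(data)                    # b'\xc3\xbc'
back = data.decode('utf-8')    # B -> A: decode, bytes back to text
print(back == text)            # True
```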

Q: then how should str.encode() and unicode.decode() be understood?

This one is tricky. When executing str.encode, Python performs an implicit type conversion, turning the str into unicode before encoding it (and since that conversion defaults to ascii, it frequently raises UnicodeDecodeError). unicode.decode() behaves the same way (convert to str first, then decode). In practice we can therefore ignore these two methods entirely and use only unicode.encode and str.decode (explicit is better than implicit). Verification code:

import sys
print sys.getdefaultencoding() # ascii
# The below two expressions are the same
print 'ü'.encode('utf-8') # raise UnicodeDecodeError
print 'ü'.decode(sys.getdefaultencoding()).encode('utf-8') # raise UnicodeDecodeError
# The below two expressions are the same
print u'ü'.decode('utf-8') # raise UnicodeEncodeError
print u'ü'.encode(sys.getdefaultencoding()).decode('utf-8') # raise UnicodeEncodeError

References:

  1. http://stackoverflow.com/questions/2596714/why-does-python-print-unicode-characters-when-the-default-encoding-is-ascii
  2. https://docs.python.org/2/howto/unicode.html
  3. http://stackoverflow.com/questions/2081640/what-exactly-do-u-and-r-string-flags-do-in-python-and-what-are-raw-string-l
  4. http://blog.csdn.net/trochiluses/article/details/16825269

Python profiler use experience


cProfile

The C implementation of the profile library that ships in Python's standard library. It is fine for profiling simple programs, but it is not very friendly to multi-threaded ones.

Usage example

Refer to https://docs.python.org/2/library/profile.html#module-cProfile. Note that the result file cProfile produces cannot be viewed directly; you need pstats to present the analysis results.
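As a minimal sketch of that workflow, profiling in-process and then formatting the raw stats with pstats (the work function is just a stand-in workload):

```python
import cProfile
import io
import pstats

def work():
    # Stand-in workload for something worth profiling.
    return sum(i * i for i in range(100000))

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# cProfile's raw stats are not human-readable; pstats formats them.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats('cumulative').print_stats(5)
report = stream.getvalue()
print('function calls' in report)  # True
```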

yappi

This profiler's name is rather fun (yet another python profiler).

yappi supports three output formats: pstat, ystat, and callgrind. pstat is the same file format that the standard library's cProfile generates; ystat is yappi's own result, written straight out with pickle.dump for yappi itself to load back and analyze (though no matter how I tried, loading it always failed, and I don't know why); callgrind is for consumption by Callgrind.

yappi can be regarded as an enhanced cProfile: on top of support for profiling multi-threaded programs, it adds quite a few nice features (though I have to say its documentation is really poor):

  • Ability to analyze per-thread information. (new in 0.94)
  • Ability to hook underlying threading model events/properties. (new in 0.92)
  • Decorator to profile individual functions easily. (new in 0.92)
  • Profiler results can be saved in callgrind and pstat formats. (new in 0.82)
  • Profiler results can be merged from different sessions on-the-fly. (new in 0.82)
  • Profiler results can be easily converted to pstats. (new in 0.82)
  • Supports profiling per-thread CPU time. See http://en.wikipedia.org/wiki/CPU_time for details. (new in 0.62)
  • Profiling of multithreaded Python applications transparently.
  • Profiler can be started from any thread at any time.
  • Ability to get statistics at any time without even stopping the profiler.
  • Various flags to arrange/sort profiler results.

In my actual use, though, the per-thread CPU time feature still seems buggy? (The times for all threads always come out nearly identical.)

Result glossary

From the official documentation:

The first one is function stats:

  • name: name of the function being profiled
  • ncall: is the total callcount of the function.
  • tsub: total spent time in the function minus the total time spent in the other functions called from this function.
  • ttot: total time spent in the function.
  • tavg: is same as ttot/ccnt. Average total time.

The thread stats field gives information about the threads in the profiled application:

  • name: class name of the threading.thread object.
  • tid: thread identifier.
  • fname: name of the last executed function in this thread.
  • ttot: total time spent in this thread.
  • scnt: number of times the thread is scheduled.

Usage example

Use in terminal:

python -m yappi -o profile.pstat -f pstat your_script.py

Use in python code:

import yappi
import threading


def func():
    for i in range(1000000):
        pass


def func2():
    for i in range(2):
        func()


def main():
    threads = []
    for i in range(3):
        t = threading.Thread(target=func)
        threads.append(t)
        t.start()

    for t in threads:
        t.join()


if __name__ == '__main__':
    yappi.start()
    main()
    yappi.get_func_stats().strip_dirs().print_all()
    yappi.get_thread_stats().print_all()

A pitfall: paramiko sftp hangs with multiple threads


Problem

The problematic code is as follows:

import os
import threading
import time


class NodeLogCollector(object):
    LOG_FILES = frozenset(['audit.log',
                           'metrics.log',
                           'mongod.log',
                           'scheduler.log'])

    def __init__(self, node):
        self.node = node
        self._sftp = self.node.ssh_connection._client.open_sftp()
        self._logs = {}  # A dict to record the collected logs from log files.
        for name in self.LOG_FILES:
            self._logs[name] = []
        self._terminate = False
        self._tail_threads = []

    def __del__(self):
        self._sftp.close()

    def tail_log(self, file_name):
        file_path = os.path.join(self.node.home_path, 'var', 'log', file_name)
        with self._sftp.file(file_path, 'r') as f:
            f.seek(0, os.SEEK_END)
            while not self._terminate:
                for line in f:
                    self._logs[file_name].append((time.time(), line,))
                time.sleep(0.1)
                offset = f.tell()
                f.seek(offset)

    def start_tail_all_logs(self):
        for name in self.LOG_FILES:
            t = threading.Thread(target=self.tail_log, args=(name,))
            self._tail_threads.append(t)
            t.start()

    def stop_tail_all_logs(self):
        self._terminate = True
        for t in self._tail_threads:
            t.join()

This class collects the contents of specified log files on a remote machine. The node it is initialized with contains a paramiko object, and via sftp a remote file can be read directly; the feature passed its tests in a single thread. The problem is that with multiple threads, the threads always hang. Logging showed they hang at the t.join() statement in stop_tail_all_logs. Why?

Reason

Further investigation showed that with multiple threads, a thread hangs on the line with self._sftp.file(file_path, 'r') as f:, meaning the thread never enters the loop inside and therefore can never receive the termination signal to exit. With only one thread running, there is no hang.

Hanging on a with statement usually has just one cause: the resource is locked, and the thread waits forever trying to acquire it. So we can infer that a single paramiko sftp object can only have one file open at a time, not several simultaneously! (In other words, sftp's read method is thread-safe: with multiple threads, only one of them can ever be executing it.)

Solution

The solution is simply to open a separate paramiko sftp object for each file to be read, so each file is monitored independently.
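As a side note on the shutdown path: the self._terminate boolean can also be expressed with threading.Event, the idiomatic stop flag. A minimal, paramiko-free sketch of the same tail-loop shape (all names here are invented for illustration):

```python
import threading
import time

stop = threading.Event()
finished = []
lock = threading.Lock()

def tail_worker(name):
    # Same shape as tail_log's polling loop: run until told to stop.
    while not stop.is_set():
        time.sleep(0.01)
    with lock:
        finished.append(name)

threads = [threading.Thread(target=tail_worker, args=('log%d' % i,))
           for i in range(3)]
for t in threads:
    t.start()
stop.set()            # signal every worker to exit
for t in threads:
    t.join()          # joins promptly: each loop observes the shared event
print(len(finished))  # 3
```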
