Python, what on earth are you underscoring?!

June 5, 2017

In Python,
the underscore (_) is a rather special naming device.
It mainly comes in 4 forms:

foo_
_foo
__foo__
__foo

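The meaning behind them isn't actually complicated,
and there are plenty of explanations online.
Still, it took me quite a while to really get it.
Eventually I realized the problem
wasn't that I couldn't follow other people's explanations;
it was that I had never needed these features myself,
so nothing really sank in…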

So
besides explaining what each form means,
this post also gives practical examples
so that anyone who's confused can pick it up quickly.

The bottom line first

Even if you don't understand any of the above, your life won't be affected in the slightest.
You can keep happily writing Python.

So why bother learning this?

The shallow answer:

The neighbors will praise your Python skills
Writing code feels better
And, well… aren't you just great

The practical benefits:

Better code quality
Clearer code logic
Fewer dangerous mistakes in your code

Let's get to it

1. `foo_`

This naming style
is mainly used to avoid giving something the same name
as one of Python's built-in keywords or built-in functions.

Note: you can list Python's built-in keywords and functions like this:

# list all built-in keywords
import keyword
print(keyword.kwlist)

# list all built-in functions (and the other built-in names)
import builtins
print(dir(builtins))
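If you'd rather check one particular name before using it, `keyword.iskeyword()` and the `builtins` module can be wrapped in a small helper. This is only a sketch: `classify_name` is a hypothetical helper, and it looks at hard keywords and the `builtins` module only, not at names you've imported yourself.

```python
import builtins
import keyword


def classify_name(name: str) -> str:
    """Say whether `name` is a keyword, shadows a built-in, or is free to use.

    Hypothetical helper for illustration: it checks only hard keywords
    (keyword.iskeyword) and names defined in the builtins module.
    """
    if keyword.iskeyword(name):
        return 'keyword'   # using it as a variable name is a SyntaxError
    if hasattr(builtins, name):
        return 'builtin'   # legal, but it shadows the built-in
    return 'free'


for candidate in ['from', 'list', 'name_list']:
    print(candidate, '->', classify_name(candidate))
```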

What happens when you clash with a built-in function

For example,
say we have a list holding a bunch of people's names:

list = ['Aji', 'Boa', 'Jason']

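However,
list is a Python built-in (a built-in type you call like a function, not a keyword),
so rebinding the name shadows the built-in and list() can no longer be called: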

>>> list(range(10))
TypeError: 'list' object is not callable

In this case
you can name it list_ to tell the two apart
(though that name isn't very descriptive; name_list would probably be a better choice).
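To see the shadowing problem and the trailing-underscore fix side by side without clobbering `list` for your whole session, here is a sketch that keeps each version inside its own function. The function names `shadowed` and `with_trailing_underscore` are made up for the demo.

```python
def shadowed():
    """Rebind `list` locally, then try to call the built-in list()."""
    list = ['Aji', 'Boa', 'Jason']  # shadows the built-in inside this function
    try:
        return list(range(3))       # calls the local list object, not the built-in
    except TypeError as exc:
        return f'TypeError: {exc}'


def with_trailing_underscore():
    """Use list_ instead, so the built-in list() keeps working."""
    list_ = ['Aji', 'Boa', 'Jason']
    return list_, list(range(3))


print(shadowed())
print(with_trailing_underscore())
```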

What happens when you clash with a built-in keyword

It can't happen; Python won't even let you.
If you try, it blows up right in your face:

>>> from = 'aji'
SyntaxError: invalid syntax

>>> def is(): pass
SyntaxError: invalid syntax
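That is exactly where the trailing underscore earns its keep: when the natural name is a keyword, PEP 8 suggests appending a single underscore, as in `class_`. A minimal sketch, assuming a hypothetical `html_tag` function that needs a "class" parameter:

```python
def html_tag(tag, text, class_=None):
    """Build a tiny HTML element string (hypothetical helper for illustration).

    `class` is a keyword, so the parameter is named class_ instead.
    """
    attrs = f' class="{class_}"' if class_ else ''
    return f'<{tag}{attrs}>{text}</{tag}>'


print(html_tag('p', 'hello', class_='greeting'))
```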

2. `_foo`

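There are usually a few reasons
for using this naming style:

You don't want it to be accessed directly
It might just be a function that's still being tested
You don't want it to be imported directly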

Some notes on this naming style:

It is not a private variable

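Many people assume
this is what's called a private variable,
but no such thing actually exists in Python;
it's just a naming convention.
The official documentation says so explicitly: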

“Private” instance variables that cannot be accessed except from inside an object don’t exist in Python. However, there is a convention that is followed by most Python code: a name prefixed with an underscore (e.g. _spam) should be treated as a non-public part of the API (whether it is a function, a method or a data member).

Rather than "private", I'd say it's closer to the idea of "hidden".
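To make the "convention, not enforcement" point concrete: nothing stops outside code from reading or even changing a single-underscore attribute. A minimal sketch with a made-up `Account` class:

```python
class Account:
    def __init__(self, balance):
        self._balance = balance  # "internal" by convention only

    def balance(self):
        return self._balance


acct = Account(100)
print(acct._balance)   # readable from outside, no error
acct._balance = -1     # writable too; Python does not stop you
print(acct.balance())
```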

It can't be imported?

Not entirely.

Say you have a module named test:

test.py

def public_func():
    print("I'm available.")

def _private_func():
    print("I'm not available.")

Then we use import *:

>>> from test import *
>>> public_func()
I'm available.

>>> _private_func()
NameError: name '_private_func' is not defined

As you can see, _private_func was not imported.
However,
you can still import it explicitly:

>>> from test import _private_func
>>> _private_func()
I'm not available.

Or you can list it in __all__ in test.py:

__all__ = ['public_func', '_private_func']

def public_func():
    print("I'm available.")

def _private_func():
    print("I'm not available.")

Now use import * again:

>>> from test import *
>>> _private_func()
I'm not available.
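The rule behind both results can be checked in one go: without `__all__`, `from module import *` skips names that start with an underscore; with `__all__`, it imports exactly the listed names. The sketch below builds two throwaway modules in memory (the module names `demo_plain` and `demo_all` and the helpers are made up for the demo) and registers them in `sys.modules` so that `import *` can find them.

```python
import sys
import types

SOURCE = '''
def public_func():
    return "I'm available."

def _private_func():
    return "I'm not available."
'''


def make_module(name, source):
    """Create a module from source text and register it under `name`."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module
    return module


make_module('demo_plain', SOURCE)
make_module('demo_all', "__all__ = ['public_func', '_private_func']\n" + SOURCE)


def star_import(module_name):
    """Return the names that `from <module_name> import *` brings in."""
    namespace = {}
    exec(f'from {module_name} import *', namespace)
    return sorted(k for k in namespace if k != '__builtins__')


print(star_import('demo_plain'))  # ['public_func']
print(star_import('demo_all'))    # ['_private_func', 'public_func']
```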

No docstring required

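If you lint your Python code with flake8-docstrings, pydocstyle, or a similar tool,
you'll notice that names in this style are exempt from D103 (Missing docstring in public function), the check that enforces the PEP 257 (Docstring Conventions) rule for public functions.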

Let's create a file called test.py
and compare foo with _foo:

def _foo():
    pass

def foo():
    pass

Run flake8 on it (output trimmed to the relevant line):

$ flake8 test.py
test.py:4:1: D103 Missing docstring in public function

3. `__foo__`

Basically,
these names are reserved for Python's built-in methods and variables.
According to PEP 8:

__double_leading_and_trailing_underscore__ : “magic” objects or attributes that live in user-controlled namespaces. E.g. __init__, __import__ or __file__. Never invent such names; only use them as documented.

Python itself tells you not to invent them,
so don't dig your own grave.
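The safe way to use dunder names is the one PEP 8 allows: implement the ones Python already documents, so built-ins such as `len()` and `repr()` work with your class. A sketch with a hypothetical `Playlist` class:

```python
class Playlist:
    """Hypothetical example class that implements documented dunder methods."""

    def __init__(self, *songs):
        self._songs = list(songs)

    def __len__(self):
        # Called by the built-in len()
        return len(self._songs)

    def __repr__(self):
        # Called by repr() and by the interactive prompt
        return f'Playlist({len(self)} songs)'


p = Playlist('a', 'b', 'c')
print(len(p))   # 3
print(repr(p))  # Playlist(3 songs)
```

Nothing here invents a new dunder; it only fills in hooks the language already defines.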

So when would you use this naming style?

There's a good answer on Stack Overflow:

… So, if you are not a Python core developer or writing a PEP that may be one day become part of the Python standard library or core language definition, try to stay away from using dunder names in your API.

In other words,
if you're a Python core developer,
then you can think about it.
If you're not,
stay away from it as much as possible!

But I still want to try. Will it blow up?

If you never upgrade Python
and you're sure your names don't collide with anything built in,
then go ahead and try…

Here's an example that does blow up:

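Say I have a program
that builds an FQDN (fully qualified domain name),
which is basically hostname + domain name.
What that actually is doesn't really matter here,
so let's go straight to the code: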

class Aji:

    __qualname__ = 'aji-ubuntu'

    def __init__(self, domain):
        self.domain = domain

    def get_fqdn(self):
        """Get the fully qualified domain name."""
        return self.__qualname__ + '.' + self.domain

If I run it on Python 3.2:

>>> Aji('aji.tw').get_fqdn()
'aji-ubuntu.aji.tw'

But if I run it on Python 3.3:

>>> Aji('aji.tw').get_fqdn()
AttributeError: 'Aji' object has no attribute '__qualname__'

Boom.

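That's because Python 3.3 (PEP 3155) added __qualname__ as an attribute of classes, methods and functions.
From then on, a __qualname__ in a class body sets the class's qualified name instead of staying an ordinary class attribute, so the instance can no longer find it, and boom…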
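On CPython 3.3 or later you can watch this happen. This sketch assumes CPython: the `__qualname__` in the class body is consumed as the class's qualified name rather than stored as a class attribute, so `Aji.__qualname__` still works while the instance lookup fails. It also shows the obvious fix, an ordinary attribute name (`hostname` and `FixedAji` are our own choices, not from the original program).

```python
class Aji:
    __qualname__ = 'aji-ubuntu'  # taken over by class creation on Python 3.3+

    def __init__(self, domain):
        self.domain = domain


print(Aji.__qualname__)                        # the class's qualified name is now 'aji-ubuntu'
print('__qualname__' in vars(Aji))             # False: not stored as a plain class attribute
print(hasattr(Aji('aji.tw'), '__qualname__'))  # False: instances cannot see it


class FixedAji:
    hostname = 'aji-ubuntu'  # an ordinary attribute name avoids the clash

    def __init__(self, domain):
        self.domain = domain

    def get_fqdn(self):
        """Get the fully qualified domain name."""
        return self.hostname + '.' + self.domain


print(FixedAji('aji.tw').get_fqdn())  # aji-ubuntu.aji.tw
```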

So is everyone really that well-behaved and following the rule?

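Some people are naughty.
Take the famous SQLAlchemy:
it uses a bunch of names like __tablename__, __table__, __mapper__ and so on.
Has it blown up?
I'm happily using it right now,
but who knows whether it'll suddenly blow up one day…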

4. `__foo`

I saved this one for last
because I think it's the most special:
it's the only form of the four
that triggers what's called name mangling.

How it works

Here's a not-very-deep example:

I have a good friend named Jason.
He looks like this:

class Jason:

    location = 'HsinChu'
    favorite_movie = 'Inception'
    hobby = 'card magic'
    __wife = 'Mary'

    def profile(self):
        """Print my personal profile."""
        print(f'I live in {self.location}')
        print(f'My favorite movie is {self.favorite_movie}')
        print(f'My hobby is {self.hobby}')
        print(f'My wife is {self.__wife}')

Let's see what his profile looks like:

>>> Jason().profile()
I live in HsinChu
My favorite movie is Inception
My hobby is card magic
My wife is Mary

I'm Aji.
I'm his good friend
and we have a lot in common,
so I simply inherit from him.
There are only two differences between us:
where we live (location)
and who our wives are (__wife).

class Aji(Jason):

    location = 'Taipei'
    __wife = 'Boa'

Now let's see what my profile looks like:

>>> Aji().profile()

I live in Taipei
My favorite movie is Inception
My hobby is card magic
My wife is Mary

location changed. Good.
Wait, my wife is Boa, not Mary!
A friend's wife is off-limits!

Also,
you can't just casually get at someone else's wife:

>>> jason = Jason()
>>> jason.location
'HsinChu'
>>> jason.__wife
AttributeError: 'Jason' object has no attribute '__wife'

But with a special trick
you can still get hold of Jason's wife (kidding!):

>>> jason._Jason__wife
'Mary'
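What name mangling actually does is visible in each class's namespace: inside a class body, a name like `__wife` is rewritten to `_ClassName__wife`, while a name that also ends with two underscores is left alone. A self-contained sketch mirroring the Jason and Aji classes (the `spouse` method and the `Sample` class are made up for the demo):

```python
class Jason:
    location = 'HsinChu'
    __wife = 'Mary'          # stored as _Jason__wife

    def spouse(self):
        return self.__wife   # compiled as self._Jason__wife


class Aji(Jason):
    location = 'Taipei'
    __wife = 'Boa'           # stored as _Aji__wife, a separate attribute


class Sample:
    __private = 1            # mangled
    __magic__ = 2            # trailing double underscore: not mangled


print([name for name in vars(Jason) if 'wife' in name])  # ['_Jason__wife']
print([name for name in vars(Aji) if 'wife' in name])    # ['_Aji__wife']
print(Aji().spouse())       # Mary: the inherited method reads _Jason__wife
print(Aji()._Aji__wife)     # Boa
print(sorted(n for n in vars(Sample) if 'private' in n or 'magic' in n))
```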

OK, joking aside.
Naming something this way
usually signals that the attribute shouldn't be accessed casually.

Let's see what PEP 8 actually says:

If your class is intended to be subclassed, and you have attributes that you do not want subclasses to use, consider naming them with double leading underscores and no trailing underscores. This invokes Python’s name mangling algorithm, where the name of the class is mangled into the attribute name. This helps avoid attribute name collisions should subclasses inadvertently contain attributes with the same name.

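In the example above,
this is what keeps the __wife defined in Jason
from colliding with a __wife defined in a subclass (a naming collision):
inside Jason the name is rewritten to _Jason__wife, and inside Aji to _Aji__wife.
Since profile() is defined in Jason, it always reads _Jason__wife, which is why my profile still says Mary.
Without mangling, a subclass could overwrite the parent's attribute and break other methods defined in the parent class.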
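Here is the collision PEP 8 warns about, in a form where it actually breaks something. In this sketch (all class names are hypothetical), a parent class keeps a counter and a subclass reuses the same attribute name for a label. With a double underscore each class gets its own mangled attribute; with a single underscore the subclass silently overwrites the parent's counter and `increment()` fails.

```python
class Counter:
    def __init__(self):
        self.__count = 0          # mangled to self._Counter__count

    def increment(self):
        self.__count += 1
        return self.__count


class LabeledCounter(Counter):
    def __init__(self, label):
        super().__init__()
        self.__count = label      # mangled to self._LabeledCounter__count: no clash


class LeakyCounter:
    def __init__(self):
        self._count = 0           # single underscore: no mangling

    def increment(self):
        self._count += 1
        return self._count


class LeakyLabeledCounter(LeakyCounter):
    def __init__(self, label):
        super().__init__()
        self._count = label       # overwrites the parent's counter


print(LabeledCounter('clicks').increment())  # 1
try:
    LeakyLabeledCounter('clicks').increment()
except TypeError as exc:
    print('TypeError:', exc)      # str + int fails inside the parent's method
```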

Wrapping up

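Honestly,
if you're only writing code for your own enjoyment,
write it however you like.
Want __________foo = bar? Fine.
Want __foo______ = 'bar'? Also fine.
But
as engineers, we ought to follow some stricter conventions;
that's what makes the work feel professional, and nobody codes solo forever.
On top of that,
when you work with other people,
shared conventions make communication easier,
keep the logic clear,
and make development smoother for everyone!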
