The Python Standard Library: functools, itertools, and operator
2020-11-27 14:26:50 Editor: 小采


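Introduction

functools, itertools, and operator are the three standard-library modules that support functional-style programming in Python. Used sensibly, they help us write more concise, readable, Pythonic code. The examples below walk through each module in turn.

Using functools

functools is an important module that provides a number of useful higher-order functions. A higher-order function is one that accepts a function as an argument or returns a function as its result. Because functions are first-class objects in Python, this style is easy to support.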

partial

>>> from functools import partial

>>> basetwo = partial(int, base=2)

>>> basetwo('10010')
18

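basetwo('10010') is equivalent to calling int('10010', base=2). When a function takes many parameters, functools.partial lets you create a new function with some of them fixed, which simplifies call sites and improves readability. Conceptually, partial behaves like the simple closure below (CPython implements it as a class, but the behaviour is roughly equivalent).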

def partial(func, *args, **keywords):
    def newfunc(*fargs, **fkeywords):
        newkeywords = keywords.copy()
        newkeywords.update(fkeywords)
        return func(*args, *fargs, **newkeywords)
    newfunc.func = func
    newfunc.args = args
    newfunc.keywords = keywords
    return newfunc
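
As a further illustration, the sketch below freezes a keyword argument of a small helper and then inspects the func, args, and keywords attributes of the resulting object. The power helper and the variable names are made up for this example.

```python
from functools import partial


def power(base, exponent):
    """Hypothetical helper: raise base to exponent."""
    return base ** exponent


# Fix the keyword argument; the remaining argument is supplied at call time.
square = partial(power, exponent=2)
cube = partial(power, exponent=3)

squares = [square(n) for n in range(5)]
cubes = [cube(n) for n in range(5)]

# The partial object exposes what was frozen.
frozen = (square.func is power, square.args, square.keywords)

print(squares)  # [0, 1, 4, 9, 16]
print(cubes)    # [0, 1, 8, 27, 64]
print(frozen)   # (True, (), {'exponent': 2})
```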

partialmethod

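partialmethod is similar to partial, but it is meant to be used as a method definition inside a class body: it acts as a descriptor, so the instance is passed in as the first argument when the method is called. In the Python version used for the run below, a plain partial object stored as a class attribute is not bound to the instance. The following example shows the difference.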

from functools import partial, partialmethod


def standalone(self, a=1, b=2):
    "Standalone function"
    print(' called standalone with:', (self, a, b))
    if self is not None:
        print(' self.attr =', self.attr)


class MyClass:
    "Demonstration class for functools"
    def __init__(self):
        self.attr = 'instance attribute'
    method1 = partialmethod(standalone)  # uses partialmethod
    method2 = partial(standalone)  # uses partial

>>> o = MyClass()

>>> o.method1()
 called standalone with: (<__main__.MyClass object at 0x7f46d40cc550>, 1, 2)
 self.attr = instance attribute

# partial does not work here
>>> o.method2()
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
TypeError: standalone() missing 1 required positional argument: 'self'
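
partialmethod is most useful when it also pre-fills arguments. The sketch below, modelled on the kind of example found in the functools documentation, defines two convenience methods on top of one general setter. The Cell class is hypothetical.

```python
from functools import partialmethod


class Cell:
    """Hypothetical example class with one boolean state."""

    def __init__(self):
        self._alive = False

    @property
    def alive(self):
        return self._alive

    def set_state(self, state):
        self._alive = bool(state)

    # Each partialmethod pre-fills `state`; `self` is still bound at call time.
    set_alive = partialmethod(set_state, True)
    set_dead = partialmethod(set_state, False)


c = Cell()
before = c.alive
c.set_alive()
after = c.alive
print(before, after)  # False True
```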

singledispatch

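Python does not support overloading a function name with different parameter types, but singledispatch lets us register separate implementations that are selected by the type of the first argument. This avoids burying type checks inside the function body, which would hurt readability.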

from functools import singledispatch


class TestClass(object):
    @singledispatch
    def test_method(arg, verbose=False):
        if verbose:
            print("Let me just say,", end=" ")
        print(arg)

    @test_method.register(int)
    def _(arg):
        print("Strength in numbers, eh?", end=" ")
        print(arg)

    @test_method.register(list)
    def _(arg):
        print("Enumerate this:")

        for i, elem in enumerate(arg):
            print(i, elem)

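The @test_method.register(int) and @test_method.register(list) decorators above specify that when the first argument of test_method is an int or a list, the corresponding registered function handles the call.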

>>> TestClass.test_method(55555) # call @test_method.register(int)
Strength in numbers, eh? 55555

>>> TestClass.test_method([33, 22, 11]) # call @test_method.register(list)
Enumerate this:
0 33
1 22
2 11

>>> TestClass.test_method('hello world', verbose=True) # call default
Let me just say, hello world
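
The same mechanism works for plain module-level functions. The following is a minimal sketch with a made-up describe function; it also shows that dispatch follows the class hierarchy, so a bool (a subclass of int) is handled by the int implementation.

```python
from functools import singledispatch


@singledispatch
def describe(value):
    """Fallback used when no more specific type is registered."""
    return "object: {!r}".format(value)


@describe.register(int)
def _(value):
    return "int: {}".format(value)


@describe.register(list)
def _(value):
    return "list of {} items".format(len(value))


results = [describe(7), describe([1, 2, 3]), describe("hi")]
# bool is a subclass of int, so it dispatches to the int implementation.
bool_result = describe(True)

print(results)      # ['int: 7', 'list of 3 items', "object: 'hi'"]
print(bool_result)  # int: True
```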

wraps

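A decorator replaces the decorated function with its wrapper, so attributes such as __name__ and __doc__ are lost. Applying @wraps to the wrapper copies them back.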

from functools import wraps


def my_decorator(f):
    @wraps(f)
    def wrapper():
        """wrapper_doc"""
        print('Calling decorated function')
        return f()
    return wrapper


@my_decorator
def example():
    """example_doc"""
    print('Called example function')

>>> example.__name__
'example'
>>> example.__doc__
'example_doc'

# Remove @wraps(f) and run again: example's own __name__ and __doc__ are lost
>>> example.__name__
'wrapper'
>>> example.__doc__
'wrapper_doc'

The same thing can be written with update_wrapper:

from functools import update_wrapper, wraps


# f is the function being wrapped
def g():
    ...
g = update_wrapper(g, f)


# equal to
@wraps(f)
def g():
    ...

Internally, @wraps is built on update_wrapper; it is roughly equivalent to:

def wraps(wrapped, assigned=WRAPPER_ASSIGNMENTS, updated=WRAPPER_UPDATES):
    def decorator(wrapper):
        return update_wrapper(wrapper, wrapped=wrapped,
                              assigned=assigned, updated=updated)
    return decorator
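
For a decorator that must accept arbitrary arguments, the usual pattern is a wrapper taking *args and **kwargs. The sketch below uses a made-up log_calls decorator; besides copying the metadata, wraps also sets __wrapped__, which gives access to the undecorated function.

```python
from functools import wraps


def log_calls(func):
    """Hypothetical decorator that records each call in a list."""
    calls = []

    @wraps(func)
    def wrapper(*args, **kwargs):
        calls.append((args, kwargs))
        return func(*args, **kwargs)

    wrapper.calls = calls
    return wrapper


@log_calls
def add(x, y=0):
    """Return x + y."""
    return x + y


total = add(2, y=3)
print(total, add.__name__, add.__doc__)  # 5 add Return x + y.
print(add.calls)                         # [((2,), {'y': 3})]
```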

lru_cache

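lru_cache and singledispatch are two widely used pieces of "black magic" in day-to-day development. Next we look at lru_cache. For repeated computations, caching results can give a large speed-up. The Fibonacci example below compares the running time with and without lru_cache.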

# clockdeco.py

import time
import functools


def clock(func):
    @functools.wraps(func)
    def clocked(*args, **kwargs):
        t0 = time.time()
        result = func(*args, **kwargs)
        elapsed = time.time() - t0
        name = func.__name__
        arg_lst = []
        if args:
            arg_lst.append(', '.join(repr(arg) for arg in args))
        if kwargs:
            pairs = ['%s=%r' % (k, w) for k, w in sorted(kwargs.items())]
            arg_lst.append(', '.join(pairs))
        arg_str = ', '.join(arg_lst)
        print('[%0.8fs] %s(%s) -> %r ' % (elapsed, name, arg_str, result))
        return result
    return clocked

Without lru_cache

from clockdeco import clock


@clock
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n-2) + fibonacci(n-1)


if __name__=='__main__':
    print(fibonacci(6))

Here is the output. It shows that fibonacci(n) is recomputed many times for the same n during the recursion, which wastes both time and resources.

[0.00000119s] fibonacci(0) -> 0 
[0.00000143s] fibonacci(1) -> 1 
[0.00021172s] fibonacci(2) -> 1 
[0.00000072s] fibonacci(1) -> 1 
[0.00000095s] fibonacci(0) -> 0 
[0.00000095s] fibonacci(1) -> 1 
[0.00011444s] fibonacci(2) -> 1 
[0.00022793s] fibonacci(3) -> 2 
[0.00055265s] fibonacci(4) -> 3 
[0.00000072s] fibonacci(1) -> 1 
[0.00000072s] fibonacci(0) -> 0 
[0.00000095s] fibonacci(1) -> 1 
[0.00011158s] fibonacci(2) -> 1 
[0.00022268s] fibonacci(3) -> 2 
[0.00000095s] fibonacci(0) -> 0 
[0.00000095s] fibonacci(1) -> 1 
[0.00011349s] fibonacci(2) -> 1 
[0.00000072s] fibonacci(1) -> 1 
[0.00000095s] fibonacci(0) -> 0 
[0.00000095s] fibonacci(1) -> 1 
[0.00010705s] fibonacci(2) -> 1 
[0.00021267s] fibonacci(3) -> 2 
[0.00043225s] fibonacci(4) -> 3 
[0.00076509s] fibonacci(5) -> 5 
[0.00142813s] fibonacci(6) -> 8 
8

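With lru_cache

import functools
from clockdeco import clock


@functools.lru_cache()  # outer decorator: caches results
@clock  # inner decorator: only runs on a cache miss
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n-2) + fibonacci(n-1)

if __name__=='__main__':
    print(fibonacci(6))

Here is the output. Results that have already been computed are served from the cache, so each value of n is computed only once.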

[0.00000095s] fibonacci(0) -> 0 
[0.00005770s] fibonacci(1) -> 1 
[0.00015855s] fibonacci(2) -> 1 
[0.00000286s] fibonacci(3) -> 2 
[0.00021124s] fibonacci(4) -> 3 
[0.00000191s] fibonacci(5) -> 5 
[0.00024652s] fibonacci(6) -> 8 
8

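The number used above is too small to show much of a difference. Interested readers can pick a larger n and compare the speed of the two versions.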
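
Rather than timing, the effect of the cache can also be observed by counting. The sketch below is a small variation on the example above (the calls list is an addition for illustration): it records how often the function body actually runs and reads the statistics from cache_info().

```python
from functools import lru_cache

calls = []  # one entry per real execution of the function body


@lru_cache(maxsize=None)
def fibonacci(n):
    calls.append(n)
    if n < 2:
        return n
    return fibonacci(n - 2) + fibonacci(n - 1)


result = fibonacci(30)
info = fibonacci.cache_info()

print(result)      # 832040
print(len(calls))  # 31: the body runs once for each n from 0 to 30
print(info)        # hits=28, misses=31, maxsize=None, currsize=31
```

Without the cache, the same call would execute the body more than a million times.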

total_ordering

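In Python 2, objects could be compared by defining __cmp__ to return 0/-1/1. Python 3 removed __cmp__, but with total_ordering we only need to define a few of the rich comparison methods (__lt__(), __le__(), __gt__(), __ge__(), __eq__(), __ne__()) and the decorator supplies the missing ordering methods. Note: the class must define at least one of __lt__(), __le__(), __gt__(), or __ge__(), and should also provide an __eq__() method.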

import functools


@functools.total_ordering
class MyObject:
    def __init__(self, val):
        self.val = val

    def __eq__(self, other):
        print(' testing __eq__({}, {})'.format(
            self.val, other.val))
        return self.val == other.val

    def __gt__(self, other):
        print(' testing __gt__({}, {})'.format(
            self.val, other.val))
        return self.val > other.val


a = MyObject(1)
b = MyObject(2)

for expr in ['a < b', 'a <= b', 'a == b', 'a >= b', 'a > b']:
    print('\n{:<6}:'.format(expr))
    result = eval(expr)
    print(' result of {}: {}'.format(expr, result))

Here is the output:

a < b :
 testing __gt__(1, 2)
 testing __eq__(1, 2)
 result of a < b: True

a <= b:
 testing __gt__(1, 2)
 result of a <= b: True

a == b:
 testing __eq__(1, 2)
 result of a == b: False

a >= b:
 testing __gt__(1, 2)
 testing __eq__(1, 2)
 result of a >= b: False

a > b :
 testing __gt__(1, 2)
 result of a > b: False
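
A more practical use is making a small value class sortable. The sketch below assumes a hypothetical Version class that defines only __eq__ and __lt__; returning NotImplemented for foreign types is a common convention rather than something total_ordering requires.

```python
from functools import total_ordering


@total_ordering
class Version:
    """Hypothetical (major, minor) version number."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def _key(self):
        return (self.major, self.minor)

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() < other._key()

    def __repr__(self):
        return "Version({}, {})".format(self.major, self.minor)


versions = [Version(1, 10), Version(1, 2), Version(0, 9)]
ordered = sorted(versions)  # sorting relies on __lt__
newest = max(versions)

print(ordered)  # [Version(0, 9), Version(1, 2), Version(1, 10)]
print(newest)   # Version(1, 10)
```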

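Using itertools

itertools provides a set of very useful functions for working with iterables.

Infinite iterators

count

count(start=0, step=1) returns an infinite iterator of evenly spaced values, beginning at start (0 by default) and increasing by step (1 by default) each time.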

>>> from itertools import count

>>> for i in zip(count(1), ['a', 'b', 'c']):
...     print(i, end=' ')
...
(1, 'a') (2, 'b') (3, 'c')

cycle

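cycle(iterable) repeats the elements of the given iterable endlessly. It takes no argument for limiting the number of repetitions; to stop, pair it with something finite, such as zip with a bounded range (as below) or islice.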

>>> from itertools import cycle

>>> for i in zip(range(6), cycle(['a', 'b', 'c'])):
...     print(i, end=' ')
...
(0, 'a') (1, 'b') (2, 'c') (3, 'a') (4, 'b') (5, 'c')
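
Two common ways to bound cycle are sketched below, using made-up sample data: slicing a fixed number of items with islice, and zipping against a finite list for round-robin assignment.

```python
from itertools import cycle, islice

colors = ['red', 'green', 'blue']

# Take the first 7 items of the endless cycle.
first_seven = list(islice(cycle(colors), 7))

# Assign workers to tasks round-robin; zip stops at the shorter input.
tasks = ['t1', 't2', 't3', 't4', 't5']
workers = ['alice', 'bob']
assignment = list(zip(tasks, cycle(workers)))

print(first_seven)
print(assignment)
```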

repeat

repeat(object[, times]) returns an iterator that yields the same object over and over. By default it never ends; pass times to limit the number of repetitions.

>>> from itertools import repeat

>>> for i, s in zip(count(1), repeat('over-and-over', 5)):
...     print(i, s)
...
1 over-and-over
2 over-and-over
3 over-and-over
4 over-and-over
5 over-and-over

Iterators terminating on the shortest input sequence

accumulate

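accumulate(iterable[, func]) returns an iterator of accumulated results: running sums by default, or the running result of any two-argument function passed as func.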

>>> from itertools import accumulate
>>> import operator

>>> list(accumulate([1, 2, 3, 4, 5], operator.add))
[1, 3, 6, 10, 15]

>>> list(accumulate([1, 2, 3, 4, 5], operator.mul))
[1, 2, 6, 24, 120]
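
Since func can be any two-argument callable, built-ins such as max work too. A small sketch with made-up readings:

```python
from itertools import accumulate

readings = [3, 1, 4, 1, 5, 9, 2, 6]

running_sum = list(accumulate(readings))       # default: addition
running_max = list(accumulate(readings, max))  # any two-argument function works

print(running_sum)  # [3, 4, 8, 9, 14, 23, 25, 31]
print(running_max)  # [3, 3, 4, 4, 5, 9, 9, 9]
```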

chain

itertools.chain(*iterables) joins several iterables into a single iterator.

>>> from itertools import chain

>>> list(chain([1, 2, 3], ['a', 'b', 'c']))
[1, 2, 3, 'a', 'b', 'c']

chain is roughly equivalent to:

def chain(*iterables):
    # chain('ABC', 'DEF') --> A B C D E F
    for it in iterables:
        for element in it:
            yield element

chain.from_iterable

chain.from_iterable(iterable) is similar to chain, but it takes a single iterable whose elements are themselves iterables, and chains those elements into one iterator.

>>> from itertools import chain

>>> list(chain.from_iterable(['ABC', 'DEF']))
['A', 'B', 'C', 'D', 'E', 'F']

Its implementation is also similar to that of chain:

def from_iterable(iterables):
    # chain.from_iterable(['ABC', 'DEF']) --> A B C D E F
    for it in iterables:
        for element in it:
            yield element

compress

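compress(data, selectors) takes two iterables and returns only those elements of data whose corresponding element in selectors is true. It stops when either data or selectors is exhausted.

>>> from itertools import compress

>>> list(compress([1, 2, 3, 4, 5], [True, True, False, False, True]))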
[1, 2, 5]

zip_longest

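zip_longest(*iterables, fillvalue=None) is similar to zip. The difference is that zip stops as soon as the shortest iterable is exhausted, whereas zip_longest continues until the longest one is exhausted and fills the gaps with fillvalue. The following example shows the difference.

from itertools import zip_longest

r1 = range(3)
r2 = range(2)

print('zip stops early:')
print(list(zip(r1, r2)))

r1 = range(3)
r2 = range(2)

print('\nzip_longest processes all of the values:')
print(list(zip_longest(r1, r2)))

Here is the output: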

zip stops early:
[(0, 0), (1, 1)]

zip_longest processes all of the values:
[(0, 0), (1, 1), (2, None)]
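
The fillvalue parameter controls what is used for the missing positions. A minimal sketch with made-up data:

```python
from itertools import zip_longest

names = ['ann', 'bob', 'cy']
scores = [90, 85]

# Missing scores are replaced by the fillvalue instead of None.
rows = list(zip_longest(names, scores, fillvalue=0))

print(rows)  # [('ann', 90), ('bob', 85), ('cy', 0)]
```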

islice

islice(iterable, stop) or islice(iterable, start, stop[, step]) works much like slicing a string or list, except that start, stop, and step cannot be negative.

>>> from itertools import islice

>>> for i in islice(range(100), 0, 100, 10):
...     print(i, end=' ')
...
0 10 20 30 40 50 60 70 80 90

tee

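tee(iterable, n=2) returns n independent iterators over a single iterable; n defaults to 2.

from itertools import count, islice, tee

r = islice(count(), 5)
i1, i2 = tee(r)

print('i1:', list(i1))
print('i2:', list(i2))

for i in r:
    print(i, end=' ')
    if i > 1:
        break

Here is the output. Note that after tee(r) the original iterator r should no longer be used directly: the tee copies have already consumed it, so the for loop prints nothing.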

i1: [0, 1, 2, 3, 4]
i2: [0, 1, 2, 3, 4]
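
A classic use of tee is walking over an iterable in overlapping pairs. The sketch below follows the well-known pairwise recipe; newer Python versions also ship itertools.pairwise, so this hand-written version is only for illustration.

```python
from itertools import tee


def pairwise(iterable):
    """Yield overlapping pairs: s -> (s0, s1), (s1, s2), ..."""
    a, b = tee(iterable)
    next(b, None)  # advance the second copy by one element
    return zip(a, b)


pairs = list(pairwise([1, 4, 9, 16]))
differences = [y - x for x, y in pairwise([1, 4, 9, 16])]

print(pairs)        # [(1, 4), (4, 9), (9, 16)]
print(differences)  # [3, 5, 7]
```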

starmap

starmap(func, iterable) assumes that iterable yields a stream of tuples and calls func with each tuple unpacked as its arguments:

>>> from itertools import starmap
>>> import os

>>> iterator = starmap(os.path.join,
... [('/bin', 'python'), ('/usr', 'bin', 'java'),
... ('/usr', 'bin', 'perl'), ('/usr', 'bin', 'ruby')])

>>> list(iterator)
['/bin/python', '/usr/bin/java', '/usr/bin/perl', '/usr/bin/ruby']

filterfalse

filterfalse(predicate, iterable) is the opposite of filter(): it returns every element for which predicate returns false.

def is_even(x):
    return (x % 2) == 0

itertools.filterfalse(is_even, itertools.count()) =>
1, 3, 5, 7, 9, 11, 13, 15, ...

takewhile

takewhile(predicate, iterable) keeps returning elements from iterable for as long as predicate returns true. As soon as predicate returns false, the iteration ends.

def less_than_10(x):
    return x < 10

itertools.takewhile(less_than_10, itertools.count())
=> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

itertools.takewhile(is_even, itertools.count())
=> 0

dropwhile

dropwhile(predicate, iterable) discards elements while predicate returns true, then returns all of the remaining elements.

itertools.dropwhile(less_than_10, itertools.count())
=> 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ...

itertools.dropwhile(is_even, itertools.count())
=> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
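
The arrow notation above is schematic. A runnable version of the same three calls is sketched below, with islice used to cut off the infinite results; the last line illustrates that dropwhile only skips the leading run of matching elements.

```python
from itertools import count, dropwhile, filterfalse, islice, takewhile


def is_even(x):
    return x % 2 == 0


def less_than_10(x):
    return x < 10


odds = list(islice(filterfalse(is_even, count()), 5))
below_ten = list(takewhile(less_than_10, count()))
from_ten = list(islice(dropwhile(less_than_10, count()), 5))

# dropwhile only skips the leading run; later matches are kept.
mixed = list(dropwhile(is_even, [2, 4, 5, 6, 8]))

print(odds)       # [1, 3, 5, 7, 9]
print(below_ten)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(from_ten)   # [10, 11, 12, 13, 14]
print(mixed)      # [5, 6, 8]
```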

groupby

groupby(iterable, key=None) collects consecutive elements of the iterable that share the same key into groups. Note: because only adjacent elements are grouped, the input generally needs to be sorted on the key for the groupings to work out as expected.

  • [k for k, g in groupby('AAAABBBCCDAABBB')] --> A B C D A B

  • [list(g) for k, g in groupby('AAAABBBCCD')] --> AAAA BBB CC D

>>> import itertools

>>> for key, group in itertools.groupby('AAAABBBCCDAABBB'):
...     print(key, list(group))
...
A ['A', 'A', 'A', 'A']
B ['B', 'B', 'B']
C ['C', 'C']
D ['D']
A ['A', 'A']
B ['B', 'B', 'B']

city_list = [('Decatur', 'AL'), ('Huntsville', 'AL'), ('Selma', 'AL'),
             ('Anchorage', 'AK'), ('Nome', 'AK'),
             ('Flagstaff', 'AZ'), ('Phoenix', 'AZ'), ('Tucson', 'AZ'),
             ...
            ]

def get_state(city_state):
    return city_state[1]

itertools.groupby(city_list, get_state) =>
 ('AL', iterator-1),
 ('AK', iterator-2),
 ('AZ', iterator-3), ...

iterator-1 => ('Decatur', 'AL'), ('Huntsville', 'AL'), ('Selma', 'AL')
iterator-2 => ('Anchorage', 'AK'), ('Nome', 'AK')
iterator-3 => ('Flagstaff', 'AZ'), ('Phoenix', 'AZ'), ('Tucson', 'AZ')
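
To make the sorting requirement concrete, the sketch below groups a deliberately unsorted list (made-up sample data) twice: once after sorting on the key, and once without sorting.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical, deliberately unsorted sample data.
cities = [('Phoenix', 'AZ'), ('Selma', 'AL'), ('Nome', 'AK'),
          ('Tucson', 'AZ'), ('Decatur', 'AL')]

get_state = itemgetter(1)

# Sort by the same key first so equal keys become adjacent.
by_state = {
    state: [name for name, _ in group]
    for state, group in groupby(sorted(cities, key=get_state), key=get_state)
}

# Without sorting, 'AZ' and 'AL' each show up as separate runs.
unsorted_keys = [state for state, _ in groupby(cities, key=get_state)]

print(by_state)
print(unsorted_keys)  # ['AZ', 'AL', 'AK', 'AZ', 'AL']
```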

Combinatoric generators

product

product(*iterables, repeat=1) returns the Cartesian product of the input iterables.

  • product(A, B) returns the same as ((x,y) for x in A for y in B)

  • product(A, repeat=4) means the same as product(A, A, A, A)

from itertools import product


def show(iterable):
    for i, item in enumerate(iterable, 1):
        print(item, end=' ')
        if (i % 3) == 0:
            print()
    print()


print('Repeat 2:\n')
show(product(range(3), repeat=2))

print('Repeat 3:\n')
show(product(range(3), repeat=3))

Here is the output:

Repeat 2:

(0, 0) (0, 1) (0, 2)
(1, 0) (1, 1) (1, 2)
(2, 0) (2, 1) (2, 2)

Repeat 3:

(0, 0, 0) (0, 0, 1) (0, 0, 2)
(0, 1, 0) (0, 1, 1) (0, 1, 2)
(0, 2, 0) (0, 2, 1) (0, 2, 2)
(1, 0, 0) (1, 0, 1) (1, 0, 2)
(1, 1, 0) (1, 1, 1) (1, 1, 2)
(1, 2, 0) (1, 2, 1) (1, 2, 2)
(2, 0, 0) (2, 0, 1) (2, 0, 2)
(2, 1, 0) (2, 1, 1) (2, 1, 2)
(2, 2, 0) (2, 2, 1) (2, 2, 2)
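
The equivalence with nested loops can be checked directly. A short sketch with made-up inputs:

```python
from itertools import product

suits = ['S', 'H']
ranks = ['A', 'K', 'Q']

cards = list(product(ranks, suits))
nested = [(r, s) for r in ranks for s in suits]

# repeat=n is shorthand for passing the same iterable n times.
bits = [''.join(p) for p in product('01', repeat=3)]

print(cards == nested)  # True
print(bits)             # ['000', '001', '010', '011', '100', '101', '110', '111']
```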

permutations

permutations(iterable, r=None) returns all possible orderings (permutations) of length r drawn from iterable. If r is omitted, it defaults to the length of the iterable.

from itertools import permutations


def show(iterable):
    first = None
    for i, item in enumerate(iterable, 1):
        if first != item[0]:
            if first is not None:
                print()
            first = item[0]
        print(''.join(item), end=' ')
    print()


print('All permutations:\n')
show(permutations('abcd'))

print('\nPairs:\n')
show(permutations('abcd', r=2))

Here is the output:

All permutations:

abcd abdc acbd acdb adbc adcb
bacd badc bcad bcda bdac bdca
cabd cadb cbad cbda cdab cdba
dabc dacb dbac dbca dcab dcba

Pairs:

ab ac ad
ba bc bd
ca cb cd
da db dc

combinations

combinations(iterable, r) returns an iterator over all r-length tuples of elements from iterable. The elements within each tuple keep the order in which iterable produced them. In the example below, unlike permutations above, a always comes before b, c, and d; b always comes before c and d; and c always comes before d.

from itertools import combinations


def show(iterable):
    first = None
    for i, item in enumerate(iterable, 1):
        if first != item[0]:
            if first is not None:
                print()
            first = item[0]
        print(''.join(item), end=' ')
    print()


print('Unique pairs:\n')
show(combinations('abcd', r=2))

Here is the output:

Unique pairs:

ab ac ad
bc bd
cd

combinations_with_replacement

combinations_with_replacement(iterable, r) relaxes a different constraint: an element may be repeated within a single tuple, so combinations such as aa/bb/cc/dd can appear.

from itertools import combinations_with_replacement


def show(iterable):
    first = None
    for i, item in enumerate(iterable, 1):
        if first != item[0]:
            if first is not None:
                print()
            first = item[0]
        print(''.join(item), end=' ')
    print()


print('Unique pairs:\n')
show(combinations_with_replacement('abcd', r=2))

Here is the output:

Unique pairs:

aa ab ac ad
bb bc bd
cc cd
dd
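
The three functions differ in how many results they produce. The sketch below counts them for the 'abcd' example and compares against the standard counting formulas; it assumes Python 3.8 or later for math.comb.

```python
from itertools import combinations, combinations_with_replacement, permutations
from math import comb, factorial

items = 'abcd'
n, r = len(items), 2

n_perm = len(list(permutations(items, r)))                  # n! / (n - r)!
n_comb = len(list(combinations(items, r)))                  # C(n, r)
n_cwr = len(list(combinations_with_replacement(items, r)))  # C(n + r - 1, r)

expected = (factorial(n) // factorial(n - r), comb(n, r), comb(n + r - 1, r))

print((n_perm, n_comb, n_cwr))  # (12, 6, 10)
print(expected)                 # (12, 6, 10)
```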

Using operator

attrgetter

operator.attrgetter(attr) and operator.attrgetter(*attrs) return a callable that fetches the named attribute(s) from its operand:

  • After f = attrgetter('name'), the call f(b) returns b.name.

  • After f = attrgetter('name', 'date'), the call f(b) returns (b.name, b.date).

  • After f = attrgetter('name.first', 'name.last'), the call f(b) returns (b.name.first, b.name.last).

The following example shows how attrgetter is used.

>>> class Student:
...     def __init__(self, name, grade, age):
...         self.name = name
...         self.grade = grade
...         self.age = age
...     def __repr__(self):
...         return repr((self.name, self.grade, self.age))

>>> student_objects = [
...     Student('john', 'A', 15),
...     Student('jane', 'B', 12),
...     Student('dave', 'B', 10),
... ]

>>> sorted(student_objects, key=lambda student: student.age) # the traditional lambda approach
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

>>> from operator import itemgetter, attrgetter

>>> sorted(student_objects, key=attrgetter('age'))
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

# Sorting on two keys: attrgetter with several names returns a tuple, which is tidier than the equivalent lambda
>>> sorted(student_objects, key=attrgetter('grade', 'age'))
[('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]

attrgetter is roughly equivalent to:

def attrgetter(*items):
    if any(not isinstance(item, str) for item in items):
        raise TypeError('attribute name must be a string')
    if len(items) == 1:
        attr = items[0]
        def g(obj):
            return resolve_attr(obj, attr)
    else:
        def g(obj):
            return tuple(resolve_attr(obj, attr) for attr in items)
    return g

def resolve_attr(obj, attr):
    for name in attr.split("."):
        obj = getattr(obj, name)
    return obj
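
The dotted-name form is easy to try out. The sketch below uses SimpleNamespace objects as stand-in records (made-up data) to show both nested lookup and a compound sort key.

```python
from operator import attrgetter
from types import SimpleNamespace

# Hypothetical records built with SimpleNamespace for brevity.
people = [
    SimpleNamespace(name=SimpleNamespace(first='Ada', last='Lovelace'), born=1815),
    SimpleNamespace(name=SimpleNamespace(first='Alan', last='Turing'), born=1912),
    SimpleNamespace(name=SimpleNamespace(first='Grace', last='Hopper'), born=1906),
]

# Dotted names are resolved one attribute at a time.
get_last = attrgetter('name.last')
last_names = [get_last(p) for p in people]

# Several names give a tuple, which works directly as a compound sort key.
by_birth_desc = sorted(people, key=attrgetter('born', 'name.last'), reverse=True)
youngest_first = [p.name.first for p in by_birth_desc]

print(last_names)      # ['Lovelace', 'Turing', 'Hopper']
print(youngest_first)  # ['Alan', 'Grace', 'Ada']
```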

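itemgetter

operator.itemgetter(item) and operator.itemgetter(*items) return a callable that fetches the given item(s) from its operand using the operand's __getitem__() method: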

  • After f = itemgetter(2), the call f(r) returns r[2].

  • After g = itemgetter(2, 5, 3), the call g(r) returns (r[2], r[5], r[3]).

The following example shows how itemgetter is used.

>>> student_tuples = [
...     ('john', 'A', 15),
...     ('jane', 'B', 12),
...     ('dave', 'B', 10),
... ]

>>> sorted(student_tuples, key=lambda student: student[2]) # the traditional lambda approach
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

>>> from operator import itemgetter

>>> sorted(student_tuples, key=itemgetter(2))
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

# Sorting on two keys: itemgetter with several indexes returns a tuple, which is tidier than the equivalent lambda
>>> sorted(student_tuples, key=itemgetter(1,2))
[('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]

itemgetter is roughly equivalent to:

def itemgetter(*items):
    if len(items) == 1:
        item = items[0]
        def g(obj):
            return obj[item]
    else:
        def g(obj):
            return tuple(obj[item] for item in items)
    return g
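
Because itemgetter simply uses the [] operation, it is not limited to tuples. A minimal sketch with made-up dictionary rows:

```python
from operator import itemgetter

# Hypothetical inventory rows.
rows = [
    {'name': 'pear', 'count': 3},
    {'name': 'apple', 'count': 3},
    {'name': 'fig', 'count': 1},
]

# itemgetter works with any object supporting [], including dicts.
by_count_then_name = sorted(rows, key=itemgetter('count', 'name'))
names = [row['name'] for row in by_count_then_name]

# Picking several positions out of a sequence at once.
first_and_last = itemgetter(0, -1)('abcdef')

print(names)           # ['fig', 'apple', 'pear']
print(first_and_last)  # ('a', 'f')
```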

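methodcaller

operator.methodcaller(name[, args...]) returns a callable that calls the method name on its operand, passing along any extra arguments given: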

  • After f = methodcaller('name'), the call f(b) returns b.name().

  • After f = methodcaller('name', 'foo', bar=1), the call f(b) returns b.name('foo', bar=1).

methodcaller is roughly equivalent to:

def methodcaller(name, *args, **kwargs):
    def caller(obj):
        return getattr(obj, name)(*args, **kwargs)
    return caller
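
methodcaller pairs naturally with map and filter. The sketch below applies a few string methods; the sample strings and variable names are made up for illustration.

```python
from operator import methodcaller

strip = methodcaller('strip')
replace_dashes = methodcaller('replace', '-', ' ')
starts_with_a = methodcaller('startswith', 'a')

cleaned = list(map(strip, ['  alpha ', 'beta', ' a-b ']))
spaced = replace_dashes('a-b-c')
a_words = list(filter(starts_with_a, cleaned))

print(cleaned)  # ['alpha', 'beta', 'a-b']
print(spaced)   # a b c
print(a_words)  # ['alpha', 'a-b']
```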


This article is the author's original work; please contact the author before reposting. First published on my blog.

    引言

    functools, itertools, operator是Python标准库为我们提供的支持函数式编程的三大模块,合理的使用这三个模块,我们可以写出更加简洁可读的Pythonic代码,接下来我们通过一些example来了解三大模块的使用。

    functools的使用

    functools是Python中很重要的模块,它提供了一些非常有用的高阶函数。高阶函数就是说一个可以接受函数作为参数或者以函数作为返回值的函数,因为Python中函数也是对象,因此很容易支持这样的函数式特性。

    partial

    >>> from functools import partial
    
    >>> basetwo = partial(int, base=2)
    
    >>> basetwo('10010')
    18

    basetwo('10010')实际上等价于调用int('10010', base=2),当函数的参数个数太多的时候,可以通过使用functools.partial来创建一个新的函数来简化逻辑从而增强代码的可读性,而partial内部实际上就是通过一个简单的闭包来实现的。

    def partial(func, *args, **keywords):
     def newfunc(*fargs, **fkeywords):
     newkeywords = keywords.copy()
     newkeywords.update(fkeywords)
     return func(*args, *fargs, **newkeywords)
     newfunc.func = func
     newfunc.args = args
     newfunc.keywords = keywords
     return newfunc

    partialmethod

    partialmethod和partial类似,但是对于绑定一个非对象自身的方法的时候,这个时候就只能使用partialmethod了,我们通过下面这个例子来看一下两者的差异。

    from functools import partial, partialmethod
    
    
    def standalone(self, a=1, b=2):
     "Standalone function"
     print(' called standalone with:', (self, a, b))
     if self is not None:
     print(' self.attr =', self.attr)
    
    
    class MyClass:
     "Demonstration class for functools"
     def __init__(self):
     self.attr = 'instance attribute'
     method1 = functools.partialmethod(standalone) # 使用partialmethod
     method2 = functools.partial(standalone) # 使用partial
    >>> o = MyClass()
    
    >>> o.method1()
     called standalone with: (<__main__.MyClass object at 0x7f46d40cc550>, 1, 2)
     self.attr = instance attribute
    
    # 不能使用partial
    >>> o.method2()
    Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
    TypeError: standalone() missing 1 required positional argument: 'self'

    singledispatch

    虽然Python不支持同名方法允许有不同的参数类型,但是我们可以借用singledispatch来动态指定相应的方法所接收的参数类型,而不用把参数判断放到方法内部去判断从而降低代码的可读性。

    from functools import singledispatch
    
    
    class TestClass(object):
     @singledispatch
     def test_method(arg, verbose=False):
     if verbose:
     print("Let me just say,", end=" ")
     print(arg)
    
     @test_method.register(int)
     def _(arg):
     print("Strength in numbers, eh?", end=" ")
     print(arg)
    
     @test_method.register(list)
     def _(arg):
     print("Enumerate this:")
    
     for i, elem in enumerate(arg):
     print(i, elem)

    下面通过@test_method.register(int)和@test_method.register(list)指定当test_method的第一个参数为int或者list的时候,分别调用不同的方法来进行处理。

    >>> TestClass.test_method(55555) # call @test_method.register(int)
    Strength in numbers, eh? 55555
    
    >>> TestClass.test_method([33, 22, 11]) # call @test_method.register(list)
    Enumerate this:
    0 33
    1 22
    2 11
    
    >>> TestClass.test_method('hello world', verbose=True) # call default
    Let me just say, hello world

    wraps

    装饰器会遗失被装饰函数的__name__和__doc__等属性,可以使用@wraps来恢复。

    from functools import wraps
    
    
    def my_decorator(f):
     @wraps(f)
     def wrapper():
     """wrapper_doc"""
     print('Calling decorated function')
     return f()
     return wrapper
    
    
    @my_decorator
    def example():
     """example_doc"""
     print('Called example function')
    >>> example.__name__
    'example'
    >>> example.__doc__
    'example_doc'
    
    # 尝试去掉@wraps(f)来看一下运行结果,example自身的__name__和__doc__都已经丧失了
    >>> example.__name__
    'wrapper'
    >>> example.__doc__
    'wrapper_doc'

    我们也可以使用update_wrapper来改写

    from itertools import update_wrapper
    
    
    def g():
     ...
    g = update_wrapper(g, f)
    
    
    # equal to
    @wraps(f)
    def g():
     ...

    @wraps内部实际上就是基于update_wrapper来实现的。

    def wraps(wrapped, assigned=WRAPPER_ASSIGNMENTS, updated=WRAPPER_UPDATES):
     def decorator(wrapper):
     return update_wrapper(wrapper, wrapped=wrapped...)
     return decorator

    lru_cache

    lru_cache和singledispatch是开发中应用非常广泛的黑魔法,接下来我们来看一下lru_cache。对于重复的计算性任务,使用缓存加速是非常重要的,下面我们通过一个fibonacci的例子来看一下使用lru_cache与不使用lru_cache在速度上的差异。

    # clockdeco.py
    
    import time
    import functools
    
    
    def clock(func):
     @functools.wraps(func)
     def clocked(*args, **kwargs):
     t0 = time.time()
     result = func(*args, **kwargs)
     elapsed = time.time() - t0
     name = func.__name__
     arg_lst = []
     if args:
     arg_lst.append(', '.join(repr(arg) for arg in args))
     if kwargs:
     pairs = ['%s=%r' % (k, w) for k, w in sorted(kwargs.items())]
     arg_lst.append(', '.join(pairs))
     arg_str = ', '.join(arg_lst)
     print('[%0.8fs] %s(%s) -> %r ' % (elapsed, name, arg_str, result))
     return result
     return clocked

    不使用lru_cache

    from clockdeco import clock
    
    
    @clock
    def fibonacci(n):
     if n < 2:
     return n
     return fibonacci(n-2) + fibonacci(n-1)
    
    
    if __name__=='__main__':
     print(fibonacci(6))

    下面是运行结果,从运行结果可以看出fibonacci(n)会在递归的时候被重复计算,这是非常耗时消费资源的。

    [0.00000119s] fibonacci(0) -> 0 
    [0.00000143s] fibonacci(1) -> 1 
    [0.00021172s] fibonacci(2) -> 1 
    [0.00000072s] fibonacci(1) -> 1 
    [0.00000095s] fibonacci(0) -> 0 
    [0.00000095s] fibonacci(1) -> 1 
    [0.00011444s] fibonacci(2) -> 1 
    [0.00022793s] fibonacci(3) -> 2 
    [0.00055265s] fibonacci(4) -> 3 
    [0.00000072s] fibonacci(1) -> 1 
    [0.00000072s] fibonacci(0) -> 0 
    [0.00000095s] fibonacci(1) -> 1 
    [0.00011158s] fibonacci(2) -> 1 
    [0.00022268s] fibonacci(3) -> 2 
    [0.00000095s] fibonacci(0) -> 0 
    [0.00000095s] fibonacci(1) -> 1 
    [0.00011349s] fibonacci(2) -> 1 
    [0.00000072s] fibonacci(1) -> 1 
    [0.00000095s] fibonacci(0) -> 0 
    [0.00000095s] fibonacci(1) -> 1 
    [0.00010705s] fibonacci(2) -> 1 
    [0.00021267s] fibonacci(3) -> 2 
    [0.00043225s] fibonacci(4) -> 3 
    [0.00076509s] fibonacci(5) -> 5 
    [0.00142813s] fibonacci(6) -> 8 
    8

    使用lru_cache

    import functools
    from clockdeco import clock
    
    
    @functools.lru_cache() # 1
    @clock # 2
    def fibonacci(n):
     if n < 2:
     return n
     return fibonacci(n-2) + fibonacci(n-1)
    
    if __name__=='__main__':
     print(fibonacci(6))

    下面是运行结果,对于已经计算出来的结果将其放入缓存。

    [0.00000095s] fibonacci(0) -> 0 
    [0.00005770s] fibonacci(1) -> 1 
    [0.00015855s] fibonacci(2) -> 1 
    [0.00000286s] fibonacci(3) -> 2 
    [0.00021124s] fibonacci(4) -> 3 
    [0.00000191s] fibonacci(5) -> 5 
    [0.00024652s] fibonacci(6) -> 8 
    8

    上面我们选用的数字还不够大,感兴趣的朋友不妨自己选择一个较大的数字比较一下两者在速度上的差异

    total_ordering

    Python2中可以通过自定义__cmp__的返回值0/-1/1来比较对象的大小,在Python3中废弃了__cmp__,但是我们可以通过total_ordering然后修改 __lt__() , __le__() , __gt__(), __ge__(), __eq__(), __ne__() 等魔术方法来自定义类的比较规则。p.s: 如果使用必须在类里面定义 __lt__() , __le__() , __gt__(), __ge__()中的一个,以及给类添加一个__eq__() 方法。

    import functools
    
    
    @functools.total_ordering
    class MyObject:
     def __init__(self, val):
     self.val = val
    
     def __eq__(self, other):
     print(' testing __eq__({}, {})'.format(
     self.val, other.val))
     return self.val == other.val
    
     def __gt__(self, other):
     print(' testing __gt__({}, {})'.format(
     self.val, other.val))
     return self.val > other.val
    
    
    a = MyObject(1)
    b = MyObject(2)
    
    for expr in ['a < b', 'a <= b', 'a == b', 'a >= b', 'a > b']:
     print('
    {:<6}:'.format(expr))
     result = eval(expr)
     print(' result of {}: {}'.format(expr, result))

    下面是运行结果:

    a < b :
     testing __gt__(1, 2)
     testing __eq__(1, 2)
     result of a < b: True
    
    a <= b:
     testing __gt__(1, 2)
     result of a <= b: True
    
    a == b:
     testing __eq__(1, 2)
     result of a == b: False
    
    a >= b:
     testing __gt__(1, 2)
     testing __eq__(1, 2)
     result of a >= b: False
    
    a > b :
     testing __gt__(1, 2)
     result of a > b: False

    itertools的使用

    itertools为我们提供了非常有用的用于操作迭代对象的函数。

    无限迭代器

    count

    count(start=0, step=1) 会返回一个无限的整数iterator,每次增加1。可以选择提供起始编号,默认为0。

    >>> from itertools import count
    
    >>> for i in zip(count(1), ['a', 'b', 'c']):
    ... print(i, end=' ')
    ...
    (1, 'a') (2, 'b') (3, 'c')

    cycle

    cycle(iterable) 会把传入的一个序列无限重复下去,不过可以提供第二个参数就可以制定重复次数。

    >>> from itertools import cycle
    
    >>> for i in zip(range(6), cycle(['a', 'b', 'c'])):
    ... print(i, end=' ')
    ...
    (0, 'a') (1, 'b') (2, 'c') (3, 'a') (4, 'b') (5, 'c')

    repeat

    repeat(object[, times]) 返回一个元素无限重复下去的iterator,可以提供第二个参数就可以限定重复次数。

    >>> from itertools import repeat
    
    >>> for i, s in zip(count(1), repeat('over-and-over', 5)):
    ... print(i, s)
    ...
    1 over-and-over
    2 over-and-over
    3 over-and-over
    4 over-and-over
    5 over-and-over

    Iterators terminating on the shortest input sequence

    accumulate

    accumulate(iterable[, func])

    >>> from itertools import accumulate
    >>> import operator
    
    >>> list(accumulate([1, 2, 3, 4, 5], operator.add))
    [1, 3, 6, 10, 15]
    
    >>> list(accumulate([1, 2, 3, 4, 5], operator.mul))
    [1, 2, 6, 24, 120]

    chain

    itertools.chain(*iterables)可以将多个iterable组合成一个iterator

    >>> from itertools import chain
    
    >>> list(chain([1, 2, 3], ['a', 'b', 'c']))
    [1, 2, 3, 'a', 'b', 'c']

    chain的实现原理如下

    def chain(*iterables):
     # chain('ABC', 'DEF') --> A B C D E F
     for it in iterables:
     for element in it:
     yield element

    chain.from_iterable

    chain.from_iterable(iterable)和chain类似,但是只是接收单个iterable,然后将这个iterable中的元素组合成一个iterator。

    >>> from itertools import chain
    
    >>> list(chain.from_iterable(['ABC', 'DEF']))
    ['A', 'B', 'C', 'D', 'E', 'F']

    实现原理也和chain类似

    def from_iterable(iterables):
     # chain.from_iterable(['ABC', 'DEF']) --> A B C D E F
     for it in iterables:
     for element in it:
     yield element

    compress

    compress(data, selectors)接收两个iterable作为参数,只返回selectors中对应的元素为True的data,当data/selectors之一用尽时停止。

    >>> list(compress([1, 2, 3, 4, 5], [True, True, False, False, True]))
    [1, 2, 5]

    zip_longest

    zip_longest(*iterables, fillvalue=None)和zip类似,但是zip的缺陷是iterable中的某一个元素被遍历完,整个遍历都会停止,具体差异请看下面这个例子

    from itertools import zip_longest
    
    r1 = range(3)
    r2 = range(2)
    
    print('zip stops early:')
    print(list(zip(r1, r2)))
    
    r1 = range(3)
    r2 = range(2)
    
    print('
    zip_longest processes all of the values:')
    print(list(zip_longest(r1, r2)))

    下面是输出结果

    zip stops early:
    [(0, 0), (1, 1)]
    
    zip_longest processes all of the values:
    [(0, 0), (1, 1), (2, None)]

    islice

    islice(iterable, stop) or islice(iterable, start, stop[, step]) 与Python的字符串和列表切片有一些类似,只是不能对start、start和step使用负值。

    >>> from itertools import islice
    
    >>> for i in islice(range(100), 0, 100, 10):
    ... print(i, end=' ')
    ...
    0 10 20 30 40 50 60 70 80 90

    tee

    tee(iterable, n=2) 返回n个的iterator,n默认为2。

    from itertools import islice, tee
    
    r = islice(count(), 5)
    i1, i2 = tee(r)
    
    print('i1:', list(i1))
    print('i2:', list(i2))
    
    for i in r:
     print(i, end=' ')
     if i > 1:
     break

    下面是输出结果,注意tee(r)后,r作为iterator已经失效,所以for循环没有输出值。

    i1: [0, 1, 2, 3, 4]
    i2: [0, 1, 2, 3, 4]

    starmap

    starmap(func, iterable)假设iterable将返回一个元组流,并使用这些元组作为参数调用func:

    >>> from itertools import starmap
    >>> import os
    
    >>> iterator = starmap(os.path.join,
    ... [('/bin', 'python'), ('/usr', 'bin', 'java'),
    ... ('/usr', 'bin', 'perl'), ('/usr', 'bin', 'ruby')])
    
    >>> list(iterator)
    ['/bin/python', '/usr/bin/java', '/usr/bin/perl', '/usr/bin/ruby']

    filterfalse

    filterfalse(predicate, iterable) 与filter()相反,返回所有predicate返回False的元素。

    itertools.filterfalse(is_even, itertools.count()) =>
    1, 3, 5, 7, 9, 11, 13, 15, ...

    takewhile

    takewhile(predicate, iterable) 只要predicate返回True,不停地返回iterable中的元素。一旦predicate返回False,iteration将结束。

    def less_than_10(x):
     return x < 10
    
    itertools.takewhile(less_than_10, itertools.count())
    => 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
    
    itertools.takewhile(is_even, itertools.count())
    => 0

    dropwhile

    dropwhile(predicate, iterable) 在predicate返回True时舍弃元素,然后返回其余迭代结果。

    itertools.dropwhile(less_than_10, itertools.count())
    => 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ...
    
    itertools.dropwhile(is_even, itertools.count())
    => 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...

    groupby

    groupby(iterable, key=None) 把iterator中相邻的重复元素挑出来放在一起。p.s: The input sequence needs to be sorted on the key value in order for the groupings to work out as expected.

  • [k for k, g in groupby('AAAABBBCCDAABBB')] --> A B C D A B

  • [list(g) for k, g in groupby('AAAABBBCCD')] --> AAAA BBB CC D

  • >>> import itertools
    
    >>> for key, group in itertools.groupby('AAAABBBCCDAABBB'):
    ... print(key, list(group))
    ...
    A ['A', 'A', 'A', 'A']
    B ['B', 'B', 'B']
    C ['C', 'C']
    D ['D']
    A ['A', 'A']
    B ['B', 'B', 'B']
    city_list = [('Decatur', 'AL'), ('Huntsville', 'AL'), ('Selma', 'AL'),
     ('Anchorage', 'AK'), ('Nome', 'AK'),
     ('Flagstaff', 'AZ'), ('Phoenix', 'AZ'), ('Tucson', 'AZ'),
     ...
     ]
    
    def get_state(city_state):
     return city_state[1]
    
    itertools.groupby(city_list, get_state) =>
     ('AL', iterator-1),
     ('AK', iterator-2),
     ('AZ', iterator-3), ...
    
    iterator-1 => ('Decatur', 'AL'), ('Huntsville', 'AL'), ('Selma', 'AL')
    iterator-2 => ('Anchorage', 'AK'), ('Nome', 'AK')
    iterator-3 => ('Flagstaff', 'AZ'), ('Phoenix', 'AZ'), ('Tucson', 'AZ')
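The city example above is schematic. A runnable sketch, assuming a shortened and deliberately shuffled city_list, shows why sorting on the key matters and how each group can be materialised before the next one is requested:

```python
from itertools import groupby

# Shortened, deliberately unsorted variant of the city list (illustrative).
city_list = [('Decatur', 'AL'), ('Anchorage', 'AK'), ('Huntsville', 'AL'),
             ('Flagstaff', 'AZ'), ('Nome', 'AK'), ('Phoenix', 'AZ')]


def get_state(city_state):
    return city_state[1]


# Without sorting, records for one state are not adjacent,
# so the same key shows up in several groups.
unsorted_keys = [state for state, _ in groupby(city_list, get_state)]

# Sorting on the same key first makes each state appear exactly once.
by_state = {
    state: [city for city, _ in group]
    for state, group in groupby(sorted(city_list, key=get_state), get_state)
}

print(unsorted_keys)  # ['AL', 'AK', 'AL', 'AZ', 'AK', 'AZ']
print(by_state)
```

Each group iterator shares the underlying iterable with groupby, so it should be consumed (here with a list comprehension) before advancing to the next key.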

    Combinatoric generators

    product

    product(*iterables, repeat=1)

  • product(A, B) returns the same as ((x,y) for x in A for y in B)

  • product(A, repeat=4) means the same as product(A, A, A, A)

    from itertools import product
    
    
    def show(iterable):
        # Print the tuples three per row.
        for i, item in enumerate(iterable, 1):
            print(item, end=' ')
            if (i % 3) == 0:
                print()
        print()
    
    
    print('Repeat 2:\n')
    show(product(range(3), repeat=2))
    
    print('Repeat 3:\n')
    show(product(range(3), repeat=3))

    The output is shown below:

    Repeat 2:
    
    (0, 0) (0, 1) (0, 2)
    (1, 0) (1, 1) (1, 2)
    (2, 0) (2, 1) (2, 2)
    
    Repeat 3:
    
    (0, 0, 0) (0, 0, 1) (0, 0, 2)
    (0, 1, 0) (0, 1, 1) (0, 1, 2)
    (0, 2, 0) (0, 2, 1) (0, 2, 2)
    (1, 0, 0) (1, 0, 1) (1, 0, 2)
    (1, 1, 0) (1, 1, 1) (1, 1, 2)
    (1, 2, 0) (1, 2, 1) (1, 2, 2)
    (2, 0, 0) (2, 0, 1) (2, 0, 2)
    (2, 1, 0) (2, 1, 1) (2, 1, 2)
    (2, 2, 0) (2, 2, 1) (2, 2, 2)
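The two equivalences stated for product can be checked directly. This is a small sketch with made-up inputs A and B:

```python
from itertools import product

# Illustrative inputs.
A = 'ab'
B = [1, 2, 3]

# product(A, B) matches the nested generator expression.
pairs = list(product(A, B))
nested = [(x, y) for x in A for y in B]

# product(A, repeat=2) matches product(A, A).
squares = list(product(A, repeat=2))
explicit = list(product(A, A))

print(pairs == nested)      # True
print(squares == explicit)  # True
print(squares)  # [('a', 'a'), ('a', 'b'), ('b', 'a'), ('b', 'b')]
```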

    permutations

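    permutations(iterable, r=None) returns every possible ordering of r elements taken from iterable. When r is omitted or None, it defaults to the length of iterable, so full-length permutations are generated.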

    from itertools import permutations
    
    
    def show(iterable):
        # Start a new row whenever the first element changes.
        first = None
        for i, item in enumerate(iterable, 1):
            if first != item[0]:
                if first is not None:
                    print()
                first = item[0]
            print(''.join(item), end=' ')
        print()
    
    
    print('All permutations:\n')
    show(permutations('abcd'))
    
    print('\nPairs:\n')
    show(permutations('abcd', r=2))

    The output is shown below:

    All permutations:
    
    abcd abdc acbd acdb adbc adcb
    bacd badc bcad bcda bdac bdca
    cabd cadb cbad cbda cdab cdba
    dabc dacb dbac dbca dcab dcba
    
    Pairs:
    
    ab ac ad
    ba bc bd
    ca cb cd
    da db dc

    combinations

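    combinations(iterable, r) returns an iterator over all r-length tuples that can be formed from the elements of iterable. Within each tuple the elements keep the order in which iterable produced them. In the example below, unlike permutations above, a always comes before b, c and d; b always comes before c and d; and c always comes before d.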

    from itertools import combinations
    
    
    def show(iterable):
        # Start a new row whenever the first element changes.
        first = None
        for i, item in enumerate(iterable, 1):
            if first != item[0]:
                if first is not None:
                    print()
                first = item[0]
            print(''.join(item), end=' ')
        print()
    
    
    print('Unique pairs:\n')
    show(combinations('abcd', r=2))

    The output is shown below:

    Unique pairs:
    
    ab ac ad
    bc bd
    cd
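The relationship between the two functions can be sketched as follows: keeping only the permutations whose elements appear in input order yields exactly the combinations. The example assumes Python 3.8 or later for math.comb and math.perm, which are used only to check the counts.

```python
from itertools import combinations, permutations
from math import comb, perm

letters = 'abcd'

perms = [''.join(p) for p in permutations(letters, 2)]
combs = [''.join(c) for c in combinations(letters, 2)]

# 'abcd' is already sorted, so "in input order" is the same as
# "in alphabetical order" for this particular input.
in_order = [p for p in perms if list(p) == sorted(p)]

print(in_order == combs)        # True
print(len(perms), len(combs))   # 12 6
```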

    combinations_with_replacement

    combinations_with_replacement(iterable, r) relaxes a different constraint: an element may be repeated within a single tuple, so combinations such as aa/bb/cc/dd can appear.

    from itertools import combinations_with_replacement
    
    
    def show(iterable):
        # Start a new row whenever the first element changes.
        first = None
        for i, item in enumerate(iterable, 1):
            if first != item[0]:
                if first is not None:
                    print()
                first = item[0]
            print(''.join(item), end=' ')
        print()
    
    
    print('Unique pairs:\n')
    show(combinations_with_replacement('abcd', r=2))

    The output is shown below:

    Unique pairs:
    
    aa ab ac ad
    bb bc bd
    cc cd
    dd
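To make the difference from combinations concrete, the following sketch lists the pairs that only appear once repetition is allowed. The variable names are chosen here for illustration.

```python
from itertools import combinations, combinations_with_replacement

letters = 'abcd'

without = [''.join(c) for c in combinations(letters, 2)]
with_repl = [''.join(c) for c in combinations_with_replacement(letters, 2)]

# The extra results are exactly the pairs that repeat one element.
extra = [pair for pair in with_repl if pair not in without]

print(len(without), len(with_repl))  # 6 10
print(extra)                         # ['aa', 'bb', 'cc', 'dd']
```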

    Using operator

    attrgetter

    operator.attrgetter(attr) and operator.attrgetter(*attrs)

  • After f = attrgetter('name'), the call f(b) returns b.name.

  • After f = attrgetter('name', 'date'), the call f(b) returns (b.name, b.date).

  • After f = attrgetter('name.first', 'name.last'), the call f(b) returns (b.name.first, b.name.last).

    The following example shows how attrgetter is used.

    >>> class Student:
    ...     def __init__(self, name, grade, age):
    ...         self.name = name
    ...         self.grade = grade
    ...         self.age = age
    ...     def __repr__(self):
    ...         return repr((self.name, self.grade, self.age))
    
    >>> student_objects = [
    ...     Student('john', 'A', 15),
    ...     Student('jane', 'B', 12),
    ...     Student('dave', 'B', 10),
    ... ]
    
    >>> sorted(student_objects, key=lambda student: student.age) # the traditional lambda approach
    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
    
    >>> from operator import itemgetter, attrgetter
    
    >>> sorted(student_objects, key=attrgetter('age'))
    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
    
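    # Sorting on two attributes: attrgetter accepts several names at once, which is
    # tidier than the equivalent lambda student: (student.grade, student.age)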
    >>> sorted(student_objects, key=attrgetter('grade', 'age'))
    [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]

    How attrgetter works (a pure-Python equivalent):

    def attrgetter(*items):
        if any(not isinstance(item, str) for item in items):
            raise TypeError('attribute name must be a string')
        if len(items) == 1:
            # One name: return the attribute value itself.
            attr = items[0]
            def g(obj):
                return resolve_attr(obj, attr)
        else:
            # Several names: return a tuple of attribute values.
            def g(obj):
                return tuple(resolve_attr(obj, attr) for attr in items)
        return g
    
    def resolve_attr(obj, attr):
        # Follow dotted names such as 'name.first' one level at a time.
        for name in attr.split("."):
            obj = getattr(obj, name)
        return obj
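The dotted-name form listed in the bullets is not exercised by the Student example. A short sketch, using types.SimpleNamespace to build hypothetical nested records, could look like this:

```python
from operator import attrgetter
from types import SimpleNamespace

# Hypothetical records with a nested name object (illustrative data).
people = [
    SimpleNamespace(name=SimpleNamespace(first='Grace', last='Hopper'), age=85),
    SimpleNamespace(name=SimpleNamespace(first='Alan', last='Turing'), age=41),
]

# Several dotted names return a tuple of nested attribute values.
get_full_name = attrgetter('name.first', 'name.last')
full_names = [get_full_name(p) for p in people]

# A single dotted name works as a sort key.
by_age = sorted(people, key=attrgetter('age'))
youngest_last_name = attrgetter('name.last')(by_age[0])

print(full_names)          # [('Grace', 'Hopper'), ('Alan', 'Turing')]
print(youngest_last_name)  # Turing
```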

    itemgetter

    operator.itemgetter(item) and operator.itemgetter(*items)

  • After f = itemgetter(2), the call f(r) returns r[2].

  • After g = itemgetter(2, 5, 3), the call g(r) returns (r[2], r[5], r[3]).

    The following example shows how itemgetter is used.

    >>> student_tuples = [
    ... ('john', 'A', 15),
    ... ('jane', 'B', 12),
    ... ('dave', 'B', 10),
    ... ]
    
    >>> sorted(student_tuples, key=lambda student: student[2]) # the traditional lambda approach
    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
    
    >>> from operator import itemgetter
    
    >>> sorted(student_tuples, key=itemgetter(2))
    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
    
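    # Sorting on two fields: itemgetter accepts several indexes at once, which is
    # tidier than the equivalent lambda student: (student[1], student[2])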
    >>> sorted(student_tuples, key=itemgetter(1,2))
    [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]

    How itemgetter works (a pure-Python equivalent):

    def itemgetter(*items):
        if len(items) == 1:
            # One key or index: return the item itself.
            item = items[0]
            def g(obj):
                return obj[item]
        else:
            # Several keys or indexes: return a tuple of items.
            def g(obj):
                return tuple(obj[item] for item in items)
        return g
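Because itemgetter simply calls obj[item], it works with any object that supports indexing, not only tuples. The sketch below uses dictionaries and a slice; the rows data mirrors the student tuples but is otherwise illustrative.

```python
from operator import itemgetter

# Illustrative records as dictionaries instead of tuples.
rows = [
    {'name': 'john', 'grade': 'A', 'age': 15},
    {'name': 'jane', 'grade': 'B', 'age': 12},
    {'name': 'dave', 'grade': 'B', 'age': 10},
]

# A dictionary key works just like a tuple index.
names = list(map(itemgetter('name'), rows))

# Two keys produce a tuple, which sorts by grade and then by age.
ranked = sorted(rows, key=itemgetter('grade', 'age'))
ranked_names = [row['name'] for row in ranked]

# A slice object is also a valid item.
tail = itemgetter(slice(2, None))('ABCDEFG')

print(names)         # ['john', 'jane', 'dave']
print(ranked_names)  # ['john', 'dave', 'jane']
print(tail)          # CDEFG
```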

    methodcaller

    operator.methodcaller(name[, args...])

  • After f = methodcaller('name'), the call f(b) returns b.name().

  • After f = methodcaller('name', 'foo', bar=1), the call f(b) returns b.name('foo', bar=1).

    How methodcaller works (a pure-Python equivalent):

    def methodcaller(name, *args, **kwargs):
        def caller(obj):
            # Look the method up by name, then call it with the stored arguments.
            return getattr(obj, name)(*args, **kwargs)
        return caller
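The section gives no usage example for methodcaller, so here is a brief sketch with built-in str methods. The input strings are made up for illustration.

```python
from operator import methodcaller

# Illustrative input with stray whitespace.
words = ['  alpha ', 'beta  ', ' gamma']

# methodcaller('strip') calls word.strip() on each element.
stripped = list(map(methodcaller('strip'), words))

# Positional and keyword arguments are stored and passed along on each call.
split_once = methodcaller('split', ',', maxsplit=1)
parts = split_once('a,b,c')

# It also works as a sort key, here ordering by the upper-cased form.
ordered = sorted(['b', 'A', 'c'], key=methodcaller('upper'))

print(stripped)  # ['alpha', 'beta', 'gamma']
print(parts)     # ['a', 'b,c']
print(ordered)   # ['A', 'b', 'c']
```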
