Memoizing a generator in Python

A place to discuss the implementation and style of computer programs.

Moderators: phlip, Moderators General, Prelates

LucasBrown
Posts: 299
Joined: Thu Apr 15, 2010 2:57 am UTC
Location: Poway, CA

Memoizing a generator in Python

I'd like to memoize a generator in Python. Specifically, I have an incremental sieve of Eratosthenes that gets called from scratch a lot, and I'd like to avoid recomputing (up to some arbitrary finite number of) the numbers it yields. Memoizing a pure function is easy enough, and there are any number of ways to do it, but since this is a generator, and since Python doesn't seem to allow copying generators, things get a bit trickier.

What I want to do is something like the following:
Spoiler:

Code: Select all

`nsprimes = 10**6    # We'll cache this many values.def primegen(limit=None):    ...?for p in primegen():    ...            break    ...# While running that loop we yielded 120 primes from primegen().# The cache of primes now contains 120 values.for p in primegen(17):    ...    # While running this loop, we don't recompute any primes,    # since they've already been cached.    ...for p in primegen(10**8):    ...    # There are over 5,000,000 primes < 10**8.  So while    # running this loop, we produce the first 120 primes    # quickly, since they're already cached, then compute    # and cache the next 998,980 primes, and then    # compute but don't cache the remaining primes < 10**8.    ...for p in primegen(10**6):    ...    # This time, we don't recompute anything, since    # there are < 1,000,000 primes under 1,000,000,    # and since these values are already cached.    ...for p in primegen(10**8):    ...    # Again, we don't recompute the first million primes    # since they've already been cached, but since the loop    # will cycle over more than 5,000,000 primes, we need    # to recompute the primes after the first million.    ...`

If I had infinite memory, then I wouldn't have to care about the cache size: I'd be able to do

Code: Select all

`def primer():   # Generates the primes.   ...smallprimes = []    # We'll cache the primes here._primer = primer()def primegen(limit=None):    # Generates primes < limit, or all of them if limit == None.    # Caches the results along the way.    p = 0    for p in smallprimes:        if (limit is None) or (p < limit): yield p        else: return    for p in _primer:        smallprimes.append(p)        if (limit is None) or (p < limit): yield p        else: return`

But since memory is a finite resource, I have to worry about the cache size. If we could copy generators, then this would still be a fairly easy thing to do: a solution might resemble

Code: Select all

`def primer():    # Generates the primes.    ...nsprimes = 10**6    # We'll cache this many values.smallprimes = []    # We'll cache the primes here._primer = primer()def primegen(limit=None):    # Generates primes < limit, or all of them if limit == None.    # Caches the first nsprimes results along the way.    p = 0    for p in smallprimes:        if (limit is None) or (p < limit): yield p        else: return    for p in _primer.copy():        smallprimes.append(p)        if (limit is None) or (p < limit): yield p        else: return`

Unfortunately, generators don't seem to be copyable: they don't support the .copy() attribute, using the copy module also throws errors, and I even found a page in the Python docs that indicates that generators can't be copied.

There's this recipe, but that uses a bytecode hack and isn't portable across versions. The pickle module might provide a solution, but that also seems rather hacky to me, and I avoid it on general principles because the pickle module isn't secure.

Is there a "natural" way to do what I want to do?

You, sir, name?
Posts: 6983
Joined: Sun Apr 22, 2007 10:07 am UTC
Location: Chako Paul City
Contact:

Re: Memoizing a generator in Python

Had a look at the solutions here? http://stackoverflow.com/questions/4566 ... -generator

They don't copy the generator, but use itertools.tee to achieve the memoization.
I edit my posts a lot and sometimes the words wrong order words appear in sentences get messed up.