
List comprehensions are useful and can help you write elegant code that’s easy to read and debug, but they’re not the right choice for all circumstances. They might make your code run more slowly or use more memory. If your code is less performant or harder to understand, then it’s probably better to choose an alternative.

Comprehensions can be nested to create combinations of lists, dictionaries, and sets within a collection. For example, say a climate laboratory is tracking the high temperature in five different cities for the first week of June. The perfect data structure for storing this data could be a Python list comprehension nested within a dictionary comprehension:

>>> cities = ['Austin', 'Tacoma', 'Topeka', 'Sacramento', 'Charlotte']
>>> temps = {city: [0 for _ in range(7)] for city in cities}
>>> temps
{'Austin': [0, 0, 0, 0, 0, 0, 0],
 'Tacoma': [0, 0, 0, 0, 0, 0, 0],
 'Topeka': [0, 0, 0, 0, 0, 0, 0],
 'Sacramento': [0, 0, 0, 0, 0, 0, 0],
 'Charlotte': [0, 0, 0, 0, 0, 0, 0]}

You create the outer collection temps with a dictionary comprehension. The expression is a key-value pair, which contains yet another comprehension. This code will quickly generate a list of data for each city in cities.
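As a quick illustration of working with this structure (the city names come from the example above; the temperature value here is made up), you can index into the nested lists to record a reading:

```python
# Build the nested structure: one seven-day list per city.
cities = ['Austin', 'Tacoma', 'Topeka', 'Sacramento', 'Charlotte']
temps = {city: [0 for _ in range(7)] for city in cities}

# Record a (made-up) high temperature for Austin on June 3,
# which is index 2 in the week-long list.
temps['Austin'][2] = 92

print(temps['Austin'])  # [0, 0, 92, 0, 0, 0, 0]
```

The other cities' lists stay untouched, since each city key maps to its own independent list.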

Nested lists are a common way to create matrices, which are often used for mathematical purposes. Take a look at the code block below:

>>> matrix = [[i for i in range(5)] for _ in range(6)]
>>> matrix
[[0, 1, 2, 3, 4],
 [0, 1, 2, 3, 4],
 [0, 1, 2, 3, 4],
 [0, 1, 2, 3, 4],
 [0, 1, 2, 3, 4],
 [0, 1, 2, 3, 4]]

The outer list comprehension [... for _ in range(6)] creates six rows, while the inner list comprehension [i for i in range(5)] fills each of these rows with values.
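The inner comprehension can also reference the outer loop variable, so the rows don't all have to be identical. As a small sketch (not from the original example), here's a multiplication table built the same way:

```python
# Each row multiplies its row number by the column values,
# so the inner expression uses both loop variables.
table = [[row * col for col in range(5)] for row in range(1, 4)]

print(table)
# [[0, 1, 2, 3, 4], [0, 2, 4, 6, 8], [0, 3, 6, 9, 12]]
```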

So far, the purpose of each nested comprehension is pretty intuitive. However, there are other situations, such as flattening nested lists, where the logic arguably makes your code more confusing. Take this example, which uses a nested list comprehension to flatten a matrix:

>>> matrix = [
...     [0, 0, 0],
...     [1, 1, 1],
...     [2, 2, 2],
... ]
>>> flat = [num for row in matrix for num in row]
>>> flat
[0, 0, 0, 1, 1, 1, 2, 2, 2]

The code to flatten the matrix is concise, but it may not be so intuitive to understand how it works. On the other hand, if you were to use for loops to flatten the same matrix, then your code would be much more straightforward:

>>> matrix = [
...     [0, 0, 0],
...     [1, 1, 1],
...     [2, 2, 2],
... ]
>>> flat = []
>>> for row in matrix:
...     for num in row:
...         flat.append(num)
...
>>> flat
[0, 0, 0, 1, 1, 1, 2, 2, 2]

Now you can see that the code traverses one row of the matrix at a time, pulling out all the elements in that row before moving on to the next one.

While the single-line nested list comprehension might seem more Pythonic, what’s most important is to write code that your team can easily understand and modify. When you choose your approach, you’ll have to make a judgment call based on whether you think the comprehension helps or hurts readability.
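There's also a middle ground worth knowing about (not covered above, but part of the standard library): itertools.chain.from_iterable flattens one level of nesting without a double for clause, which some teams find easier to read than either version:

```python
from itertools import chain

matrix = [
    [0, 0, 0],
    [1, 1, 1],
    [2, 2, 2],
]

# chain.from_iterable lazily yields each element of each row in order.
flat = list(chain.from_iterable(matrix))

print(flat)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```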

A list comprehension in Python works by loading the entire output list into memory. For small or even medium-sized lists, this is generally fine. If you want to sum the squares of the first one-thousand integers, then a list comprehension will solve this problem admirably:

>>> sum([i * i for i in range(1000)])
332833500

But what if you wanted to sum the squares of the first billion integers? If you tried that on your machine, then you might notice that your computer becomes non-responsive. That's because Python is trying to create a list with one billion integers, which consumes more memory than your computer would like. Your machine may not have the resources it needs to generate an enormous list and store it in memory. If you try to do it anyway, then it could slow down or even crash.

When the size of a list becomes problematic, it’s often helpful to use a generator instead of a list comprehension in Python. A generator doesn’t create a single, large data structure in memory, but instead returns an iterable. Your code can ask for the next value from the iterable as many times as necessary or until you’ve reached the end of your sequence, while only storing a single value at a time.
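To see this one-value-at-a-time behavior directly, you can ask a generator expression for values yourself with the built-in next() (a small sketch, not from the original text):

```python
# A generator expression over the first few squares.
squares = (i * i for i in range(3))

# Each call to next() computes and returns exactly one value;
# nothing is calculated until it's requested.
print(next(squares))  # 0
print(next(squares))  # 1
print(next(squares))  # 4

# A further call to next() would raise StopIteration,
# signaling that the sequence is exhausted.
```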

If you were to sum the first billion squares with a generator, then your program would likely run for a while, but it shouldn't cause your computer to freeze. The example below uses a generator:

>>> sum(i * i for i in range(1000000000))
333333332833333333500000000

You can tell this is a generator expression because it isn't surrounded by brackets or curly braces. When a generator expression is the sole argument to a function call, as it is here, the surrounding parentheses are optional.

The example above still requires a lot of work, but it performs the operations lazily. Because of lazy evaluation, values are only calculated when they're explicitly requested. After the generator yields a value (for example, 567 * 567), it can add that value to the running sum, then discard that value and generate the next value (568 * 568). When the sum function requests the next value, the cycle starts over. This process keeps the memory footprint small.
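The same lazy cycle can be spelled out with a generator function, where yield hands one value to the caller and then pauses. This equivalent is a sketch; the text above only shows the expression form, but it can make the produce-consume-discard loop easier to see:

```python
def squares(n):
    # yield produces one value, then suspends the function
    # until the caller asks for the next one, so only one
    # square ever exists in memory at a time.
    for i in range(n):
        yield i * i

# sum() pulls values from the generator one by one.
print(sum(squares(1000)))  # 332833500
```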

map() also operates lazily, meaning memory won't be an issue if you choose to use it in this case:

>>> sum(map(lambda i: i * i, range(1000000000)))
333333332833333333500000000

It's up to you whether you prefer the generator expression or map().
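If you want to see the memory difference directly, one rough check (a sketch, not from the original text) is to compare a list comprehension with the equivalent generator expression using sys.getsizeof. Note that getsizeof reports only the container object's own size, not the sizes of the elements it refers to:

```python
import sys

# The list stores references to all one million results at once.
numbers_list = [i * i for i in range(1_000_000)]

# The generator stores only its current iteration state.
numbers_gen = (i * i for i in range(1_000_000))

print(sys.getsizeof(numbers_list))  # several megabytes
print(sys.getsizeof(numbers_gen))   # a couple hundred bytes
```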

So, which approach is faster? Should you use list comprehensions or one of their alternatives? Rather than adhere to a single rule that’s true in all cases, it’s more useful to ask yourself whether or not performance matters in your specific circumstance. If not, then it’s usually best to choose whatever approach leads to the cleanest code!

If you're in a scenario where performance is important, then it's typically best to profile different approaches and listen to the data. timeit is a useful library for timing how long chunks of code take to run. You can use timeit to compare the runtime of map(), for loops, and list comprehensions:

>>> import random
>>> import timeit
>>> TAX_RATE = .08
>>> txns = [random.randrange(100) for _ in range(100000)]
>>> def get_price(txn):
...     return txn * (1 + TAX_RATE)
...
>>> def get_prices_with_map():
...     return list(map(get_price, txns))
...
>>> def get_prices_with_comprehension():
...     return [get_price(txn) for txn in txns]
...
>>> def get_prices_with_loop():
...     prices = []
...     for txn in txns:
...         prices.append(get_price(txn))
...     return prices
...
>>> timeit.timeit(get_prices_with_map, number=100)
2.0554370979998566
>>> timeit.timeit(get_prices_with_comprehension, number=100)
2.3982384680002724
>>> timeit.timeit(get_prices_with_loop, number=100)
3.0531821520007725

Here, you define three functions that each use a different approach for creating a list. Then, you tell timeit to run each of those functions 100 times. timeit returns the total time it took to run those 100 executions.

As the code demonstrates, the biggest difference is between the loop-based approach and map[], with the loop taking 50% longer to execute. Whether or not this matters depends on the needs of your application.
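Timings also vary from run to run, so one refinement worth knowing about (not shown above, but part of the same timeit module) is timeit.repeat, which repeats the whole measurement several times so you can take the best result. A minimal sketch, using a small made-up txns list in place of the 100,000-element one above:

```python
import timeit

TAX_RATE = .08
txns = list(range(1000))  # small stand-in for the txns list above

def get_prices_with_comprehension():
    return [txn * (1 + TAX_RATE) for txn in txns]

# repeat=5 performs the 100-execution measurement five times;
# the minimum of the runs is the least noisy estimate.
results = timeit.repeat(get_prices_with_comprehension, number=100, repeat=5)
print(min(results))
```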
