Python Concatenate Lists

Concatenate Two Lists in Python

Problem: You have two lists and you'd like to join them into a new list.
Solution:

Python 3.8.2
>>> one = ["one","two", "three"]
>>> two = ["four","five"]
>>> one + two
['one', 'two', 'three', 'four', 'five']

📢 TLDR: Use +

In almost all simple situations, using list1 + list2 is the way you want to concatenate lists.

The edge cases below are better in some situations, but + is generally the best choice. All options covered work in Python 2.3, Python 2.7, and all versions of Python 3. [1]

Combine Lists in Place in Python

Problem: You have a huge list, and you want to add a smaller list on the end while minimizing memory usage.

In this case, it may be best to append to the existing list, reusing it instead of recreating a new list.

>>> longlist = ["one", "two", "three"] * 1000
>>> shortlist = ["four", "five"]
>>> longlist.extend(shortlist)
>>> longlist
['one', 'two', 'three', 'one', 'two', 'three', ..., 'four', 'five']
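
You can confirm the "reuse it instead of recreating it" point with a quick identity check: + hands back a brand-new list object, while extend leaves you holding the same object you started with.

>>> x = ["one", "two", "three"]
>>> y = ["four", "five"]
>>> z = x + y          # + builds a brand-new list object
>>> z is x
False
>>> before = id(x)
>>> x.extend(y)        # extend mutates x in place; no new list is created
>>> id(x) == before
True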

As with any optimization, you should verify that it actually reduces memory churn in your specific case, and stick to the simple, idiomatic x + y otherwise.
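
If you do want a rough check before reaching for a full profiler, sys.getsizeof reports a list's own overhead (the array of element pointers), which is exactly what x + y has to allocate a second time. A quick sketch, not a substitute for profiling your real workload:

import sys

x = ["one", "two", "three"] * 1000
y = ["four", "five"]

print(sys.getsizeof(x))  # shallow size of the big list's pointer array
z = x + y                # x and z now both exist: two ~3000-slot arrays
print(sys.getsizeof(z))  # a second allocation of roughly the same size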

Let's use the timeit module to check some performance numbers.

# Performance Check
>>> import timeit
>>> setup = """\
x = ["one","two","three"] * 1000
y = ["four","five","six"]
"""
# x + y with large x
>>> timeit.timeit('x + y', setup=setup, number=1000000)
3.6260274310000113
# x.extend(y) with large x
>>> timeit.timeit('x.extend(y)', setup=setup, number=1000000)
0.06857255800002804

In this example, where x is 3000 elements, extend is around 50x faster.

❗ Concatenating Lists With Large Elements is Fine

If the elements in your list are huge (million-character strings), but the list size is less than a thousand elements, the previous solution x + y will work just fine. This is because Python stores references to the values in the list, not the values themselves. Thus, the element size makes no difference to the runtime complexity.

>>> x = ["one" * 1000, "two" * 1000, "three" * 1000]
>>> y = ["four" * 1000, "five" * 1000]
>>> # This is fine
>>> z = x + y
>>> # Performance testing (extend has no advantage here)
>>> setup = """\
x = ["one" * 1000, "two" * 1000, "three" * 1000]
y = ["four" * 1000, "five" * 1000]
"""
>>> timeit.timeit('x + y', setup=setup, number=1000000)
0.05397573999994165
>>> timeit.timeit('x.extend(y)', setup=setup, number=1000000)
0.06511967799997365

In this case, extend does not have an advantage.
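
You can also verify the reference-sharing claim from the session above: the concatenated list z holds the very same string objects as x and y, not copies of them.

>>> z[0] is x[0]       # the first element is the same string object as in x
True
>>> z[-1] is y[-1]     # the last element is the same string object as in y
True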

Avoid Chain From itertools For Two Lists

It is possible to use chain from itertools to combine two lists into a single iterable.

>>> longlist = ["one", "two", "three"] * 1000
>>> shortlist = ["four", "five"]
>>> from itertools import chain
>>> z = list(chain(longlist, shortlist))
>>> z
['one', 'two', 'three', 'one', ..., 'four', 'five']

We can check the performance of using chain:

>>> setup = """\
from itertools import chain
x = ["one","two","three"] * 1000
y = ["four","five","six"]
"""
# x.extend(y) with large x
>>> timeit.timeit('x.extend(y)', setup=setup, number=1000000)
0.06857255800002804
# list(chain(x, y)) with large x
>>> timeit.timeit('list(chain(x, y))', setup=setup, number=1000000)
16.810488051999982

Using chain with two lists is slower in all cases tested, and x + y is easier to understand.

Combining N Lists in Python

If you need to add three or even ten lists together and the lists are statically known, then using + to concatenate works great.

>>> one = ["one","two", "three"]
>>> two = ["four","five"]
>>> three = []
>>> z = one + two + three

Flatten a List of Lists in Python

However, if the number of lists is dynamic and unknown until runtime, chain from itertools becomes a great option. chain.from_iterable takes a list of lists and flattens it into a single list.

>>> from itertools import chain
>>> l = [["one","two", "three"],["four","five"],[]] * 99
>>> l
[['one', 'two', 'three'], ['four', 'five'], [], ...
>>> list(chain.from_iterable(l))
['one', 'two', 'three', 'four', 'five', 'one', 'two', ... ]

chain can take anything iterable, making it an excellent choice for combining lists, sets, dictionaries, and other iterable structures.

>>> from itertools import chain
>>> one = [1,2,3]
>>> two = {1,2,3}
>>> list(chain(one, two, one))
[1, 2, 3, 1, 2, 3, 1, 2, 3]
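
Dictionaries and generators work too; iterating a dict yields its keys, so chaining one in contributes the keys to the result.

>>> d = {"a": 1, "b": 2}
>>> gen = (n * 10 for n in one)
>>> list(chain(one, d, gen))
[1, 2, 3, 'a', 'b', 10, 20, 30]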

Performance of Flattening a List of Lists

Performance doesn't always matter, but readability always does, and the chain method is a straightforward way to combine lists of lists. That said, let's put readability aside for a moment and try to find the fastest way to flatten lists.

One option is iterating ourselves:

result = []
for nestedlist in l:
    result.extend(nestedlist)

Let's check its performance vs chain:

>>> setup = """\
from itertools import chain
l = [["one","two", "three"],["four","five"],[]] * 99
"""
>>> # Add nested lists using chain.from_iterable
>>> timeit.timeit('list(chain.from_iterable(l))', setup=setup, number=100000)
1.0384087909997106
>>> # Add using our own iteration
>>> run = """\
result = []
for nestedlist in l:
    result.extend(nestedlist)
"""
>>> timeit.timeit(run, setup=setup, number=100000)
1.8619721710001613

This shows that chain.from_iterable is noticeably faster than looping with extend when flattening a list of lists.

Flattening and Merging Lists With One Big List

What about adding a list of lists to an existing, large list? We saw that extend can be faster with two lists when one is significantly longer than the other, so let's test the performance of extend with N lists.

First, we use our standard chain.from_iterable.

>>> # Method 1 - chain.from_iterable
>>> from itertools import chain
>>> longlist = ["one","two", "three"] * 1000
>>> combinedlist = [longlist, ["one","two", "three"],["four","five"],[]]
>>> list(chain.from_iterable(combinedlist))

We then test its performance:

>>> setup = """\
from itertools import chain
longlist = ["one","two", "three"] * 1000
combinedlist = [longlist, ["one","two", "three"],["four","five"],[]]
"""
>>> timeit.timeit('list(chain.from_iterable(combinedlist))', setup=setup, number=100000)
1.8676087710009597

Next, let's try concatenating by adding everything onto the long list:

>>> # Method 2 - extend
>>> longlist = ["one","two", "three"] * 1000
>>> nestedlist = [["one","two", "three"],["four","five"],[]]
>>> for item in nestedlist:
...     longlist.extend(item)
...

Performance Test:

>>> setup = """\
longlist = ["one","two", "three"] * 1000
nestedlist = [["one","two", "three"],["four","five"],[]]
"""
>>> run = """\
for item in nestedlist:
    longlist.extend(item)
"""
>>> timeit.timeit(run, setup=setup, number=100000)
0.02403609199973289

There we go: extend is much faster when flattening or concatenating many lists into one existing long list. If you run into this case, using extend to add the smaller lists onto the long list avoids copying the long list's elements into a new list and can significantly improve performance.
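
If this "one big list plus many small ones" pattern shows up often, it can be worth wrapping it in a small helper. Here is a minimal sketch (extend_all is a made-up name, not a standard library function):

def extend_all(base, iterables):
    """Append the elements of each iterable onto base, in place.

    No intermediate flattened list is created; only base grows.
    """
    for chunk in iterables:
        base.extend(chunk)
    return base

longlist = ["one", "two", "three"] * 1000
extend_all(longlist, [["one", "two", "three"], ["four", "five"], []])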

Summary

These are the main ways of combining lists in Python. Use this table to guide you in the future.

Also, if you are looking for a nice way to standardize the processes around your Python projects -- running tests, installing dependencies, and linting code -- take a look at Earthly for Repeatable Builds.

Condition                      | Solution                      | Performance Optimization [2]
2 lists                        | x + y                         | No
1 large list, 1 small list     | x.extend(y)                   | Yes
Known number of N lists        | x + y + z                     | No
Unknown number of N lists      | list(chain.from_iterable(l))  | No
List of lists                  | list(chain.from_iterable(l))  | No
1 large list, many small lists | for l1 in l: x.extend(...)    | Yes
  1. I did all the performance testing using Python 3.9.5 on macOS Big Sur.

  2. If you don't have a performance bottleneck, clarity trumps performance, and you should ignore the performance suggestions. 
