Python: Comparing performance of using .join() and '+' for string concatenation

As you know, strings in Python are immutable, so concatenating them can be expensive, especially when the number of concatenations is large. In this article, we compare the efficiency of two techniques for concatenating strings: the + operator and the .join() method.
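
A quick illustration of what immutability means in practice (the variable names are just for illustration): "modifying" a string always produces a new string object, and the original is left untouched.

```python
# Strings are immutable: "changing" one always produces a new object.
greeting = "Hello"
alias = greeting          # a second name bound to the same string object
greeting += ", world"     # builds a new string; alias still refers to the old one

print(alias)              # Hello
print(greeting)           # Hello, world
print(greeting is alias)  # False: two different objects now

try:
    alias[0] = "J"        # item assignment is not supported on str
except TypeError as exc:
    print("TypeError:", exc)
```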

To make the comparison meaningful, we need a sufficiently large number of concatenations, so let's concatenate 'Hello' 100,000 times. We will write two routines, one for each of the techniques mentioned above.


from time import perf_counter

def get_time_to_concatenate_using_join(no_of_concatenations):
    start = perf_counter()
    # Build the whole string with a single .join() call
    result = ''.join("Hello" for _ in range(no_of_concatenations))
    return perf_counter() - start

def get_time_to_concatenate_using_plus(no_of_concatenations):
    start = perf_counter()
    result = ""
    # Grow the string one piece at a time with +=
    for _ in range(no_of_concatenations):
        result += "Hello"
    return perf_counter() - start

# Average each routine over 10 runs and report the result in milliseconds
total_time = 0
for _ in range(10):
    total_time += get_time_to_concatenate_using_join(100000)
print("Average time taken in ms using .join():", round(total_time/10 * 1000, 2))

total_time = 0
for _ in range(10):
    total_time += get_time_to_concatenate_using_plus(100000)
print("Average time taken in ms using '+':", round(total_time/10 * 1000, 2))

# Output:
# Average time taken in ms using .join(): 6.03
# Average time taken in ms using '+': 22.24

We have taken the average of 10 runs for each of the two routines. On average, the + operator (note that result += "Hello" is equivalent to result = result + "Hello") took about 269% more time than .join(), i.e. roughly 3.7 times as long (calculation: (22.24 - 6.03) / 6.03 * 100 ≈ 268.8%).
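
The hand-rolled timing loop above works, but Python's standard timeit module is designed for exactly this kind of measurement. Below is a sketch of the same comparison using timeit.repeat; the helper names concat_join, concat_plus and best_time_ms are hypothetical, and your numbers will differ from the ones above depending on your machine and Python version. Following the advice in the timeit documentation, it reports the fastest trial rather than the mean, since slower trials are usually caused by other processes interfering with the measurement.

```python
import timeit


def concat_join(n):
    # One .join() call over a generator of n pieces
    return "".join("Hello" for _ in range(n))


def concat_plus(n):
    # Repeated += on a growing string
    result = ""
    for _ in range(n):
        result += "Hello"
    return result


def best_time_ms(func, n, repeat=10):
    # Call func(n) once per trial, for `repeat` trials, and keep the
    # fastest trial (in milliseconds).
    timings = timeit.repeat(lambda: func(n), repeat=repeat, number=1)
    return min(timings) * 1000


if __name__ == "__main__":
    n = 100_000
    # Sanity check: both techniques must build the same string.
    assert concat_join(n) == concat_plus(n)
    join_ms = best_time_ms(concat_join, n)
    plus_ms = best_time_ms(concat_plus, n)
    print(f".join(): {join_ms:.2f} ms")
    print(f"'+=':    {plus_ms:.2f} ms")
    print(f"'+=' takes {(plus_ms - join_ms) / join_ms * 100:.0f}% more time")
```

Changing n to 1_000 reproduces the smaller experiment below.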

Now let's reduce the number of concatenations to 1,000, which is a more typical workload.

total_time = 0
for _ in range(10):
    total_time += get_time_to_concatenate_using_join(1000)
print("Average time taken in ms using .join():", round(total_time/10 * 1000, 2))

total_time = 0
for _ in range(10):
    total_time += get_time_to_concatenate_using_plus(1000)
print("Average time taken in ms using '+':", round(total_time/10 * 1000, 2))

# Output:
# Average time taken in ms using .join(): 0.04
# Average time taken in ms using '+': 0.12

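In this case, the + operator took about 200% more time than .join() on average, i.e. roughly three times as long (calculation: (0.12 - 0.04) / 0.04 * 100 = 200%). At these sub-millisecond timings, however, the figures are rounded to two decimal places, so the ratio is only approximate.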
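
It is worth comparing these ratios with a naive back-of-the-envelope model. If every + had to allocate a brand-new string and copy everything built so far, the number of characters copied would grow quadratically with n, while a single-allocation join copies each piece only once. The sketch below counts copies under that simplifying assumption (the function names are hypothetical, and copy counts are not timings). In this model the ratio is (n + 1) / 2, i.e. about 500x at 1,000 concatenations and about 50,000x at 100,000, far more than the roughly 3x to 4x measured above. That gap is consistent with CPython's in-place optimization for s = s + t and s += t, which the Python documentation describes as making quadratic run-time much less likely, while also noting that it is version- and implementation-dependent.

```python
def chars_copied_naive(n, piece_len):
    """Characters copied if every concatenation allocates a brand-new string.

    Step i (1-based) writes a new string of i * piece_len characters, so the
    total is piece_len * (1 + 2 + ... + n) = piece_len * n * (n + 1) / 2.
    """
    return piece_len * n * (n + 1) // 2


def chars_copied_join(n, piece_len):
    """Characters copied by a single-allocation join: each piece copied once."""
    return piece_len * n


for n in (1_000, 100_000):
    naive = chars_copied_naive(n, len("Hello"))
    joined = chars_copied_join(n, len("Hello"))
    print(f"n={n:,}: naive copies {naive:,} chars, join copies {joined:,} "
          f"(ratio {naive / joined:,.1f}x)")
```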

Takeaway:

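Using .join() is more efficient for string concatenation than the + operator. In CPython, .join() builds a single new string: it first computes the total length of the result, allocates it once, and then copies each element of the input iterable (separated by the chosen separator string) into it. Repeated + can instead create a new intermediate string on every iteration, copying everything built so far. CPython can sometimes perform s += t in place, but the Python documentation notes that this optimization is version- and implementation-dependent and recommends .join() for performance-sensitive code.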
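
In everyday code, this takeaway usually becomes the pattern of appending pieces to a list inside a loop and calling .join() once at the end. A minimal sketch, with a hypothetical build_report function and made-up sample data:

```python
def build_report(rows):
    """Build a multi-line report by collecting pieces, then joining once."""
    lines = []                          # appending to a list copies no string data
    for name, score in rows:
        lines.append(f"{name}: {score}")
    return "\n".join(lines)             # a single join builds the final string


report = build_report([("alice", 90), ("bob", 85)])
print(report)
```

Note that .join() only accepts strings, so non-string values have to be converted first (here the f-string does that); passing an int directly raises a TypeError.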

Thanks for reading, and I hope you found this article useful!