
The tale of two algorithms, importance of algorithm analysis in our daily programming tasks

 

Introduction

Quoting Wikipedia: “In computer science, the analysis of algorithms is the determination of the amount of resources (such as time and storage) necessary to execute them. Most algorithms are designed to work with inputs of arbitrary length. Usually, the efficiency or running time of an algorithm is stated as a function relating the input length to the number of steps (time complexity) or storage locations (space complexity).”

“Algorithm analysis is an important part of a broader computational complexity theory, which provides theoretical estimates for the resources needed by any algorithm which solves a given computational problem. These estimates provide an insight into reasonable directions of search for efficient algorithms.”

This leads us to another important concept, quoting Victor S. Adamchik, CMU, 2009:

“Algorithmic complexity is concerned about how fast or slow particular algorithm performs. We define complexity as a numerical function T(n) – time versus the input size n. We want to define time taken by an algorithm without depending on the implementation details. But you agree that T(n) does depend on the implementation! A given algorithm will take different amounts of time on the same inputs depending on such factors as: processor speed; instruction set, disk speed, brand of compiler and etc. The way around is to estimate efficiency of each algorithm asymptotically. We will measure time T(n) as the number of elementary “steps” (defined in any way), provided each such step takes constant time.

Let us consider two classical examples: addition of two integers. We will add two integers digit by digit (or bit by bit), and this will define a “step” in our computational model. Therefore, we say that addition of two n-bit integers takes n steps. Consequently, the total computational time is T(n) = c * n, where c is time taken by addition of two bits. On different computers, addition of two bits might take different time, say c1 and c2, thus the addition of two n-bit integers takes T(n) = c1 * n and T(n) = c2 * n respectively. This shows that different machines result in different slopes, but time T(n) grows linearly as input size increases.”
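To make this step-counting model concrete, here is a minimal Python sketch (my own illustration, not code from the quoted source) that adds two n-bit integers bit by bit and counts each bit addition as one elementary step, so the total is exactly n steps:

```python
# A minimal sketch of the step-counting model from the quote above: add two
# n-bit integers bit by bit and count each bit addition as one elementary step.

def add_n_bit_integers(a_bits, b_bits):
    """Add two equal-length bit lists (least significant bit first).
    Returns (sum_bits, steps), where steps counts the bit additions."""
    assert len(a_bits) == len(b_bits)
    result, carry, steps = [], 0, 0
    for a, b in zip(a_bits, b_bits):
        total = a + b + carry      # one elementary "step"
        result.append(total % 2)
        carry = total // 2
        steps += 1
    result.append(carry)           # final carry bit
    return result, steps

# Adding two 8-bit integers takes exactly 8 steps, i.e. T(n) = c * n with n = 8.
_, steps = add_n_bit_integers([1, 0, 1, 1, 0, 0, 1, 0], [0, 1, 1, 0, 1, 0, 0, 1])
print(steps)  # -> 8
```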

 

So, in this post, I will write about the importance of writing efficient code (in both memory and computation). For this, let's set up the experiment.

 

Methods

Assume we have two algorithms, f1 and f2, that perform the same task (say, sorting an array of numbers), with their complexities given below:

Complexity of the f1 algorithm:

T_f1(n) = a * n^2

Complexity of the f1 algorithm (asymptotic notation):

T_f1(n) = O(n^2)

Complexity of the f2 algorithm:

T_f2(n) = b * n

Complexity of the f2 algorithm (asymptotic notation):

T_f2(n) = O(n)

Here we can see that f1 has quadratic complexity and f2 has linear complexity.
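As a sketch, these two cost functions can be written as plain Python functions. The constants a and b below are placeholder values I chose for illustration; the actual values are the ones given by the equations above:

```python
# A hedged sketch of the two cost functions above. The constants a and b are
# placeholder values picked for illustration (chosen so that the crossover
# falls near n = 600, as the Results section describes); the real constants
# are whatever the equations above define.

def ops_f1(n, a=2):
    """Operation count of f1: quadratic growth, T_f1(n) = a * n^2."""
    return a * n ** 2

def ops_f2(n, b=1200):
    """Operation count of f2: linear growth, T_f2(n) = b * n."""
    return b * n

# With these placeholder constants the two curves cross at n = b / a = 600:
# below that point f1 performs fewer operations, above it f2 takes over.
print(ops_f1(600), ops_f2(600))  # -> 720000 720000
```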

Now, to make things more interesting, say that we also have two computers: a supercomputer that can perform 1e10 calculations per second and a home computer that can perform 1e6 calculations per second. Therefore, the supercomputer is 10,000 times faster than the home computer!

The supercomputer will be running algorithm f1, and the home computer will be running algorithm f2.
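In code, the setup boils down to the two machine speeds (a minimal sketch with my own variable names):

```python
# The two machines in the experiment, in calculations (operations) per second.
# Variable names are my own shorthand for the setup described above.
SUPERCOMPUTER_OPS_PER_SEC = 1e10  # will run the quadratic algorithm f1
HOME_COMPUTER_OPS_PER_SEC = 1e6   # will run the linear algorithm f2

# Raw speed ratio: the supercomputer is 10,000 times faster.
print(SUPERCOMPUTER_OPS_PER_SEC / HOME_COMPUTER_OPS_PER_SEC)  # -> 10000.0
```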

Results

Counting the number of operations performed by each algorithm:

When n=500, we observe that algorithm f1 outperforms algorithm f2 by performing fewer operations.

When n=1000, we observe that algorithm f2 outperforms algorithm f1. Therefore, we can suggest that algorithm f1 may be the better implementation only for small values of n (say, for values of n<=600).

When n=10000, we observe that algorithm f1 performs far more operations than algorithm f2. This result suggests that algorithm f1 is not suited for large input sizes. For this test, algorithm f2 is the winner!

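The operation-count comparison can be reproduced with the placeholder cost functions sketched in the Methods section; the exact counts depend on the real constants, but the qualitative behavior is the same:

```python
# Same placeholder cost functions as in the Methods sketch above.
def ops_f1(n, a=2):       # quadratic: a * n^2
    return a * n ** 2

def ops_f2(n, b=1200):    # linear: b * n
    return b * n

# f1 wins below the crossover (n = 600 with these constants), f2 wins above it.
for n in (500, 1000, 10_000):
    o1, o2 = ops_f1(n), ops_f2(n)
    winner = "f1" if o1 < o2 else "f2"
    print(f"n={n:>6}: f1 = {o1:>12,} ops | f2 = {o2:>12,} ops | fewer ops: {winner}")
```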

Comparing computer performances:

Remember that the supercomputer will be running algorithm f1 and the home computer will be running algorithm f2, and that the supercomputer is 10,000 times faster than the home computer.

When n=1e7, we observe that the supercomputer (running algorithm f1) outperforms the home computer by taking fewer hours to complete the task.

When n=1e8, we observe that the supercomputer (running algorithm f1) takes approximately 600 hours to complete the task, while the home computer (running algorithm f2) takes approximately 100 hours.


When n=1e10, we observe that the supercomputer (running algorithm f1) takes approximately 600 years to complete the task, while the home computer (running algorithm f2) takes approximately 1.5 years.

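The timing comparison follows the same pattern: running time is simply the operation count divided by the machine's speed. Here is a sketch using the same placeholder constants as before, so the exact hour and year figures will differ from the plots above, though the overall picture does not:

```python
# A sketch of the running-time comparison: time = operation count / machine speed.
# The cost constants are the same placeholders as before, so the exact figures
# differ from the plots above, but the qualitative outcome is the same.

HOURS = 3600.0              # seconds in an hour
YEARS = 3600.0 * 24 * 365   # seconds in a (non-leap) year

def seconds_f1_on_supercomputer(n, a=2, speed=1e10):
    """Quadratic algorithm f1 on the supercomputer (1e10 ops/s)."""
    return a * n ** 2 / speed

def seconds_f2_on_home_computer(n, b=1200, speed=1e6):
    """Linear algorithm f2 on the home computer (1e6 ops/s)."""
    return b * n / speed

for n in (1e8, 1e10):
    t1 = seconds_f1_on_supercomputer(n)
    t2 = seconds_f2_on_home_computer(n)
    print(f"n = {n:.0e}: supercomputer/f1 -> {t1 / HOURS:,.0f} h ({t1 / YEARS:.2f} y) | "
          f"home computer/f2 -> {t2 / HOURS:,.0f} h ({t2 / YEARS:.2f} y)")
```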

Conclusions

We observed that the supercomputer (despite being 10,000 times faster than the home computer), when using an inefficient implementation of an algorithm, is easily outperformed by a simple $500.00 USD home computer.

So, regardless of how much CPU power and how many gigabytes of RAM your supercomputer has, if the algorithm is coded poorly (not optimized), your hardware will not help much.

That is my advice and suggestion for your daily programming: I personally think that the hardware capabilities we currently have at our disposal are not an excuse to write hungry, redundant code (in memory and in computation, respectively). I remember that my home computer (back in 2000) had 8 GB of hard drive space, 256 MB of RAM, and a 0.1 GHz processor. So please do not write code that eats 60 GB of RAM to open a *.doc file.

Code and files

You can download the code by clicking here.
