Comparing Python, R and Julia execution time in two contexts:
Numerical Simulation and Classification Task

Lucas Moda
Dec 12, 2019 · 10 min read

Abstract: As a Data Scientist, I have been using mostly Python and R to build algorithms for AI/Machine Learning problems. However, I noticed the emergence of the Julia language in the Data Science community and wanted to give it a try to see how it performs in terms of execution time. To do that, I compared Python, R and Julia execution times in two contexts: a Numerical Simulation (the Buffon’s Needle problem) and Classification Tasks (Random Forest and Naive Bayes). In the former, Julia massively outperforms the other languages; in the latter, results are mixed, with Python generally being the fastest.

Keywords: Machine Learning, Python, R, Julia, Data Science, Numerical Simulation, Classification Task

Introduction

For this exercise, I compared performance (regarding strictly execution time) between three languages doing the same tasks. The study is divided into two parts: in the first, a Numerical Simulation context (in which the Fortran language was also included), the famous Buffon’s Needle [1] problem was used (more on that in a bit). In the second, a Classification Task context, I trained Random Forest [2] and Naive Bayes [3] models on a variety of custom-made datasets, analyzing the impact of the number of features and the number of records (once again, caring only about execution time, not metrics like accuracy or AUC [4]).

The main objective was to assess whether it is worth switching to Julia, a new, daring language that hopes to overtake Python and R in the field of Machine Learning and Data Science by offering similarly easy syntax but better performance.

Buffon’s Needle

Buffon’s Needle is a mathematical problem proposed in the 18th century by Georges-Louis Leclerc, Comte de Buffon, a French polymath. In his own words: “Suppose we have a floor made of parallel strips of wood, each the same width, and we drop a needle onto the floor. What is the probability that the needle will lie across a line between two strips?”.

A plane consisting of parallel strips of wood, each separated by a distance d. A needle of length l ≤ d is thrown, and one wants to calculate the probability that it crosses a line.

Although it was not the original motivation, the solution to the problem can be used to design a Monte Carlo method [5] for approximating the number π. The detailed maths behind it is beyond the scope of this article, but, basically, the more needles you throw, the closer you get to the real value of π. This makes it an interesting computational problem. Without further ado, let’s delve into how the experiment was conducted.

Methodology

All data was collected over a period of three consecutive days, taking the average of 10 measurements and always using the computer exclusively for this project. The computer's technical specifications are: Dell Inspiron 15-3567, Intel Quad-Core i5-7200U @ 2.50 GHz, 8 GB RAM, Ubuntu 18.04. For Fortran, the gfortran [6] compiler was used. All code can be found in a GitHub repository [7] (link at the end of the article).

Simulation

The algorithm for calculating π using a Monte Carlo simulation of Buffon’s Needle is pretty straightforward. There are only two relevant formulae. The first is the crossing condition:

n = 1 if x ≤ (l/2)·sin(θ), and n = 0 otherwise,    (1)

where n denotes whether the needle crossed (1) or not (0) the line, x is the distance between the center of the needle and the closest strip, and θ is the angle the needle forms with a line parallel to the strips passing through the needle’s center.

Scheme depicting a needle of length l < d and the distances d, x, and the angle θ.

If you throw N needles and count the number n of them that cross a line, you can estimate π with the second formula:

π ≈ 2lN / (dn),    (2)

which, for l = d, simplifies to π ≈ 2N/n.

Our Monte Carlo simulation algorithm, then, consists of these steps:
• Set the needle length (I used l = d = 1);
• Set the number N of throws (I chose 30 values ranging from 10 needles to 2 billion) and set a counter for the number n of needles that cross the line;
• Loop through N, generating random θ and x and checking the condition given by (1); update the counter if the needle crossed the line;
• Calculate π using (2) and the associated errors.

Figure 3: Fortran subroutine used to implement our Monte Carlo simulation for the Buffon’s Needle problem. The same subroutine was coded in Python, R and Julia.
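
The actual Python, R, Julia and Fortran implementations are in the repository [7]. As a rough illustration of the steps above, here is a minimal Python/NumPy sketch; it is vectorized rather than an explicit loop, and the function name buffon_pi is mine, not taken from the repository:

```python
import numpy as np

def buffon_pi(N, l=1.0, d=1.0, seed=42):
    """Estimate pi by simulating N needle throws (assumes l <= d)."""
    rng = np.random.default_rng(seed)
    # x: distance from the needle's center to the closest strip, uniform on [0, d/2]
    x = rng.uniform(0.0, d / 2.0, size=N)
    # theta: angle between the needle and the strips, uniform on [0, pi/2]
    # (np.pi is used only to sample the angle; the estimate comes from the crossing ratio)
    theta = rng.uniform(0.0, np.pi / 2.0, size=N)
    # Condition (1): the needle crosses a line when x <= (l/2) * sin(theta)
    n = np.count_nonzero(x <= (l / 2.0) * np.sin(theta))
    # Formula (2): pi ~ 2 l N / (d n)
    return 2.0 * l * N / (d * n)

print(buffon_pi(1_000_000))  # prints a value close to 3.1416
```

Note that a fully vectorized call with 2 billion needles would not fit comfortably in memory; an explicit loop (or chunking), as in the steps described above, avoids that.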

Classification

For the other part of the experiment I trained Random Forest and Naive Bayes algorithms in Python, R and Julia, measuring execution time for both training and prediction, and analyzed how these times varied with dataset size and number of features. Initially, I took 10 datasets from the UCI repository [8] with distinct size-feature combinations, but I realized that feature type was also important (i.e., a dataset with 400 binary features is different from a dataset with 150 continuous features with values ranging from 10,000 to 5,000,000). To have total control over the input and to make sure a “fair” comparison was being made, I decided to build random datasets with random features, as follows (a code sketch is given after the list):
• Half of the features are binary (0 or 1), each with a random weight (e.g., one feature can have 30% of 1’s, another 75%, and so on);
• The other half are continuous and normally distributed, with μ ranging from 0 to 1000 and σ² ranging from 10 to 500;
• The label is binary, with 90% of the records in the negative (0) class.
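
A minimal Python sketch of this recipe follows. The exact range used for the binary weights and the pandas-based layout are my assumptions; the author's actual generation code is in the repository [7]:

```python
import numpy as np
import pandas as pd

def make_dataset(n_rows, n_features, seed=42):
    """Random dataset in the spirit described above: half binary, half continuous features."""
    rng = np.random.default_rng(seed)
    half = n_features // 2
    cols = {}
    # Binary features, each with its own random proportion of 1's
    for i in range(half):
        p = rng.uniform(0.05, 0.95)                # assumed range for the 1/0 ratio
        cols[f"bin_{i}"] = rng.binomial(1, p, size=n_rows)
    # Continuous, normally distributed features with random mean and variance
    for i in range(n_features - half):
        mu = rng.uniform(0.0, 1000.0)              # mu in [0, 1000]
        sigma = np.sqrt(rng.uniform(10.0, 500.0))  # sigma^2 in [10, 500]
        cols[f"num_{i}"] = rng.normal(mu, sigma, size=n_rows)
    # Binary label, ~90% in the negative (0) class
    cols["label"] = rng.binomial(1, 0.10, size=n_rows)
    return pd.DataFrame(cols)

df = make_dataset(n_rows=10_000, n_features=16)
```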

Example of 4 binary features from one of the datasets, each with a distinct 1/0 ratio.
Figure 5: Example of 4 continuous, normally distributed features from one of the datasets, each with a distinct mean and standard deviation.

Using this approach, I built 42 distinct datasets, varying the number of features (by powers of 2 from 4 to 128) and dataset size (2k, 5k, 10k, 25k, 50k, 100k and 500k). I used popular and renowned libraries/packages for training: scikit-learn [9] (Python), randomForest [10] and e1071 [11] (R), DecisionTree.jl [12] and NaiveBayes.jl [13] (Julia).
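
The article does not show the timing harness itself. The sketch below illustrates how the Python measurements could be taken with scikit-learn; the choice of GaussianNB, default hyperparameters and a single fit/predict pass on stand-in data are my assumptions, not necessarily what the repository code does:

```python
import time
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(42)
X = rng.normal(size=(10_000, 16))        # stand-in features
y = rng.binomial(1, 0.10, size=10_000)   # ~90% negative class, as in the article

def time_fit_predict(model, X, y):
    """Return (training time, prediction time) in seconds for a single run."""
    t0 = time.perf_counter()
    model.fit(X, y)
    t1 = time.perf_counter()
    model.predict(X)
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1

print(time_fit_predict(RandomForestClassifier(n_estimators=100), X, y))
print(time_fit_predict(GaussianNB(), X, y))
```

Averaging 10 such runs per dataset, as described in the Methodology section, would give the reported numbers.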

Results

Simulation

First of all, just to prove that the method works, here are the estimated π values as a function of needles thrown for each language:

Estimated π value as a function of needles thrown for each language. Initially the value varies between languages because of differentiated seeding, but somewhere around 10,000 throws the estimate is stable and really close to π.
Execution time as a function of needles thrown for each language (log-log scale).

We can see clearly that Julia vastly outperforms the other languages (even faster than Fortran!), with R being the slowest. It is also interesting to note how Julia is really slow in the first iteration and then has a sudden drop before its execution time starts to behave linearly (my guess is that this has something to do with its type system and JIT compilation [14]: at first it has to figure out what things are, and after that everything is optimized). Fortran (red line) appears constant until 10,000 throws, but that is only because the function used to measure time cannot resolve intervals below roughly 10⁻³ seconds.

The same plot can be summarized in a bar chart with some of the data points:

Figure 8: Bar chart comparing execution times for some of the dataset sizes, highlighting Julia as the best-performing language.

For the largest number of throws (2 billion needles), Julia executed in about 33 seconds, more than twice as fast as Fortran (71 seconds), 19 times faster than Python (10.5 minutes) and 580 (!) times faster than R (over 5 hours).

Classification

Let’s start with Random Forest. Plotting training execution time as a function of dataset size with number of features as color, we can have an overview of what is happening:

Random Forest training execution time as a function of dataset size, with number of features as color, for each language (log-log scale). Axis limits are the same for every plot and the number of features increases from yellow to red.

Since the plots are on the same scale, we can see that Python and Julia have very similar performance, whereas R, once again, tends to be slower, especially for larger datasets, and its execution time is more influenced by the number of features (note R’s higher vertical dispersion).

Random Forest prediction execution time as a function of dataset size, with number of features as color, for each language (log-log scale). Axis limits are the same for every plot and the number of features increases from yellow to red.

As for prediction, there is a notable difference between Julia and Python. Julia is very fast at predicting few (< 100) samples, but it scales poorly and eventually becomes slower than Python for many predictions. Python, on the other hand, is much slower for few predictions, keeps its execution time nearly constant up to about 100 predictions, and then scales better than Julia. For a large number of predictions, the three languages have very similar execution times (R, once again, with higher variability).

We can double check this result by making independent plots for execution time vs. number of predictions and number of features:

Random Forest prediction execution time as a function of number of predictions for each language (log-log scale).
Random Forest prediction execution time as a function of number of features for each language (log-log scale).

We can clearly see in the first plot how Julia is really fast for few predictions but then scales poorly; for the last data points, all three languages have similar execution times. In the second plot, on the other hand, we can see that R is initially faster, but scales poorly and becomes the slowest language at around 32 features. As strange as it seems, Julia’s execution time actually decreases (albeit just a little) with an increasing number of features. This might be another characteristic of the language, or (my guess) the curve is actually close to constant and, since I took an average of 10 runs, random fluctuation is driving it slightly down. Either way, Julia’s prediction execution time is barely affected by the number of features.

Doing the same thing for Naive Bayes:

Naive Bayes training execution time as a function of dataset size, with number of features as color, for each language (log-log scale). Axis limits are the same for every plot and the number of features increases from yellow to red.
Naive Bayes prediction execution time as a function of dataset size, with number of features as color, for each language (log-log scale). Axis limits are the same for every plot and the number of features increases from yellow to red.

If Python and Julia had similar performance for Random Forest training, this time the former had a distinct advantage; in fact, for this algorithm Julia compares much more to R than to Python, being the slowest of the three languages. My guess is that this library is not as well optimized for Julia’s types as the Random Forest one, which, at least to me, is disappointing. On the plus side, Julia is once again very fast for few predictions, but it scales even worse than for Random Forest. R, in turn, is much slower than the other two at prediction, up to hundreds of times slower for a large number of predictions.

Conclusion

I conducted a twofold experiment to compare execution time for three languages widely used in the Data Science community: Python, R and Julia. For the first part, a Monte Carlo simulation of the Buffon’s Needle problem, Julia was astoundingly faster than the other languages (even Fortran, compiled with gfortran): up to 19 times faster than Python and 580 times faster than R (the slowest, taking about 5 hours to calculate π for 2 billion needle throws, as opposed to 33 seconds for Julia).

For Classification tasks, however, the scenario shifts to Python’s advantage. For Random Forest training, Python and Julia have similar performance, while R is significantly slower. For prediction, Julia is up to hundreds of times faster for few (< 100) predictions, but it scales poorly and is eventually surpassed by Python; for a large number of predictions, the three languages have similar performance. Finally, for Naive Bayes, Python performed better than the other two, with Julia being the slowest, although, once again, Julia was faster for few predictions and scaled poorly. For the Random Forest algorithm, Julia’s execution time was barely affected by increasing the number of features, while R was the language in which this effect was most pronounced (for both algorithms).

All in all, Julia did really well on a purely numerical problem, even outperforming Fortran, but the trend did not repeat in the Classification tasks, where Python did best. In both contexts R was, generally, much slower. So, at least for now, it seems that Julia is more “ready” for scientific computing than for Data Science, where Python is still the top player; given the Buffon’s Needle (and even the Random Forest) results, however, it is not impossible to foresee a near future in which Julia (or another language), aided by improvements in its libraries and community, becomes a serious competitor in the Data Science field.

References

[1] Eric W Weisstein. Buffon’s needle problem. 2003.

[2] Leo Breiman. Random forests. Machine Learning, 45(1):5–32, Oct 2001.

[3] Kevin P Murphy et al. Naive Bayes classifiers. University of British Columbia, 18:60, 2006.

[4] Sarang Narkhede. Understanding AUC-ROC curve. Towards Data Science, 26, 2018.

[5] K Binder. Applications of Monte Carlo methods to statistical physics. Reports on Progress in Physics, 60(5):487–559, May 1997.

[6] A GNU Fortran Maintainer. Gfortran: Compiling a 1,000,000+ line numerical weather forecasting system. In GCC Developers’ Summit, page 159. Citeseer, 2005.

[7] L.F.R. Moda. LanguageComparison — codes used to compare Python, R and Julia execution times in a numerical simulation (Buffon’s Needle) and classification tasks (Random Forest and Naive Bayes). https://github.com/lukmoda/LanguageComparison, 2019.

[8] Dheeru Dua and Casey Graff. UCI machine learning repository, 2017.

[9] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[10] Andy Liaw and Matthew Wiener. Classification and regression by randomForest. R News, 2(3):18–22, 2002.

[11] David Meyer, Evgenia Dimitriadou, Kurt Hornik, Andreas Weingessel, and Friedrich Leisch. e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien, 2019. R package version 1.7–2.

[12] B. Sadeghi. DecisionTree.jl — Julia implementation of Decision Tree (CART) and Random Forest algorithms. https://github.com/bensadeghi/DecisionTree.jl, 2019.

[13] A. Zhabinski. NaiveBayes.jl — Naive Bayes classifier. https://github.com/dfdx/NaiveBayes.jl, 2019.

[14] Miles Lubin and Iain Dunning. Computing in operations research using julia. INFORMS Journal on Computing, 27(2):238–248, 2015.


Lucas Moda

Astronomer, Data Scientist/Machine Learning Engineer and Writer