JavaScript and the JVM

Introduction

When it comes to server-side JavaScript programming, there are other choices besides V8-based solutions like NodeJS, TeaJS, and SilkJS. The Rhino JavaScript engine has been available for the JVM for a long time, and Java 8 was recently released with a brand new JavaScript engine for the JVM called Nashorn. Another project, DynJS, shows a lot of promise as well. In this post, I will investigate the benefits of running JavaScript on the JVM and demonstrate how easy it is to integrate with, or script, Java from JavaScript.

JavaScript in the JVM

A few years back, I read a blog post by a fellow named Steve Yegge, which talked about JavaScript on the JVM. The post is long, but well worth the read. At one point, he talks about the benefits of scripting on the JVM, and all of what he wrote and talked about back then is still valid today.

First, for almost any computing problem, there is already a solution written in Java. Many times, the Java implementation of some library will be superior to what you might cobble together from other sources (see Apache Lucene). Why not leverage all this prior work? On top of that, all this code is available in .jar format, which is portable between operating systems and CPUs; it runs almost everywhere.

Second, the JVM itself has had a considerable number of man hours of research and development applied to it, and that work is ongoing. When the JVM developers figure out how to make something smaller/faster/better, it benefits everything that uses the JVM, including JavaScript execution and the libraries we’d call from JavaScript. We also get the benefit of Java’s excellent garbage collection schemes.

Third, the JVM features native threads. This means multiple JVM threads can be executing in the same JavaScript context concurrently. If v8 supported threads in this manner, nobody would be talking about event loops, starving them, asynchronous programming, nested callbacks, etc. Threads trivially allow your application to scale to use all the CPU cores in your system and to share data between the threads.

I’ll add a fourth: you can compile your JavaScript programs into Java class files and distribute your code like you would any Java code.

So let’s have a look at JavaScript and the JVM.

Introducing Mozilla Rhino

Rhino is an open source JavaScript engine written in Java, and is readily available for most operating systems.

For OS X, I use Homebrew to install it:

$ brew install rhino

For Ubuntu, the following command should work:

$ sudo apt-get install rhino

Once installed, we can run it from the command line and we get a REPL similar to what we’re used to with NodeJS:

$ rhino  
Rhino 1.7 release 4 2012 06 18  
js> var x = 10;  
js> print(x)  
10  
js>

You can run rhino from the command line, passing it the name of a JavaScript file to run:

$ cat test.js  
print('hello');

$ rhino test.js  
hello  
$

Rhino has a number of built-in global functions, but I’ll only elaborate on a few. We’ve already seen that the print() function echoes strings to the console window.

The load() function loads and runs one or more JavaScript files. This is basically the server-side equivalent of the HTML <script> tag.

$ rhino  
Rhino 1.7 release 4 2012 06 18  
js> load('test.js')  
hello  
js>

The spawn(fn) function creates a new thread and runs the passed function (fn) in it.

js> spawn(function() { print('hello'); });  
Thread[Thread-1,5,main]  
js> hello

js>

Note the hello was printed on what looks like the command line. That was printed from the background thread and I had to hit return to see the next prompt. The Thread[Thread-1,5,main] was the return value of the spawn() method; it is a variable containing a Java Thread instance.

Spawning threads is that easy!

The JVM has first class synchronization built in. In Java, you use the synchronized keyword something like this:

//java  
public class bar {  
    private int n;  
    //...  
    public synchronized int foo() {  
        return this.n;  
    }  
}

This allows only one thread at a time to enter the foo() method. If a second thread attempts to call the function while a first has entered it (but not returned yet), the second thread will block until the first returns.

Rhino provides a sync(function [,obj]) method that allows us to implement synchronized functions. The equivalent JavaScript looks like:

//javascript
function bar() {
    this.n = ...;
}
bar.prototype.foo = sync(function() { return this.n; });

If we spawn() two threads that call foo() on the same bar instance, only one will be allowed to enter the function at a time.

Synchronization is vital for multithreaded applications to avoid race conditions where one thread might be modifying a variable/array/object while another thread is trying to examine it. The state of the variable/array/object is inconsistent until the modification is complete.

To recap so far, Rhino provides print(), load(), spawn(), and sync() functions, among others. In practice, I only see the load() and sync() methods being necessary because Rhino and other JVM JavaScript implementations allow us to “script Java” from JavaScript programs.

Scripting Java

Rhino makes scripting Java rather easy. It exposes a global variable Packages that is a namespace for every Java package, class, interface, etc., on the CLASSPATH.

The Java 7 API JavaDocs for the java.lang.System class can be found here:
http://docs.oracle.com/javase/7/docs/api/java/lang/System.html

On that page is the definition of the field, “out” and an example that reads something like:

public static final PrintStream out

The “standard” output stream. This stream is already open and ready to accept output data. Typically this stream corresponds to display output or another output destination specified by the host environment or user.

For simple stand-alone Java applications, a typical way to write a line of output data is:

System.out.println(data)

From rhino, we can access System.out.println():

js> Packages.java.lang.System.out.println  
function println() {/*  
    void println(long)  
    void println(int)  
    void println(char)  
    void println(boolean)  
    void println(java.lang.Object)  
    void println(java.lang.String)  
    void println(float)  
    void println(double)  
    void println(char[])  
    void println()  
*/}

js>

What this is showing is that there are a number of implementations of println() in Java with different signatures. Rhino is smart enough to choose the right implementation based upon how we call it. Also note that the types in the println() signatures are Java native types.

For example:

js> Packages.java.lang.System.out.println('hello')  
hello  
js>

Rhino also exposes a global java variable which is identical to Packages.java. This is a handy way to access the built-in Java classes.

A minimal console class

We can now use load() to load a primitive JavaScript console implementation:

$ cat console.js  
console = {  
    log: function(s) {  
        java.lang.System.out.println(s);  
    }  
};

$ rhino  
Rhino 1.7 release 4 2012 06 18  
js> load('console.js')  
js> console.log('hello')  
hello  
js>
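The console.log() above takes a single argument. As a rough sketch, and not anything Rhino provides, here is a variant that joins several arguments with spaces and receives the function that writes a line as a parameter. The names makeConsole and writeLine are hypothetical; under Rhino the writer would be a thin wrapper around java.lang.System.out.println, as shown in the trailing comment.

```javascript
// Sketch only: makeConsole and writeLine are illustrative names, not Rhino APIs.
// writeLine is any function that prints one line of text.
function makeConsole(writeLine) {
    function emit(prefix, args) {
        var parts = [];
        for (var i = 0; i < args.length; i++) {
            parts.push(String(args[i]));   // also turns Java strings into JS strings
        }
        writeLine(prefix + parts.join(' '));
    }
    return {
        log: function() { emit('', arguments); },
        error: function() { emit('ERROR: ', arguments); }
    };
}

// Under Rhino (assumption: the global `java` variable is available):
// var console = makeConsole(function(line) {
//     java.lang.System.out.println(line);
// });
```

Passing the writer in keeps the Java call in one place, so the same file can be loaded in another engine with a different writer.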

Java types in JavaScript

When writing JavaScript, things work as expected. An object is an object, an array is an array, a string is a string, and so on. But when we script Java from JavaScript, our variables often are instances of Java objects. A trivial example:

js> var a = new java.lang.String('a');  
js> a  
a  
js> // seems like a javascript string  
js> typeof a  
object  
js> // but it's an object  
js> typeof 'a'  
string  
js> // javascript strings are typeof string  
js> var b = 'b';  
js> a.getBytes()  
[B@4f124609  
js> b.getBytes()  
js: uncaught JavaScript runtime exception: TypeError: Cannot find function getBytes in object b.  
js> var c = String(a)  
js> c.getBytes()  
js: uncaught JavaScript runtime exception: TypeError: Cannot find function getBytes in object a.

Note that getBytes() is a method you can call on Java strings, but not on JavaScript strings. Also note that we can convert Java strings to JavaScript strings with String().
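Because a Java string and a JavaScript string print identically, it can help to normalize values where they cross from Java into JavaScript. A minimal sketch follows, assuming a hypothetical helper named toJsString that relies only on String(). It can be tried outside Rhino with a JavaScript String wrapper object, which is also typeof 'object'.

```javascript
// toJsString is a hypothetical helper, not part of Rhino.
// It returns a primitive JavaScript string for anything string-like and
// passes null/undefined through so a missing value does not become 'null'.
function toJsString(value) {
    if (value === null || value === undefined) {
        return value;
    }
    return String(value);
}

var wrapped = new String('a');       // typeof 'object', much like a Java string in Rhino
var plain = toJsString(wrapped);     // typeof 'string'
```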

Fortunately, we rarely have to instantiate Java strings, but we will have to deal with binary data when scripting Java. JavaScript has no real native binary type, but we can have our variables refer to instances of Java binary types.

Java Byte Arrays

One thing we’re certainly going to do is deal with Java byte arrays. We can instantiate one (1024 bytes) like this:

js> var buf = java.lang.reflect.Array.newInstance(java.lang.Byte.TYPE, 1024);  
js> buf  
[B@44d4ba66  
js> buf[0]  

js> buf[1]  

js> buf[1] = 10;  
10  
js> buf[0]  

js> buf[1]  
10
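Java bytes are signed (-128 to 127), which is easy to forget when inspecting a buffer from JavaScript. As a sketch, the hypothetical helper below renders a sequence of bytes as hex by masking each value with 0xff. It uses only length and index access, so it should work for a Java byte[] under Rhino as well as for a plain JavaScript array; the sample data here is a plain array.

```javascript
// bytesToHex is an illustrative helper, not a Rhino built-in.
function bytesToHex(buf) {
    var out = [];
    for (var i = 0; i < buf.length; i++) {
        var b = buf[i] & 0xff;                       // map -128..127 onto 0..255
        out.push((b < 16 ? '0' : '') + b.toString(16));
    }
    return out.join('');
}

var hex = bytesToHex([0, 10, -1]);                   // "000aff"
```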

Useful example

Let’s look at how to read in a text file by scripting Java, and it does look a lot like Java. All the Java classes we use are in the package java.io and you can read up on FileInputStream, BufferedInputStream, and ByteArrayOutputStream. There are certainly many examples of their use (in Java) on the web.

$ cat cat.js  
var FileInputStream = java.io.FileInputStream,  
BufferedInputStream = java.io.BufferedInputStream,  
ByteArrayOutputStream = java.io.ByteArrayOutputStream;

function cat(filename) {  
    var buf = java.lang.reflect.Array.newInstance(java.lang.Byte.TYPE, 1024),  
        contents = new ByteArrayOutputStream(),  
        input = new BufferedInputStream(new FileInputStream(filename)),  
        count;

    while ((count = input.read(buf)) > -1) {  
        contents.write(buf, 0, count);  
    }  
    input.close();  
    return String(contents.toString());  
}

$ rhino  
Rhino 1.7 release 4 2012 06 18  
js> load('cat.js')  
js> cat('console.js')  
console = {  
    log: function(s) {  
        java.lang.System.out.println(s);  
    }  
};

js> var s = cat('console.js')  
js> s.length  
74  
js>

Maybe this is a bit ugly, but we can encapsulate all the bridging between JavaScript and Java in nice JavaScript classes. Then we only need to call our JavaScript from JavaScript and not care so much about how Java is being called or how the conversions between JavaScript native objects and Java ones are being done. One thing is for sure: this seems a lot cleaner and simpler than writing C++ modules to link with NodeJS or other V8 alternatives.

In other words, we only had to write the cat() function once. We can load() it in any or all of our applications from now on and not have to write the interface code to Java again.
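One gap in cat() as written: if read() throws, input.close() is never reached. A possible way to handle that is a small try/finally wrapper. This is only a sketch; withResource is a hypothetical name, and the one assumption about the resource is that it has a close() method, as Java streams do. A stand-in object is used so the sketch runs in any JavaScript engine.

```javascript
// withResource is a hypothetical helper: it runs fn(resource) and always
// calls resource.close(), even when fn throws.
function withResource(resource, fn) {
    try {
        return fn(resource);
    } finally {
        resource.close();
    }
}

// Stand-in for a Java stream; only close() matters for this sketch.
var fakeStream = {
    closed: false,
    close: function() { this.closed = true; }
};

var result = withResource(fakeStream, function(stream) {
    return 'done';
});
```

Under Rhino, the read loop in cat() could be passed as fn, with the BufferedInputStream as the resource.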

Threads without spawn()

This example is a bit longer, but it demonstrates how to implement a Runnable interface in JavaScript.

$ cat threads.js  
load('console.js');

var Thread = java.lang.Thread;

var x = 0;

function thread1() {  
    console.log('thread1 alive');  
    while (1) {  
        Thread.sleep(10);  
        console.log('thread1 x = ' + x);  
        x++;  
    }  
}

function thread2() {  
    console.log('thread2 alive');  
    while (1) {  
        Thread.sleep(10);  
        console.log('thread2 x = ' + x);  
        x++;  
    }  
}

new Thread({ run: thread1 }).start();  
new Thread({ run: thread2 }).start();

When I run it, you can see from the output the effect of the race condition where both threads are incrementing the x variable:

$ rhino ./threads.js  
thread2 alive  
thread1 alive  
thread2 x = 0  
thread1 x = 0  
thread1 x = 2  
thread2 x = 2  
thread2 x = 4  
thread1 x = 4  
thread2 x = 6  
thread1 x = 6  
thread1 x = 8  
thread2 x = 8  
thread2 x = 10  
thread1 x = 10  
thread1 x = 12  
thread2 x = 12  
thread1 x = 14  
thread2 x = 15

This is why we need the sync() function.
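The repeated values in that output can be replayed by hand. Each loop iteration reads x to log it and only afterwards increments it, so both threads can read before either one increments. Here is a single-threaded sketch of one such interleaving; the log array exists only for illustration.

```javascript
var x = 0;
var log = [];

// Both threads wake from sleep and read x before either increments it.
log.push('thread2 x = ' + x);
log.push('thread1 x = ' + x);
x++;    // thread2's increment
x++;    // thread1's increment

// On the next pass both threads again read the same value.
log.push('thread1 x = ' + x);
log.push('thread2 x = ' + x);

// log now holds the values 0, 0, 2, 2, matching the first lines of output
```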

I’ll implement proper synchronization and we’ll see the threads cooperate.

The improved version:

$ cat threads.js  
load('console.js');

var Thread = java.lang.Thread;

var x = 0;

var bumpX = sync(function() {  
    return x++;  
});

function thread1() {  
    console.log('thread1 alive');  
    while (1) {  
        Thread.sleep(10);  
        console.log('thread1 x = ' + bumpX());  
    }  
}

function thread2() {  
    console.log('thread2 alive');  
    while (1) {  
        Thread.sleep(10);  
        console.log('thread2 x = ' + bumpX());  
    }  
}

new Thread({ run: thread1 }).start();  
new Thread({ run: thread2 }).start();

Note that when we run it, the value of x increments nicely and each thread always sees the current value.

$ rhino ./threads.js  
thread1 alive  
thread2 alive  
thread1 x = 0  
thread2 x = 1  
thread1 x = 2  
thread2 x = 3  
thread1 x = 4  
...

This version works, but it is not quite perfect. You see, the bumpX() function returned by sync() synchronizes on the this object, which isn’t harmful in this example. However if we had another two threads bumping a y variable with a bumpY() method also synchronized on this, there’d be unnecessary contention among the 4 threads. When thread1() calls bumpX(), the remaining 3 threads will be blocked when they call bumpX() or bumpY().

The fix is:

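//javascript
var xLock = new java.lang.Object();
var bumpX = sync(function() {
    return x++;
}, xLock);

Note the extra argument to sync(), the object we want to synchronize on. I use a dedicated lock object here rather than x itself, because x holds a number, not an object with a stable identity. Now the callers that call bumpX() will block appropriately, not affecting callers of bumpY(), as long as bumpY() gets its own lock object.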
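To make the lock-per-variable idea concrete, here is a sketch of a counter factory in which every counter gets its own lock object. The names lockOn and makeCounter are hypothetical. The assumption is that sync() accepts a plain JavaScript object as its second argument. Outside Rhino, where sync() does not exist, the sketch falls back to the unsynchronized function, which is acceptable only because such engines run this code on a single thread.

```javascript
// lockOn is a hypothetical wrapper: it uses Rhino's sync() when present and
// otherwise returns fn unchanged (a single-threaded engine needs no lock).
function lockOn(fn, lock) {
    return (typeof sync === 'function') ? sync(fn, lock) : fn;
}

// makeCounter is illustrative: each counter owns a private lock object,
// so bumping one counter never blocks callers of another.
function makeCounter() {
    var value = 0;
    var lock = {};
    return {
        bump: lockOn(function() { return value++; }, lock)
    };
}

var xCounter = makeCounter();
var yCounter = makeCounter();
xCounter.bump();
xCounter.bump();
yCounter.bump();
var nextX = xCounter.bump();    // third bump of x returns 2
var nextY = yCounter.bump();    // second bump of y returns 1
```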

About synchronization

I wouldn’t count on any JavaScript operation to be atomic. That is, array.pop() could in theory get interrupted by a thread switch interrupt, so if you have two threads manipulating that array, you have a seriously bad race condition. So be aware of thread safety. If you ever expect to have two threads access the same memory, synchronize around the accesses, as I demonstrated.

Extending Rhino (3rd party java)

We’re interested in calling 3rd party libraries, so here’s an example. I created a file, Example.java and compiled it into a .class file:

$ cat Example.java  
public class Example {  
    public static String foo() { return "foo from java"; }  
};

$ javac Example.java  
$ ls -l Example.*  
-rw-r--r-- 1 mschwartz staff 277 Apr 24 15:26 Example.class  
-rw-r--r-- 1 mschwartz staff 83 Apr 24 15:25 Example.java  
$

The rhino executable program is really a bash script that starts up the JVM (java command) with the rhino .jar file and passes any additional command line arguments to the rhino java program.

$ cat `which rhino`  
#!/bin/bash  
exec java -jar /usr/local/Cellar/rhino/1.7R4/libexec/js.jar "$@"

From this we can craft our own command lines, including some that add .jar files to the class path. To see a full description of the java command and all the command line options, enter this at your shell prompt:

$ man java

We cannot pass a CLASSPATH via the -cp flag to the java command if we also specify -jar; with -jar, the .jar file is the only source of user classes. So we are going to have to use the form of the java command that specifies the CLASSPATH and the main class to run. I dug into the rhino sources and found that the main class is org.mozilla.javascript.tools.shell.Main.

Here’s the command in action:

$ java -cp ".:/usr/local/Cellar/rhino/1.7R4/libexec/js.jar" org.mozilla.javascript.tools.shell.Main
Rhino 1.7 release 4 2012 06 18
js>

We can see it is running the REPL as if we ran the rhino shell script. Now we can see if our Example.foo() function is accessible from our JavaScript environment.

js> var x = Packages.Example.foo()  
js> x  
foo from java  
js> typeof x  
object  
js> typeof String(x)  
string  
js> String(x)  
foo from java  
js>

You should note that our x variable holds a reference to a Java String, not a JavaScript string. We can pretty much use it like a JavaScript string, and Rhino does the type conversions automagically as needed.

js> var y = x+10  
js> y  
foo from java10  
js> typeof y  
string  
js> typeof x  
object  
js>

A brief note about the Java CLASSPATH

We can trivially create our own shell scripts to launch rhino with our own CLASSPATH.

It seems intuitive to me that if a directory is part of your CLASSPATH, the Java runtime should find .jar files as well as .class files in that directory. But it does not work that way! A CLASSPATH entry may name a directory, in which case only .class files are considered, or it may name a .jar file, which basically acts like a directory containing only .class files.

This means if you want to use classes in two separate .jar files, you have to include both .jar files in the CLASSPATH.

Introducing Nashorn

Nashorn is a completely new JavaScript engine that is officially part of the recently released Java 8.

In order to run it, I installed the Java 8 JDK on my Mac. I haven’t seen any ill effects yet, so I guess it is safe. There were some negative effects of installing Java 7 on a Mac, particularly that Java 7’s browser plugin is 64-bit only and Google Chrome is 32-bit only; you lose the ability to run Java from WWW sites in Chrome. I haven’t tested to see if this is true for Java 8, but I haven’t seen any similar warnings.

The installation process is not 100% right. There is a jjs program that we are supposed to be able to run to execute Nashorn scripts (jjs is roughly Nashorn’s version of the rhino command). After installing Java 8, jjs is not in /usr/bin as it should be. A little bit of digging turned up the file here:

/Library/Java/JavaVirtualMachines/jdk1.8.0_05.jdk/Contents/Home/bin

So I made a soft link to it in /usr/bin:

$ sudo ln -s /Library/Java/JavaVirtualMachines/jdk1.8.0_05.jdk/Contents/Home/bin/jjs /usr/bin/jjs

There is also a /usr/bin/jrunscript and a manual page for that dated 2006. The jrunscript program appears to launch Nashorn as well. There is also a jrunscript in the same directory as jjs that is different from the one in /usr/bin. All this causes a lot of confusion, but I will use jjs for the rest of this article.

The jjs program presents a REPL just like rhino does:

$ jjs  
jjs> print('hello')  
hello  
jjs> x = 10  
10  
jjs> x  
10  
jjs>

There is quite a bit of useful information about the JavaScript environment provided by Nashorn here:

https://wiki.openjdk.java.net/display/Nashorn/Nashorn+extensions

It didn’t take me very long to figure out how to get the threads demo program working. Here’s the modified source:

$ cat threads.js  
load("nashorn:mozilla_compat.js");  
load('console.js');

var Thread = java.lang.Thread;

var x = 0;

var bumpX = sync(function() {  
    return x++;  
});

function thread1() {  
    console.log('thread1 alive');  
    while (1) {  
        Thread.sleep(10);  
        console.log('thread1 x = ' + bumpX());  
    }  
}

function thread2() {  
    console.log('thread2 alive');  
    while (1) {  
        Thread.sleep(10);  
        console.log('thread2 x = ' + bumpX());  
    }  
}

var t1 = new Thread(thread1);  
t1.start();  
new Thread(thread2).start();  
t1.join();

I had to load("nashorn:mozilla_compat.js") to provide the sync() function.

The new Thread calls no longer work with what looks like a Runnable interface, or an object like:

{  
    run: function() { ... }  
}

Instead, Nashorn can figure out that Runnable has only one method (run) and that the Thread constructor expects a Runnable, so it does the right thing if you pass the constructor a JavaScript function.

One other change I had to make was to call join() on one of the threads started. Without this, jjs exited right away. This is a different behavior from rhino.

Nashorn also features a scripting mode that adds some very non-standard features to the JavaScript language. The concept is a good one if you want to use Nashorn to write shell scripts. The only problem is anything you write using these extensions will not be portable to any other JavaScript environment. For this reason, I won’t go into more depth about this feature.

Nashorn Performance

I created 2 very simple and probably worthless programs to try to get a sense of how fast Nashorn is compared to Rhino (and NodeJS/v8).

The first program simply concatenates 1 million integers into a very long string:

$ cat perf.js
var s = '';  
for (var i=0; i<1000000; i++) {  
    s += ' ' + i;  
}

My trial runs follow.

rhino

  
$ time rhino perf.js  
rhino perf.js 5.03s user 0.63s system 129% cpu 4.378 total  
$ time rhino perf.js  
rhino perf.js 5.07s user 0.64s system 130% cpu 4.386 total  
$ time rhino perf.js  
rhino perf.js 5.06s user 0.63s system 129% cpu 4.377 total  
$

jjs

  
$ time jjs perf.js  
jjs perf.js 14.80s user 0.27s system 600% cpu 2.510 total  
$ time jjs perf.js  
jjs perf.js 20.19s user 0.31s system 636% cpu 3.221 total  
$ time jjs perf.js  
jjs perf.js 15.53s user 0.26s system 611% cpu 2.580 total  
$ time jjs perf.js  
jjs perf.js 19.05s user 0.28s system 637% cpu 3.032 total  
$ time jjs perf.js  
jjs perf.js 19.30s user 0.29s system 637% cpu 3.075 total

nodejs

$ time node perf.js  
node perf.js 0.29s user 0.05s system 100% cpu 0.341 total  
$ time node perf.js  
node perf.js 0.29s user 0.05s system 100% cpu 0.338 total  
$ time node perf.js  
node perf.js 0.29s user 0.05s system 100% cpu 0.338 total  
$ time node perf.js  
node perf.js 0.29s user 0.05s system 100% cpu 0.338 total

I happen to know that Rhino 1.7R4 is notoriously slow at string concatenation. It is much faster to join() an array. So I created a second trial program:

$ cat perf2.js
var a = [];

for (var i=0; i<1000000; i++) {
    a[i] = i;
}

var b = a.join('');

This one creates an array of a million integers and joins it together. The resulting string is nearly the same as for perf.js; the difference is that perf.js puts a space before each integer and this version does not.
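A quick way to check how the two outputs relate, using a reduced count: perf.js puts a space before every integer, while perf2.js joins with an empty separator. The sketch below shows one way to make the array version produce identical text. The function names and the count of 1000 are arbitrary choices for illustration.

```javascript
// Same loop as perf.js, with the count as a parameter.
function viaConcat(n) {
    var s = '';
    for (var i = 0; i < n; i++) {
        s += ' ' + i;
    }
    return s;
}

// Same idea as perf2.js, adjusted to emit the same text as viaConcat().
function viaJoin(n) {
    var a = [];
    for (var i = 0; i < n; i++) {
        a[i] = i;
    }
    // A leading space plus a ' ' separator reproduces the concat output.
    return n > 0 ? ' ' + a.join(' ') : '';
}

var same = viaConcat(1000) === viaJoin(1000);    // true
```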

Here are the trial runs for perf2.js.

rhino


$ time rhino perf2.js  
rhino perf2.js 1.54s user 0.14s system 240% cpu 0.698 total  
$ time rhino perf2.js  
rhino perf2.js 1.53s user 0.14s system 241% cpu 0.689 total  
$ time rhino perf2.js  
rhino perf2.js 1.53s user 0.14s system 237% cpu 0.700 total  
$ time rhino perf2.js  
rhino perf2.js 1.53s user 0.13s system 237% cpu 0.701 total

jjs


$ time jjs perf2.js  
jjs perf2.js 7.28s user 0.19s system 438% cpu 1.704 total  
$ time jjs perf2.js  
jjs perf2.js 6.98s user 0.19s system 420% cpu 1.705 total  
$ time jjs perf2.js  
jjs perf2.js 7.89s user 0.18s system 448% cpu 1.800 total  
$ time jjs perf2.js  
jjs perf2.js 7.06s user 0.19s system 431% cpu 1.679 total

nodejs


$ time node perf2.js
node perf2.js 0.32s user 0.05s system 100% cpu 0.368 total
$ time node perf2.js
node perf2.js 0.33s user 0.05s system 100% cpu 0.376 total
$ time node perf2.js
node perf2.js 0.33s user 0.05s system 100% cpu 0.380 total
$ time node perf2.js
node perf2.js 0.33s user 0.05s system 100% cpu 0.380 total
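The time numbers above cover the whole process, so they include startup and, for the JVM engines, class loading and JIT warm-up. A rough way to look at the work alone is to time it from inside the script. This is a sketch: timeIt is a hypothetical helper, the count of 100000 is arbitrary, and the only timing facility used is new Date().getTime().

```javascript
// timeIt is an illustrative helper: runs fn once and reports elapsed wall time.
function timeIt(label, fn) {
    var start = new Date().getTime();
    var result = fn();
    var elapsedMs = new Date().getTime() - start;
    return { label: label, elapsedMs: elapsedMs, result: result };
}

var run = timeIt('join 100000 integers', function() {
    var a = [];
    for (var i = 0; i < 100000; i++) {
        a[i] = i;
    }
    return a.join('').length;
});

// print() exists in Rhino and Nashorn; elsewhere fall back to console.log.
var out = (typeof print === 'function')
    ? print
    : function(line) { console.log(line); };
out(run.label + ': ' + run.elapsedMs + ' ms');
```

A single run like this is still noisy; repeating the call a few times and discarding the first result would give the JIT a chance to warm up.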

Conclusion

Rhino is the gold standard of JavaScript for the JVM. It simply has been around for a very long time (since the 1990s) and it is feature rich and relatively bug free. Nashorn represents a new code base and new commitment by Oracle to JavaScript for the JVM. It’s brand new, and already appears to be a solid implementation in its own right. It’s only going to get better, too. Rhino is likely to run on any new release of Java for a long time to come, but it’s not as likely as Nashorn to receive attention and improvements.

The question is when is it time to ditch Rhino in favor of Nashorn? My guess is soon if Java 8 gains the adoption that I expect.

