Strength & Speed: Leveraging Java into RAD with JRuby

6-Jan-2018

This rant was drafted in Aug 2013 but never published. I completely forgot about it until multiple days of frigid weather in NYC drove me to do a comprehensive review of moschetti.org and I discovered this in a TBD.tar file. I believe Ruby has become a bit less shiny in the past 4.5 years but this is nonetheless a good example of software factoring and reuse of core critical components.

Java: A Good Foundation

We're not here to debate if Java is "better" than C++ or Python or FORTRAN or COBOL or Smalltalk. Simply put, Java is a good language for building both reusable components and applications. Setting aside exotic reflection and bytecode manipulation for the moment, Java offers these features:

More than acceptable performance
Great compile-time checking of code and other benefits of a static type system
A very (arguably over) complete platform
Strong open source community support and vendor support
Broad variety of third party products offer integration via Java APIs
Large and economically fluid global talent pool

Where Java Gets ... Tedious

However, doing quick work with Java such as small apps and utilities can be tedious, especially when dealing with data and file integration activities. The capabilities and benefits outlined above become much less critical at the "edge" of the software stack and in some cases actually become hindrances. In general:

There is far less of a need for a strong, well-engineered interface (GUI, command line, etc.) that can service many needs. In fact, arguably the interfaces to these kinds of programs can be made as specific and narrow as desired to simplify and target use of the program. If you need to do something else, build a different app. Of course, this approach is successful only if a well-factored software stack is in use; otherwise, it is likely that you will be copying and modifying large chunks of underlying code instead of picking and choosing different component ingredients.
Performance and security do not have to be "overengineered" to satisfy the most demanding consumer. The program, as a runtime, has a defined performance and security profile and in many cases does not need to run as fast as theoretically possible. Tradeoffs between performance, memory use, storage, and compactness and/or ease of computing can be made at this level of the stack. The underlying components, however, clearly need to be engineered to be as fast and secure as possible because they become the limiting factors for any consuming program.
Apps and utilities tend to deal with externalized data as an important part of their function. Files, data streams, command line arguments, even things typed into screens. The predominant types we find in this space are strings and collections of strings and although Java is certainly capable of dealing with them, other languages and environments often make it far easier to work with these two types.

In short, sometimes you just want to write 10-50 lines of code quickly to get something done.
The solution is clear: Develop a multi-language software base with Java at the core and a scripting language that can access the Java code functionality. This will permit you to enjoy the best of both worlds.

Enter Ruby

The Ruby language is currently enjoying a burst of popularity largely generated by the Ruby on Rails framework, but it is nonetheless a capable language at a basic level. Like Perl and Python, Ruby has relaxed type declaration, outstanding string manipulation functions, and somewhat more powerful collections operations than Java, and it offers functional programming for those programmers (and programs) that benefit from this programming style. It also has a rich ecosystem of open source modules called "gems" that satisfy many common programming needs.
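To make those claims a little more concrete, here is a minimal sketch in plain Ruby (no Java involved yet). The log lines are hypothetical sample data invented for illustration; the point is only to show untyped variables, string splitting, and block-based collection operations working together.

```ruby
# Hypothetical log lines, used only for illustration.
lines = [
  "2013-08-01 ERROR disk full",
  "2013-08-01 INFO  started",
  "2013-08-02 ERROR timeout"
]

# No type declarations: each variable takes the type of its expression.
errors = lines.select { |l| l =~ /\bERROR\b/ }

# Group the error messages by date using string splitting and blocks.
# split(" ", 3) yields [date, level, rest-of-line].
by_date = errors.group_by { |l| l.split(" ", 3)[0] }
                .map { |date, ls| [date, ls.map { |l| l.split(" ", 3)[2] }] }
                .to_h

puts by_date.inspect
```

The equivalent Java would need declared types for the list, the intermediate map, and each entry, which is exactly the overhead that matters least in a small utility.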

As of this writing there are several Ruby implementations including JRuby, a 100% pure Java implementation of Ruby. It has been well-engineered to cooperate with the JRE both in terms of its ability to be embedded in a Java program (i.e. an existing Java program constructs some Ruby source code and calls an eval method) and its ability to import existing Java libraries into a Ruby program. It is the latter case that is the focus of this article.

Why Ruby?

No flames, please; this is not about why one language is better than another in an absolute sense.
The impulsive response is "why not?" The abstract academic response is "it actually doesn't matter; it's the multi-language leverage concept that is important." But from a practical standpoint, scripting is going to be done in Perl, Python, Groovy, Ruby, or more recently, Scala.

As much as I am a long-time fan of Perl and a Perl user, the more modern languages have a more refined and symmetric approach to objects. Plus, Perl integration with Java is always untidy; I much prefer integrating it with C or C++.
Groovy is more like dynamic Java and although that is good and it has many features of Python and Ruby, the syntax and collections and i/o handling are not quite as "easy/compact" as Python or Ruby.
Scala is very promising especially because it compiles to Java byte code but it is a little new to the party. Watch for an article on Scala leverage in the future...

This leaves Python and Ruby. Python was out of the gate first with JPython but the community quickly and wisely refocused efforts on a 100% pure Java implementation of Python, yielding (get it?) Jython. The truth is, for most of the RAD use cases encountered, both Jython and JRuby are perfectly acceptable. I chose to use Ruby and JRuby for these examples for these reasons:

There has been a lot of activity in the Ruby space of late. Yes, Rails is a big part of that.
As a Perl fan, there are many syntax and function similarities to Perl that make me feel more at home with Ruby.

The Meat

To begin, assume we have these Java classes:

Persistor, an interface to a persistence framework.
DBImpl, a persistence engine binding that implements Persistor
DAL, a data access layer that provides functional access to data. It consumes Persistor and basically hides SQL or noSQL or any other oddments from the applications.
Last but by no means least:
FancyMath, a nontrivial object that depends on several other classes, has real state, an externalized form different from the internal representation, some beefy methods, etc. The methods have real algorithms and complex implementations and are our own work product, not open source. It has its own set of test drivers (functional and performance). In short, not a glorified HashMap with bespoke get/set of Strings. This is a core component and something you would not want to reimplement in another language.

Each of these classes is built into a different .jar file, of course, because they have different physical and logical dependencies. There is no reason FancyMath should depend on a specific persistor and certainly we don't want the persistence layer dependent on FancyMath. To simplify the example, we will name the archives persistor.jar, dbimpl.jar, DAL.jar, and fancymath.jar. We'll see why a real-life multi-jar scenario is important to consider later on.

Any number of Java programs can be written with these jars and these programs will benefit from compile-time checking and static typing; nothing particularly special here. But let's look at the following use case:

CSV content will be fetched via http from a web service
A local file will be used as a category code mapping file
Certain functions in FancyMath will be called
The result will be written to the database

The "800 lb gorilla" in this setup is FancyMath. Everything else is easy and, in the case of some languages, very easy. But we need to leverage the work expended on creating and maintaining fancymath.jar. In a pure Java program, more time is spent getting the import statements right, properly allocating arrays, making HashMaps, and finding third-party/open source libs than actually doing the work. This Java program might look like the following. Notes:

In the spirit of apples to apples, I am using as few non-platform libs as possible (i.e. not using the apache commons IOUtils module)
The program is lacking in exception blocks, checks for null, closing i/o resources, etc. but those would be roughly equivalent in both Java and Ruby. We do not show them here to make the comparison a little clearer.
Restraint has been applied to trying to compactify the source. The goal here is to create a program quickly but with an eye toward downstream maintenance (or at least comprehension).
The example is conceptual and might not actually compile in a cut-and-paste scenario.

    import com.me.Persistor;
    import com.me.PersistorFactory;
    import com.me.SomePersistorFactoryImpl;

    import com.me.DAL;

    import com.me.FancyMath;

    import java.util.Scanner;
    import java.util.Map;
    import java.util.HashMap;
    import java.io.BufferedReader;
    import java.io.File;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.net.URLConnection;

    public class Loader1 {
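        // static so that main() can call it; checked exceptions are simply propagated
        private static String getURL(String url) throws Exception {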

          URL website = new URL(url);
          URLConnection connection = website.openConnection();
          BufferedReader in = new BufferedReader(
                                new InputStreamReader(
                                    connection.getInputStream()));

          StringBuilder response = new StringBuilder();
          String inputLine;

          while ((inputLine = in.readLine()) != null) {
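            // readLine() strips the line terminator; put it back so the
            // content can later be split into records on "\n"
            response.append(inputLine).append("\n");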
          }
          in.close();

          return response.toString();
        }

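        // Key each record by its first field; the value is the entire record
        private static Map<String, String[]> bulkContentToMap(String content, String fldDelim) {
          Map<String, String[]> tbl = new HashMap<>();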
          String[] lines = content.split("\n");
          for(String l : lines) {
            String[] flds = l.split(fldDelim);
            tbl.put(flds[0], flds);
          }
          return tbl;
        }

        public static void main(String[] args) throws Exception {
          String s1;

          s1 = getURL("http://machine/path");
          Map<String, String[]> tbl1 = bulkContentToMap(s1, ",");

          // This is arguably slightly too "loose" but let's permit it for now...
          s1 = new Scanner(new File("path/to/codemap.csv")).useDelimiter("\\Z").next();
          Map<String, String[]> tbl2 = bulkContentToMap(s1, ",");

          Persistor p = some PersistorFactory arrangement with dbimpl;

          Map<String, Object> m = new HashMap<>();

          for (Map.Entry<String, String[]> me : tbl1.entrySet()) {
            m.clear();

            // Look up the category code for this record's key
            m.put("key", tbl2.get(me.getKey())[1]);

            String[] data = me.getValue();

            m.put("val1", data[1]);

            { // Turn "John A. Smith" into "JAS":
              StringBuilder sb2 = new StringBuilder();
              // loop variable is "w" because "p" is already the Persistor
              for (String w : data[2].split(" ")) {
                sb2.append(Character.toUpperCase(w.charAt(0)));
              }
              m.put("user", sb2.toString());
            }

            m.put("smoothed", FancyMath.smooth(data));

            DAL.insertCurve(p, m);
          }
        }
    }

And here is how we might run it:


    $ java -classpath .:persistor.jar:dbimpl.jar:DAL.jar:fancymath.jar Loader1

In contrast, this is what the Ruby version looks like:

    require 'java'    # tell JRuby to activate Java class loader machinery
    
    java_import 'com.me.SomePersistorFactoryImpl'
    java_import 'com.me.DAL'
    java_import 'com.me.FancyMath'  # The whole reason we're doing this...

    require 'net/http'
    
    def bulkContentToMap(content, fldDelim)
      tbl = {}
      content.split("\n").each { |line|
        flds = line.split(fldDelim)
        tbl[flds[0]] = flds  # tbl[key] points to the entire record
      }
      tbl
    end
    
    uri = URI('http://machine/path')
    c = Net::HTTP.get(uri)
    tbl1 = bulkContentToMap(c, ',')
    
    c = IO.read('path/to/codemap.csv')
    tbl2 = bulkContentToMap(c, ',')

    p = some PersistorFactory arrangement with dbimpl;
    
    tbl1.each_pair { |key,data|
      m = {}
      m["key"] = tbl2[key][1]
      m["val1"] = data[1]

      #  Turn "John A. Smith" into "JAS":
      m["user"] = data[2].split(" ").map {|w| w[0].chr }.join.upcase

      m["smoothed"] = FancyMath.smooth(data)
    
      DAL.insertCurve(p, m)
    }

And here is how we might run it:


    $ env CLASSPATH="persistor.jar:dbimpl.jar:DAL.jar:fancymath.jar" jruby loader1.rb
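One side benefit is that the data-cracking half of the script can be exercised without any of the jars. The following is a minimal sketch in plain Ruby, assuming two small inline strings (hypothetical sample data) in place of the web service response and the code map file. The FancyMath and DAL calls are deliberately left out because they need JRuby and the real classes.

```ruby
# Same helper as in the loader: key each record by its first field.
def bulk_content_to_map(content, fld_delim)
  tbl = {}
  content.split("\n").each do |line|
    flds = line.split(fld_delim)
    tbl[flds[0]] = flds
  end
  tbl
end

# Hypothetical sample content standing in for the feed and the code map.
feed    = "A1,10.5,John A. Smith\nB2,7.25,mary jones"
codemap = "A1,EQUITY\nB2,BOND"

tbl1 = bulk_content_to_map(feed, ',')
tbl2 = bulk_content_to_map(codemap, ',')

# Build the same maps the loader would hand to the DAL, minus "smoothed".
rows = tbl1.map do |key, data|
  {
    "key"  => tbl2[key][1],
    "val1" => data[1],
    # split into words, take each first character, join, then upcase
    "user" => data[2].split(" ").map { |w| w[0].chr }.join.upcase
  }
end

rows.each { |r| puts r.inspect }
```

Reading the "user" expression left to right (split, map, join, upcase) is a fair way to get used to the chained style before comparing it with the Java loop.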

What are some interesting things we see here?

Ruby lvals need no explicit type declaration. They take whatever type the rval expression returns. This means that intermediate values in a series of function calls do not need a bevy of imports or other mechanisms for type declarations. This makes program construction both faster and, for relatively small utils, clearer because attention is not drawn away from the really important and useful parts of the program.
Map and list handling is just easier. When dealing with string-keyed maps and lists of data, Ruby (and Python and Perl and ...) is just simpler than Java.
There are a host of functional and collection processing idioms in Ruby that are not immediately obvious in their purpose to the novice, but they appear so often that one becomes acclimated to their use and output, and they are powerful and compact. See the expression for assigning m["user"] above and compare to Java.
Ruby is used to powerfully deal with cracking and assembling data for passing to FancyMath.smooth() and DAL.insertCurve(). The complex and potentially high-performance aspects of that software are "safely" contained in Java and none of it needs to be re-engineered in Ruby, including persisting to the database. Constructing a strong data access layer (DAL) over persistence is a vital factoring and insulation exercise even in a single language world. The effort to do so is repaid many times over in a multi-language leverage scenario.
The setup of the runtime CLASSPATH is the same.
There is a subtle issue of ensuring that the appropriate types (or toString() equivalents) can be created in Ruby to be properly passed to the Java layer. A Map containing a bespoke Ruby native object (i.e. some class we might create locally in the Ruby source) cannot be interpreted by the Java layer. This also means that common Java types like java.util.Date which appear in Java method signatures cannot consume the Ruby "natural equivalents"; a util or a "string representation bridge" must be used to create the Java type from the Ruby type.
Least important but still relevant: the Ruby program is about half the length of the Java program.
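The type-bridging point above can be sketched as follows. This is only an illustration under stated assumptions: Trade is a hypothetical bespoke Ruby class, java_friendly and bridge_map are made-up helper names, and the choice of ISO 8601 as the string form for dates is mine, not something the DAL is known to require.

```ruby
require 'time'   # adds Time#iso8601

# Hypothetical bespoke Ruby class that the Java layer knows nothing about.
Trade = Struct.new(:symbol, :qty) do
  def to_s
    "#{symbol}:#{qty}"
  end
end

# Reduce a value to something the Java side can interpret: strings and
# numbers pass through, Time becomes an ISO 8601 string (the "string
# representation bridge"), and anything else falls back to to_s.
def java_friendly(value)
  case value
  when String, Numeric then value
  when Time            then value.utc.iso8601
  when Array           then value.map { |v| java_friendly(v) }
  else value.to_s
  end
end

# Apply the bridge to every value of a map, forcing string keys.
def bridge_map(m)
  m.each_with_object({}) { |(k, v), out| out[k.to_s] = java_friendly(v) }
end

m = bridge_map({
  "trade" => Trade.new("IBM", 100),
  "asof"  => Time.utc(2013, 8, 1, 12, 0, 0),
  "val1"  => "3.14"
})

puts m.inspect
```

Under JRuby the other option is to build the Java type directly in the script, for example java.util.Date.new(ms) with epoch milliseconds taken from a Ruby Time, and put that in the map instead of a string.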

Like this? Dislike this? Let me know