Tricks for Better Software: 2011

Wednesday, December 28, 2011

Sonar Review

Overview

In short, Sonar makes source code analysis and unit testing very convenient. Sonar works as a facade to various open source tools aimed at code quality, such as Findbugs, PMD, Checkstyle, Cobertura, Clover, Surefire, etc. The Sonar-Maven integration is excellent: executing a single Maven goal (sonar:sonar) runs all code analysis and tests and stores the results in a database.

A dashboard displays a summary of the various analyses, either per project or per Java package. From there one can drill down into individual issues, with details such as where in the source code they occur. Often the offending code is visually highlighted in its context in the GUI. For each rule violation, Sonar also gives a brief explanation of why it is an issue. For programmers who have no idea of the potential harm of the flagged code, the explanation is a good starting point for learning.

Features

Rule Violations

This is one of the most useful features of Sonar. It lists the total number of rule violations and the subtotals by severity: blocker, critical, major, minor, and info (in descending order of severity). It also gives the percentage of rule compliance. Below are two bugs that Sonar caught in real-world enterprise projects, which I saw firsthand.

Bug 1 – An Infinite Loop

public Table() {
Table myTable = new Table();
}

Sonar's remark: Correctness - An apparent infinite recursive loop.

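The constructor calls itself recursively with no terminating condition. At runtime, each new Table() call triggers another one until the stack is exhausted and a StackOverflowError is thrown.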

Bug 2 – Mistaking & for &&

if ( books != null & !books.isEmpty() ) {
return books.toArray(new Book[]{});
}

Sonar's remark: Correctness - Possible null pointer dereference

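The code author intended to prevent a NullPointerException, but the actual code does not do what he meant. When applied to boolean operands, & is the non-short-circuit logical “and”: both operands are always evaluated, so when books is null, books.isEmpty() is still called and throws a NullPointerException. The author should have used &&, the conditional (short-circuit) logical “and”, which skips the right-hand operand when the left-hand one is false.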
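The small self-contained sketch below makes the difference concrete. The class and method names (ShortCircuitDemo, hasItemsBuggy, hasItemsFixed) are hypothetical, and a List<String> stands in for the original list of Book objects. The two methods differ only in & versus &&.

```java
import java.util.Arrays;
import java.util.List;

public class ShortCircuitDemo {

    // Buggy: '&' always evaluates both operands, so isEmpty() runs even when items is null.
    static boolean hasItemsBuggy(List<String> items) {
        return items != null & !items.isEmpty();
    }

    // Fixed: '&&' skips the right operand when items != null is false.
    static boolean hasItemsFixed(List<String> items) {
        return items != null && !items.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(hasItemsFixed(null));               // false
        System.out.println(hasItemsFixed(Arrays.asList("a"))); // true
        try {
            hasItemsBuggy(null);
        } catch (NullPointerException e) {
            System.out.println("hasItemsBuggy(null) threw NullPointerException");
        }
    }
}
```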

Duplications

Sonar is quite good at identifying code duplication. It flags not only identical code blocks but also blocks that are similar enough.

The duplication check is very helpful for programmers who want to minimize duplicated code. For an excellent explanation of the harm of code duplication, see The Evils of Duplication in the must-read The Pragmatic Programmer: From Journeyman to Master by Andrew Hunt and David Thomas.

Complexity

Complexity is measured by the cyclomatic complexity of classes and methods. Cyclomatic complexity is a mature and well-accepted software quality metric, backed by empirical data. This feature is very helpful for identifying hotspots where the code is too complex.
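As a rough illustration of what the number counts, the basic McCabe rule for a single method is "decision points plus one". The method below is hypothetical, and the count in its comment follows that basic rule. Sonar and other tools may count some constructs differently, for example && and || or return statements.

```java
public class ComplexityExample {

    // Decision points under the basic rule: the for loop (1), the if (2),
    // and the && (3). Cyclomatic complexity = 3 + 1 = 4.
    static int countPositiveEvens(int[] values) {
        int count = 0;
        for (int v : values) {
            if (v > 0 && v % 2 == 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countPositiveEvens(new int[] {-2, 1, 2, 4, 7})); // 2
    }
}
```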

Package tangle index

Package A is said to depend on Package B if some Java classes in Package A depend on Java classes in Package B. The package tangle index indicates the severity of cyclical package dependencies, for example when Package A depends on Package B and, at the same time, Package B depends on Package A.

The Layers architecture pattern is the most fundamental architecture pattern. (For an excellent discussion, see Layers in Pattern-Oriented Software Architecture, Volume 1 by Frank Buschmann et al.) Classes responsible for different layers should be placed in different packages. A cyclical package dependency may indicate that a lower layer depends on an upper layer or, even worse, that the software does not have well-defined layers. We may call this the anti-layer pattern.

Some cyclical package dependencies can be eliminated by simply moving classes from one package to another.
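A cyclical package dependency can be sketched as a cycle in a directed graph of packages. The sketch below only answers yes or no, using a depth-first search. It is not how Sonar computes the tangle index, and the package names are hypothetical. The second check mimics removing the upward dependency, for instance by moving the offending class to another package.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PackageCycleCheck {

    // Returns true if the graph (package -> packages it depends on) contains a cycle.
    static boolean hasCycle(Map<String, List<String>> deps) {
        Set<String> done = new HashSet<String>();
        Set<String> onPath = new HashSet<String>();
        for (String pkg : deps.keySet()) {
            if (visit(pkg, deps, done, onPath)) {
                return true;
            }
        }
        return false;
    }

    private static boolean visit(String pkg, Map<String, List<String>> deps,
                                 Set<String> done, Set<String> onPath) {
        if (onPath.contains(pkg)) {
            return true;   // back on the current path: a cycle
        }
        if (done.contains(pkg)) {
            return false;  // already fully explored without finding a cycle
        }
        onPath.add(pkg);
        List<String> targets = deps.get(pkg);
        if (targets != null) {
            for (String target : targets) {
                if (visit(target, deps, done, onPath)) {
                    return true;
                }
            }
        }
        onPath.remove(pkg);
        done.add(pkg);
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new HashMap<String, List<String>>();
        deps.put("com.example.ui", Arrays.asList("com.example.service"));
        deps.put("com.example.service", Arrays.asList("com.example.ui")); // upward dependency
        System.out.println(hasCycle(deps)); // true

        deps.put("com.example.service", Arrays.<String>asList());       // dependency removed
        System.out.println(hasCycle(deps)); // false
    }
}
```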

Comments

In the comments section, Sonar shows the percentage of the public API that has Javadoc and how many public classes or public methods have no Javadoc at all. The GUI lets the user drill down into the source code to see which classes and methods are not documented.

Sonar, however, cannot evaluate the quality of the Javadoc. One may have 100% of the API documented while the documentation itself is worthless.

Nor can Sonar tell that certain methods, such as getters and setters, do not really need Javadoc. The comment score must therefore be taken with a grain of salt.

Code coverage

Sonar provides a summary of the percentage of code and branches covered by unit tests, and the GUI makes it easy to locate code that is not covered. However, one must be aware that 100% code coverage is far from 100% correctness. 100% coverage only means that all of the code is executed during the unit tests. Merely executing the code where a bug resides does not catch the bug; a test must also check the result.
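The minimal sketch below shows that gap. It uses a plain Java check in place of a real test framework such as JUnit. Both hypothetical tests execute every line of max(), so a coverage tool would report 100% line coverage for it. Only the test that checks the result exposes the bug.

```java
public class CoverageDemo {

    // Intentionally buggy: meant to return the larger of a and b.
    static int max(int a, int b) {
        return a;
    }

    // Executes every line of max() (100% line coverage) but verifies nothing.
    static boolean weakTest() {
        max(1, 2);
        return true;
    }

    // Also executes every line of max(), but checks the result and exposes the bug.
    static boolean strongTest() {
        return max(1, 2) == 2;
    }

    public static void main(String[] args) {
        System.out.println("weak test passed: " + weakTest());     // true
        System.out.println("strong test passed: " + strongTest()); // false
    }
}
```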

Lack of cohesion of methods (LCOM4) and Response for Class (RFC)

These are two of the six metrics in the Chidamber & Kemerer metrics suite. (Strictly, LCOM4 is an improved version; the original LCOM of the Chidamber & Kemerer suite is usually referred to as LCOM1.)

Personally, I have not yet seen much value in these two metrics.

Issues

Performance

Sonar is quite slow. Sometimes it is very slow.

Data Corruption

A bug frequently corrupts data in the database and causes Sonar execution (analysis and testing) to fail.

Example - Error Message from Failing Sonar Execution (via Maven)

[INFO] ------------------------------------------------------------------------
[ERROR] BUILD ERROR
[INFO] ------------------------------------------------------------------------
[INFO] Can not execute Sonar

Embedded error: PicoLifecycleException: method 'public void org.sonar.batch.ProjectTree.start() throws java.io.IOException', instance 'org.sonar.batch.ProjectTree@a9be37, java.lang.RuntimeException: wrapper
result returns more than one elements
[INFO] ------------------------------------------------------------------------
[INFO] For more information, run Maven with the -e switch
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1 minute 17 seconds
[INFO] Finished at: Fri Sep 16 10:43:21 EDT 2011
[INFO] Final Memory: 103M/119M
[INFO] ------------------------------------------------------------------------

Sonar expects that, among the records in the snapshots table whose islast field is 1, no two have the same project_id. The bug produces more than one record with the same project_id and an islast value of 1. When this happens, Sonar execution (analysis and testing) fails.

To resolve the problem, one has to run an SQL script against the database to delete the problematic snapshot records. This fixes the problem only temporarily; it will recur.

Documentation

Sonar documentation is pretty good.

Sonar documentation at the home site

Tuesday, November 29, 2011

The Basics of Java Generics

Overview

For the sake of generics, Java types (classes and interfaces) can be grouped into three categories:

Ordinary type, e.g. String, Integer
Generic type, e.g. java.lang.Comparable<T>, java.util.List<E> , and java.util.ArrayList<E>
Parameterized type, e.g. java.lang.Comparable<Integer>, java.util.List<String>, and java.util.ArrayList<String>

In Java, there are four kinds of generic constructs:

generic interface
generic class
generic method
generic constructor

Constructors are very much like methods, except that constructors have no return type. For this reason, we omit any discussion of generic constructors: everything said about generic methods, except what concerns return types, also applies to generic constructors.

Coding with generics usually involves one or more of the following:

Defining a generic interface, class, or method
Invoking a generic interface, or class
Invoking a generic method
Defining a non-generic method with at least one parameter, or a return value, whose type is a type parameter. Such a method must be a member of a generic type.
Defining a method with at least one parameter of a parameterized type, or with a return value of a parameterized type
Invoking a method with at least a parameter of a parameterized type
Invoking a method with the return of a parameterized type

Defining Generic Types

Example 1 – Defining a generic interface

public interface Iterable<T>

The T in Iterable<T> is called a type parameter. In the language of Java generics, we say that the generic type Iterable takes a type parameter, T. Conventionally, a single upper case T is used as identifier for a type parameter (T stands for type).

Example 2 – Defining a generic interface that extends another generic interface

public interface List<E> extends Collection<E>

Here the E in List<E> is the type parameter. Conventionally, E is used as the identifier for the type parameter of collections (E stands for element).

Example 3 – Defining a generic class that implements a generic interface

public class ArrayList<E> implements List<E>

Defining Generic Method

Example 4 – Defining a generic method

<T> T[] toArray(T[] a);

Above is the declaration of the toArray method in the body of java.util.List. The leading <T> tells that this is a generic method and that it takes a type parameter T. This means that the method has a hole that will be filled later with a concrete type. The declaration also tells that the method returns a T[] and that its parameter is a T[] (array of T). Essentially, the type parameter of this method establishes a constraint, in terms of type, between the method return and its parameter. If we want to turn a list into a String[], we must pass a String[] to the toArray method.

Please note that a method whose parameters or return type use a type parameter is not necessarily a generic method. It is generic only if its declaration places a type parameter list such as <T> before its return type (or void). For example, the methods shown in Example 5 below are not generic methods.

On the other hand, even a non-generic type may have a generic method as its member.
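Both points can be seen in a hypothetical generic method, lastElement, declared in an ordinary (non-generic) class. As with toArray, its single type parameter ties the type of the argument to the type of the return value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GenericMethodDemo {

    // <T> declares a type parameter for this method only. It ties the element
    // type of the argument to the return type: a List<String> in gives a String out.
    static <T> T lastElement(List<T> list) {
        return list.get(list.size() - 1);
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<String>(Arrays.asList("One", "Two"));
        String last = lastElement(words);       // no cast needed; T is inferred as String
        System.out.println(last);               // Two
        // Integer wrong = lastElement(words);  // would not compile: T cannot be both String and Integer
    }
}
```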

Type Parameter

A type parameter is a placeholder for a concrete type. It is important to understand that every type parameter is taken by either a generic type, a generic method, or a generic constructor. Conversely, a generic type or method takes at least one type parameter.

Inside the body of a generic interface or class, a type parameter taken by the interface or class can serve as the type of parameters, the return type, or the type of local variables of an instance method. It can also serve as the type of instance fields.

A type parameter taken by a generic method can serve as the type of its parameters, its return type, or the type of its local variables. (Note: a generic method may legally be a static member of a class or interface.)

A type parameter taken by a class or interface cannot be:

Type of its static fields (because there is only one class but possibly many different T)
Anywhere in its static member methods (same reason)
In a static initializer block (same reason)

In addition, no type parameter can be

used in a new T() expression to create a new object (because of erasure)
used in a new T[size] expression to create a new array (because of erasure)
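A common workaround for both restrictions is to pass a "type token" (the Class object) and create instances and arrays reflectively. The sketch below assumes the caller can supply that Class object. The Factory class is hypothetical, newInstance() requires the class to have an accessible no-argument constructor, and ReflectiveOperationException requires Java 7 or later.

```java
import java.lang.reflect.Array;

public class Factory<T> {

    // T is erased at runtime, so the Class object is kept explicitly.
    private final Class<T> type;

    public Factory(Class<T> type) {
        this.type = type;
    }

    // Reflective replacement for the illegal new T().
    public T newInstance() throws ReflectiveOperationException {
        return type.getDeclaredConstructor().newInstance();
    }

    // Reflective replacement for the illegal new T[size]; the array's runtime type is T[].
    @SuppressWarnings("unchecked")
    public T[] newArray(int size) {
        return (T[]) Array.newInstance(type, size);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Factory<StringBuilder> f = new Factory<StringBuilder>(StringBuilder.class);
        StringBuilder sb = f.newInstance();
        StringBuilder[] arr = f.newArray(3);
        System.out.println(sb.length() + " " + arr.length); // 0 3
    }
}
```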

Example 5 - Defining methods with parameters or return of type parameter

boolean add(E e);
E get(int index);

The above two methods are declared in the body of the generic type List<E>. The type parameter E serves as the type of the parameter e of the add method and as the return type of the get method. These two methods are not generic methods. The type parameter E is taken not by the methods but by their owner type (i.e. List<E>).

Generic Type vs. Parameterized Type

It is critical to understand the difference and relationship between generic type and parameterized type. For example, ArrayList<E> is a generic type and ArrayList<String> is a parameterized type. They differ in the following aspects:

E in ArrayList<E> is a type parameter, and String in ArrayList<String> is a concrete type (specifically, an ordinary class). In ArrayList<String>, the concrete type String serves as the type argument, filling the place held by the type parameter E, which is taken by ArrayList<E>.
A parameterized type, like an ordinary type, is a concrete type, while a generic type is an abstract type.
It is legal to create an object of ArrayList<String> via statement new ArrayList<String>();, statement new ArrayList<E>(); is, however, illegal.
More generally, the usage of a parameterized type is exactly the same as an ordinary type. A parameterized type can be used at any place where an ordinary type is to be used, i.e. to be used as the type of a variable or a method return. The variable may be a method parameter, a local variable, or a field.

A parameterized type always has a special relationship with a generic type: a parameterized type is always instantiated from a generic type. For example, ArrayList<String> is instantiated from ArrayList<E> by substituting the type argument String for the type parameter E. In order for ArrayList<String> to exist, ArrayList<E> must exist first.

A type parameter is like a hole. When the hole is filled with a concrete type, a parameterized type comes out of the generic type. Replacing a type parameter with a concrete type is called invocation of a generic type. Just as one invokes a method by passing arguments, one invokes a generic type by passing type arguments.

In a parameterized type, a type argument takes every place formerly held by its corresponding type parameter. For example, List<String> effectively has the methods

boolean add(String e);

String get(int index);

(For the formal specification, see 4.5.2 Members and Constructors of Parameterized Types, The Java Language Specification, Third Edition, Addison-Wesley, 2005.)

Bounded Type Parameter

In a generic type definition, a type parameter may be given an upper bound.

Example 6 – Defining a generic type named SortedSet, a set whose elements are sorted (a hypothetical interface; the real java.util.SortedSet<E> declares no such bound)

public interface SortedSet<E extends Comparable<E>> extends Set<E>

Here <E extends Comparable<E>> indicates that E is a bounded type parameter and Comparable<E> is its upper bound. Any parameterized type instantiated from this generic type must use a type argument X that is a subtype of Comparable<X>. For example, we may have the parameterized type SortedSet<Integer>, since Integer implements Comparable<Integer>. However, we cannot have SortedSet<java.lang.Thread>, because Thread does not implement Comparable<Thread>. In short, the bound of a type parameter restricts the type arguments accepted by the generic type. Without a bound, any type is accepted as a legal type argument at compile time, and some of them may lead to runtime exceptions.

A note on the example: the upper bound, java.lang.Comparable<E>, is itself an invocation of a generic type (java.lang.Comparable<T>), and the type argument passed to it is E, the type parameter of SortedSet.

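A type parameter may also have multiple bounds, separated by &, as in <T extends Number & Comparable<T>>. At most one bound may be a class, and if present it must come first; the remaining bounds must be interfaces.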
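The sketch below puts multiple bounds on the type parameter of a hypothetical generic method, largest; the syntax is the same as for generic types. Because T is bounded by both Number and Comparable<T>, the body may call compareTo, and callers may use Number methods such as intValue() on the result.

```java
import java.util.Arrays;
import java.util.List;

public class MultipleBoundsDemo {

    // T must extend Number AND implement Comparable<T>; the class bound comes first.
    static <T extends Number & Comparable<T>> T largest(List<T> values) {
        T best = values.get(0);
        for (T v : values) {
            if (v.compareTo(best) > 0) {
                best = v;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(largest(Arrays.asList(3, 9, 4)));             // 9
        System.out.println(largest(Arrays.asList(2.5, 1.5)).intValue()); // 2
        // largest(Arrays.asList("a", "b")); // would not compile: String is not a Number
    }
}
```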

Calling a Generic Method

Example 7 - Calling a generic method

        List<String> list = new ArrayList<String>();
        list.add("One");
        list.add("Two");
        String[] stringArray = list.toArray(new String[]{});
        System.out.println(stringArray[0]);
        System.out.println(stringArray[1]);

Usually, it is not necessary to specify the type argument (i.e. the concrete type that takes the place of the type parameter) when calling a generic method, as the example above shows. The compiler can infer the type argument from the type of the argument passed to the method (i.e. String[] in the example). That is, however, not always the case. In some cases, the compiler cannot determine the concrete type by inference, and the type argument must then be specified explicitly. The syntax for specifying the type argument to a generic method is shown in the example below:

String[] stringArray = list.<String>toArray(new String[]{});

The type argument (e.g. String in the example above) is placed between < and >, immediately before the name of the generic method being called.

By the way, be aware that the toArray method does not bring complete type-safety. The following code fragment compiles but throws an ArrayStoreException at run time, because the type parameter T of toArray is unrelated to the list's element type E.

        List<String> list = new ArrayList<String>();
        list.add("One");
        list.add("Two");

        Integer[] intArray = list.toArray(new Integer[]{});
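Wrapping the fragment in a small, hypothetical helper lets us observe the failure directly. For a non-empty list, ArrayList copies its elements into a new Integer[], and storing a String in that array raises the ArrayStoreException. For an empty list nothing is stored, so no exception occurs.

```java
import java.util.ArrayList;
import java.util.List;

public class ToArrayPitfall {

    // Returns true if copying the strings into an Integer[] fails at run time.
    static boolean copyThrowsArrayStoreException(List<String> list) {
        try {
            Integer[] intArray = list.toArray(new Integer[0]); // compiles: T = Integer
            System.out.println("copied " + intArray.length + " elements");
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<String>();
        list.add("One");
        list.add("Two");
        System.out.println(copyThrowsArrayStoreException(list)); // true
    }
}
```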

Type Wildcard

Type wildcard in Java Generics is a complex topic. It is discussed in my other post Type Wildcard in Java Generics.

Sunday, November 27, 2011

StringTemplate 4 Note for Java Programmers

StringTemplate is a very simple and powerful template engine. The existing documents are fairly complete. As long as one knows where to find those documents, learning StringTemplate is quite easy. This note is intended to help users to quickly find needed documents.

Learning StringTemplate means learning the following three aspects of it:

The core concepts and the relationship among them
Syntax of templates and template groups
API

StringTemplate 4 Document on the StringTemplate Home Site

In this document, users can find:

A brief introduction to StringTemplate 4
Instructions to setup Java programs to use StringTemplate 4
Syntax of templates and groups
StringTemplate 4 Java API

The syntax documentation is quite formal and uses (a variation of?) the Backus-Naur Form notation. Programmers who are not used to such formal notation may find the syntax documentation hard to understand. My recommendation for them is to spend an hour learning the Backus-Naur Form notation. Of course, examples also help.

Enforcing Strict Model-View Separation in Template Engines

This article from the creator of StringTemplate provides:

An overview of StringTemplate
The philosophy of StringTemplate
The theoretical foundation of StringTemplate
A few examples showing the main features of StringTemplate, namely:

attribute (and attribute property) reference
map operation (i.e. applying a template to an attribute that is a list of objects, or applying a list of templates alternately to such an attribute)
conditional include
recursive template

Many template users say that certain other template engines are more powerful than StringTemplate. This article helps users understand why features found in other template engines are deliberately excluded from StringTemplate, for very good reasons.

A Functional Language for Generating Structured Text

In this article, the creator of StringTemplate provides a comprehensive introduction, explaining the major concepts of StringTemplate with examples:

Template
Template group
Expression
Attribute
Multi-valued attribute
Implicitly set attribute
Template include
Conditional include
Template application, to a single or multiple attributes
Anonymous inline template
Recursive template
Group inheritance and overriding
Template region
Group interface
Map (dictionary) and list
Renderer

This is the most in-depth document about the core concepts of StringTemplate. Consider it a must-read if you want to really master StringTemplate. The template syntax it describes is up to date. However, the Java API has changed since the article was published, so the Java code examples in it are out of date.

Sunday, November 20, 2011

Classpath and Resource Files in Java Programs

(Last updated on February 2, 2013)

Frequently a Java program needs to read some resource files in the file system. Such a resource file may be a .properties file or a .xml file for program configuration. Often it is not practical to hardcode the full path to such a file in the program because if we do so, we will not be able to execute the program correctly except from a specific location in the file system. That is highly undesirable.

A popular practice is to place such a resource file on a classpath and code the program to search the classpath for the resource file. A programmer who adopts this practice must understand well what a classpath is and how to discover it programmatically.

To understand classpaths, one must first understand the concept of a class loader. According to the Java API document, “A class loader is an object that is responsible for loading classes”. In general, a Java program uses multiple class loaders, not a single one, to load classes. A classpath is the search path of a class loader. In other words, a Java program usually has multiple classpaths. (It is inappropriate to talk about the classpath of a Java program, because a Java program has multiple classpaths; it is fine, however, to talk about the classpath of a class loader.) Some of those classpaths can be discovered programmatically; some simply cannot.

For a typical Java program, the class loaders form a hierarchy. When a class loader is asked to find a class or a resource file, it first delegates the request to its parent class loader, which in turn delegates to its own parent, and so on up the hierarchy. Only when the parent cannot find the class or resource file does the class loader search its own search path.

Class Loader Hierarchy

At the top of the class loader hierarchy is the JVM’s built-in bootstrap class loader, which loads the standard JDK classes. The search path of the bootstrap class loader can be found programmatically via a call to System.getProperty("sun.boot.class.path"). The search path is platform specific. On a Windows machine, it is <JAVA-HOME>/jre/lib. Typical jar files on this search path are rt.jar, jsse.jar, jce.jar, etc.

The child of the bootstrap class loader is the extension class loader, which loads JDK extension classes. The search path of the extension class loader can be found programmatically via a call to System.getProperty("java.ext.dirs"). The path is platform specific. On a Windows machine, it is <JAVA-HOME>/jre/lib/ext. An example of such a JDK extension is sunjce_provider.jar.

The child of the extension class loader is the system class loader, which loads classes on the path specified by the OS environment variable CLASSPATH or by the -classpath option to the JVM. The search path of the system class loader can be found programmatically via a call to System.getProperty("java.class.path").

If the Java program does not create its own user class loader, all non-JDK-standard/extension classes will be loaded by the system class loader. As we just said, the search path of this class loader can be easily found programmatically.
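The small sketch below prints the three search paths via the system properties named above; the class name is hypothetical. On Java 9 and later, the boot class path and the extension mechanism were removed, so the first two properties are typically not set, and the sketch prints a placeholder for them.

```java
import java.io.File;
import java.util.regex.Pattern;

public class SearchPaths {

    // Returns a system property, or a placeholder if this JVM does not set it.
    static String prop(String key) {
        String value = System.getProperty(key);
        return value == null ? "(not set on this JVM)" : value;
    }

    public static void main(String[] args) {
        System.out.println("bootstrap: " + prop("sun.boot.class.path"));
        System.out.println("extension: " + prop("java.ext.dirs"));
        System.out.println("system:    " + prop("java.class.path"));
        // Entries are joined with the platform path separator (';' on Windows, ':' elsewhere).
        for (String entry : prop("java.class.path").split(Pattern.quote(File.pathSeparator))) {
            System.out.println("  " + entry);
        }
    }
}
```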

If the Java program creates its own user class loaders, there is no way to find their search paths programmatically, unless the class loaders are instances of a custom class loader class with a method to retrieve the search path. If the user classes are executed in a JEE container, the JEE container is the bootstrap program, and it always creates user class loaders to load user classes (usually one for each war or ear). Similarly, Maven always loads plugin classes with user class loaders. Therefore, if the user classes are executed as a Maven plugin, don’t expect to find their search path by calling System.getProperty("java.class.path"). It is worth noting that many applications and application servers actually use instances of java.net.URLClassLoader as their user class loaders. In such cases, the classpath of a class loader can be found by calling the getURLs() method on it, after casting it from java.lang.ClassLoader to java.net.URLClassLoader. The getURLs() method returns an array of URLs, each pointing to a directory or a jar file on the classpath. In other words, the class loader will search those locations to find the classes it needs.
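The sketch below walks up from a given class loader through getParent() and lists the URLs of each loader that happens to be a URLClassLoader; the class name is hypothetical. The bootstrap class loader is represented by null, so the walk stops there. On Java 9 and later the system class loader is no longer a URLClassLoader, so its URLs are not exposed this way.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

public class ClassLoaderPaths {

    // Describes the loader chain from 'loader' up to (not including) the bootstrap loader.
    static List<String> describe(ClassLoader loader) {
        List<String> lines = new ArrayList<String>();
        for (ClassLoader cl = loader; cl != null; cl = cl.getParent()) {
            lines.add(cl.getClass().getName());
            if (cl instanceof URLClassLoader) {
                for (URL url : ((URLClassLoader) cl).getURLs()) {
                    lines.add("  " + url); // a directory (ending in '/') or a jar file
                }
            } else {
                lines.add("  (search path not exposed by this loader type)");
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Start from the loader that loaded this class.
        for (String line : describe(ClassLoaderPaths.class.getClassLoader())) {
            System.out.println(line);
        }
    }
}
```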

Whether or not the search path can be found programmatically, a program can always ask a class loader to find a resource file by calling the getResource(String name) method on the class loader object, or get an InputStream connected to the resource file by calling its getResourceAsStream(String name) method. The class loader will find the resource file if it is on the search path or in a jar file on the search path. By the way, the static methods ClassLoader.getSystemResource(String name) and ClassLoader.getSystemResourceAsStream(String name) find the resource on the search path of the system class loader.

Given any object, one can call the getClass() method on it to get the Class object representing its class, and then call the getClassLoader() method on that Class object to get the class loader. In short, obj.getClass().getClassLoader() returns the class loader that loaded the class of the object obj, or null if that class was loaded by the bootstrap class loader.

It is usually preferable to use the getResource(String name) method rather than the getResourceAsStream(String name) method, for the sake of logging: the returned URL can be logged. If there are inadvertently multiple resource files with the same name at different locations on the search path, the logged URL helps with debugging. Even if there is only one resource file with that name, if there is some difficulty reading from or writing to it, the logged URL points the programmer quickly to the file that needs a fix.
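The sketch below follows that practice. It assumes a properties file named app.properties (a hypothetical name) somewhere on the search path and uses java.util.logging for the log output. The URL is obtained with getResource, logged, and only then opened; try-with-resources requires Java 7 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.Properties;
import java.util.logging.Logger;

public class ResourceLoading {

    private static final Logger LOG = Logger.getLogger(ResourceLoading.class.getName());

    // Finds the resource via this class's loader, logs where it was found, and loads it.
    // Returns null if the resource is not on the search path.
    static Properties loadProperties(String name) throws IOException {
        URL url = ResourceLoading.class.getClassLoader().getResource(name);
        if (url == null) {
            LOG.warning("Resource not found on the search path: " + name);
            return null;
        }
        LOG.info("Loading " + name + " from " + url);
        Properties props = new Properties();
        try (InputStream in = url.openStream()) {
            props.load(in);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // "app.properties" is a hypothetical resource name.
        Properties props = loadProperties("app.properties");
        System.out.println(props == null ? "not found" : props.stringPropertyNames());
    }
}
```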

Friday, October 28, 2011

Type-Safety in Commons Digester 3

Commons Digester is a Java library that parses XML data into Java objects. Calling the parse method on a Digester object, with a file containing the XML data as argument, returns the object at the top of the object tree corresponding to the XML data. The XML data may be arbitrary, and so may the type of the returned object. For this obvious reason, the return type of the parse method in Digester 1 and 2 is Object (java.lang.Object). Clients of the library have to cast the returned objects to whatever types they actually are. This approach lacks type-safety: an incorrect cast cannot be caught at compile time and will throw a ClassCastException at runtime.

After generics were introduced in Java 5, many libraries were improved with generics to increase type-safety. There were such efforts in Commons Digester 3, too. In Digester 3, each parse method of the Digester class is a generic method: it has a type parameter (<T>), and its return type is the abstract type T instead of Object. When the return value is assigned to a variable, the Java compiler infers T from the type of that variable, so casting is no longer needed in the client code. This may give the impression that the risk of a runtime ClassCastException has been eliminated. That is a false assurance. What the generic method provides here is just what may be called auto-casting: the cast disappears from the client's source code (the compiler still inserts it at the call site), with the help of generics. The only benefit is the convenience that the client no longer needs to cast. No casting risk is prevented or reduced.

It is worth pointing out that the Digester class is a non-generic class, while its parse methods are generic methods. Suppose a type parameter of a generic method serves as the type of one or more method parameters as well as the return type. Then the compiler can check the arguments and the variable the return value is assigned to, infer the concrete type, and ensure that the arguments and the return value are all compatible with it. That increases type-safety compared with using Object as the type of the method parameters and return. This is not the case for the parse methods of the Digester class, where the type parameter appears only as the return type. The compiler simply adopts whatever type the result is assigned to and has no way to verify that the parsed object actually has that type.
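The two situations can be contrasted in a small sketch. parseLike is a hypothetical stand-in for a method whose type parameter appears only in the return type; it is not the Digester API. firstOf uses its type parameter in both the parameter and the return type.

```java
import java.util.Arrays;
import java.util.List;

public class AutoCastDemo {

    // T appears only in the return type: the unchecked cast hides the risk, it does not remove it.
    @SuppressWarnings("unchecked")
    static <T> T parseLike(Object parsed) {
        return (T) parsed;
    }

    // T appears in the parameter and the return type, so the compiler checks both against one T.
    static <T> T firstOf(List<T> items) {
        return items.get(0);
    }

    public static void main(String[] args) {
        Object result = "a string produced by some parser";
        try {
            Integer n = parseLike(result); // compiles: T is inferred as Integer
            System.out.println(n);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException at the call site");
        }

        String s = firstOf(Arrays.asList("x", "y"));     // checked at compile time
        // Integer i = firstOf(Arrays.asList("x", "y")); // would not compile
        System.out.println(s);
    }
}
```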

Sunday, October 9, 2011

Object Attributes and States, From Modeling to Programming

Regarding things as objects with attributes is a powerful thinking technique, learned by some of us as early as in elementary school. As an example, in the popular logic game for children, Zoombinis Logical Journey, every zoombini has four attributes. (In the game, the attributes are visually identified without a text name. For our convenience of discussion, we just name them hair, eyes, nose, and feet.)

Zoombini Attributes

An attribute has a value. The legitimate values for a specific attribute form a finite or infinite set (the domain of the attribute). In the Zoombinis example, each attribute may have one of five possible values. An object's state is the combination of its attribute values. In the Zoombinis example, the particular zoombini in the picture above is in a state that can be described as

A Zoombini State

All possible values of an attribute together form the state space of the attribute. All possible states of an object, i.e. all combinations of its attribute values, form the state space of the object. All possible states of all objects of a software system form the state space of the software system.

An object may be mutable. In that case, its state can be changed. An object may be immutable. In that case, its state cannot be changed.

UML, which aims to be a universal modeling language, directly supports the object/attribute thinking technique. In UML, an object may have many attributes, and we may model zoombinis as objects of a Zoombini class:


Zoombini Class in UML

Unfortunately, UML does not offer a notation to distinguish mutable from immutable attributes.

When we start coding zoombinis in a programming language, we often encounter a problem: most popular programming languages do not have the concept of attributes. For example, in Java, an object may have member fields and member methods, but not attributes. Because the attribute is such a powerful concept in our rational thinking, we are unwilling to give it up so easily. What we can do is what Steve McConnell calls "programming into a language". In fact, the JavaBean property concept is the same as an object attribute, so we can code an object attribute as an object property in Java. For an immutable attribute, we simply omit the mutator (setter) method of the property. Alternatively, we can code an object's mutable attribute as a public field of the object.
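The minimal sketch below shows this mapping for two of the four attributes; the attribute values are hypothetical. The immutable attribute hair is a read-only JavaBean property (a final field with a getter only). The mutable attribute feet has both a getter and a setter.

```java
public class Zoombini {

    // Immutable attribute: set once in the constructor, exposed through a getter only.
    private final String hair;

    // Mutable attribute: a JavaBean property with both a getter and a setter.
    private String feet;

    public Zoombini(String hair, String feet) {
        this.hair = hair;
        this.feet = feet;
    }

    public String getHair() {
        return hair;
    }

    public String getFeet() {
        return feet;
    }

    public void setFeet(String feet) {
        this.feet = feet;
    }

    public static void main(String[] args) {
        Zoombini z = new Zoombini("spiky", "sneakers");
        z.setFeet("wheels");                                   // the mutable attribute changes
        System.out.println(z.getHair() + ", " + z.getFeet()); // spiky, wheels
        // z.setHair("flat");                                  // no such method: hair is immutable
    }
}
```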

Saturday, October 1, 2011

Immutable Objects and val Variables Are Not the Same Thing in Scala

In Scala programs, some objects may be mutable, others immutable. A mutable object's state can be changed. An immutable object's state cannot be changed.

Listing 1 - A Mutable Object

object Example1 extends App {
    val aList = scala.collection.mutable.Set(1, 2, 3)
    println(aList)
    
    aList.add(4)
    println(aList)
}

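In Listing 1, we create a mutable Set object with three elements (1, 2, and 3) and then add a new element, 4, to it. Running the program displays something like the following (the element order of a hash-based set is not guaranteed, and the exact output format depends on the Scala version):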

Set(2, 1, 3)

Set(2, 1, 4, 3)

Obviously, we can change the state of the mutable Set object. If instead we create an immutable Set object, for example with the following statement

val aList = scala.collection.immutable.Set(1, 2, 3)

we won't be able to add a new element to that Set object. Accordingly, the immutable Set class has no add method. (Its + method creates a new Set object rather than adding a new element to the original set.)

In Scala programs, a variable may be a var or a val variable. A variable holds a reference to an object. A var variable can be re-assigned to reference another object; a val variable cannot be re-assigned.

Listing 2 - val Variable Cannot Be Re-Assigned

object Example1 extends App {
    val aList = scala.collection.mutable.Set(1, 2, 3)
    aList = scala.collection.mutable.Set(1, 2, 3, 4)
}

When we compile the program in Listing 2, the Scala compiler complains: reassignment to val. If we change val to var, as in Listing 3, the program compiles fine.

Listing 3 - var Variable Can Be Re-Assigned

object Example1 extends App {
    var aList = scala.collection.mutable.Set(1, 2, 3)
    aList = scala.collection.mutable.Set(1, 2, 3, 4)
}

Notice that in Listing 1, even though aList is a val variable, we can change the state of the object it references, since that object is mutable.

Even though immutability and val are both about things that cannot be changed, one concerns the state of an object, and the other concerns which object a variable references. They are entirely different things, yet beginning Scala programmers may confuse them.

Saturday, September 17, 2011

Learning Apache Commons Digester 3

Introduction

Apache Commons Digester 3 is a Java library for mapping XML data to Java objects. It makes configuring Java applications with XML files much easier than it would otherwise be. In this tutorial, we are going to create a Family object (Listing 2), an Address object (Listing 3), and three Member objects (Listing 4) corresponding to the XML data in Listing 1.

To master Apache Commons Digester 3, one must understand three key concepts: rules, matching patterns, and the object stacks.

Rules and Matching Patterns

A rule is an instance of a subclass of the Rule class and represents a set of actions (more on this later). For a rule to take effect, it must be registered on a matching pattern for XML elements (see the examples in Listing 7 - Registering Rules) and must be associated with a digester (see the DigesterLoader.newLoader(new FamilyModule()) call in Listing 5 - Parsing XML). When the digester walks through the XML element tree during parsing, it invokes a rule's actions whenever it encounters an element matched by the rule's pattern. More specifically, the digester calls the begin(), body(), and end() methods on the rule object when it encounters the start tag, the content, and the end tag of a matched element, respectively. The actions of a rule are implemented in those three methods.

For brevity, we will refer to the begin(), body(), and end() methods as the begin(), body(), and end() actions. We will also call an action invoked when the digester encounters a certain XML element an action for that element, and the rule that owns the action a rule for that element. Of course, a rule can be a rule for both Element A and Element B, as long as it is registered on patterns that match both elements.

The matching pattern syntax is very simple. For the elements in Listing 1:

the <family> element can be matched by pattern "family"
the <address> element by either "family/address" or "*/address"
a <firstname> element by "family/member/firstname" or "*/firstname".
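To make the two pattern forms concrete, here is a simplified matcher covering only the syntax shown above: an exact path such as "family/address", or a leading "*/" wildcard that matches the given suffix at any depth. It is an illustrative sketch, not Digester's own matching code (which, among other things, decides between several patterns that match the same element); PatternMatcherSketch is a hypothetical name.

```java
// Hypothetical, simplified matcher for the two pattern forms above.
final class PatternMatcherSketch {
    private PatternMatcherSketch() {}

    // path is the slash-separated list of element names from the root,
    // e.g. "family/member/firstname".
    static boolean matches(String pattern, String path) {
        if (pattern.startsWith("*/")) {
            String suffix = pattern.substring(2); // e.g. "firstname" or "member/firstname"
            return path.equals(suffix) || path.endsWith("/" + suffix);
        }
        return pattern.equals(path); // exact match
    }

    public static void main(String[] args) {
        System.out.println(matches("family", "family"));                          // true
        System.out.println(matches("*/address", "family/address"));               // true
        System.out.println(matches("*/firstname", "family/member/firstname"));    // true
        System.out.println(matches("*/firstname", "family/address/street"));      // false
    }
}
```

Checking the suffix against "/" + suffix, rather than a bare endsWith, keeps "*/ember" from matching "family/member".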

An action can do anything. The most common actions are creating Java objects, setting JavaBean properties from XML element contents or attribute values, and linking one Java object to another. Digester provides many built-in subclasses of the Rule class for exactly these purposes, for example ObjectCreateRule, SetPropertiesRule, BeanPropertySetterRule, and SetNextRule.

Order of Actions

For rules registered on patterns that match different elements, the order of registration does not matter. The begin() action for an XML element is always invoked before any action for its nested elements; similarly, the end() action for an XML element is always invoked after all actions for its nested elements. If there are two rules for the same element, say Rule A is registered on a pattern that matches the element before Rule B is registered on the same pattern (or on another pattern that also matches the element), the actions execute in this order:

Rule A's begin() action
Rule B's begin() action
Rule B's end() action
Rule A's end() action
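The ordering can be checked with a small, self-contained model built on the JDK's SAX parser. It is a sketch under simplifying assumptions, not Digester's code: ActionOrderSketch is a hypothetical name, it supports exact-match patterns only, and it leaves out body() actions. Rules matched on an element are begun in registration order and ended in reverse order, which reproduces the Rule A / Rule B sequence above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical mini-dispatcher that models only the begin()/end() ordering.
// Not Digester's implementation: exact-match patterns only, no body() actions.
final class ActionOrderSketch {
    private final List<String[]> registrations = new ArrayList<String[]>(); // {pattern, ruleName}

    void register(String pattern, String ruleName) {
        registrations.add(new String[] {pattern, ruleName});
    }

    // Parses the XML and returns a log of the actions in the order they ran.
    List<String> parse(String xml) {
        final List<String> log = new ArrayList<String>();
        final Deque<String> path = new ArrayDeque<String>();                    // element names, root first
        final Deque<List<String>> openMatches = new ArrayDeque<List<String>>(); // rules matched per open element
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                path.addLast(qName);
                String current = String.join("/", path);
                List<String> matched = new ArrayList<String>();
                for (String[] registration : registrations) { // registration order
                    if (registration[0].equals(current)) {
                        matched.add(registration[1]);
                        log.add(registration[1] + ".begin(" + current + ")");
                    }
                }
                openMatches.push(matched);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String current = String.join("/", path);
                List<String> matched = openMatches.pop();
                for (int i = matched.size() - 1; i >= 0; i--) { // reverse registration order
                    log.add(matched.get(i) + ".end(" + current + ")");
                }
                path.removeLast();
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e); // wrapped for brevity
        }
        return log;
    }

    public static void main(String[] args) {
        ActionOrderSketch sketch = new ActionOrderSketch();
        sketch.register("family", "A");        // Rule A registered first
        sketch.register("family", "B");        // Rule B registered second on the same element
        sketch.register("family/member", "C"); // a rule for a nested element
        for (String entry : sketch.parse("<family><member/></family>")) {
            System.out.println(entry);
        }
    }
}
```

Running main prints A.begin(family), B.begin(family), C.begin(family/member), C.end(family/member), B.end(family), A.end(family), one per line: the nested element's actions fall entirely between the begin() and end() actions of its parent.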

The Object Stacks

A digester maintains several object stacks: the default stack, the parameter stack, and any number of named stacks. Java objects created during parsing are pushed onto and popped off these stacks by the rules. Many built-in rules, such as SetPropertiesRule, BeanPropertySetterRule, and CallMethodRule, simply work on the object at the top of the default stack. An ObjectCreateRule creates a new Java object and pushes it onto the default stack in its begin() method, and pops it off in its end() method.

Any Rule object can call its getDigester() method to obtain a reference to the digester it is associated with. Through the digester, a rule can push objects onto, pop objects off, or peek at objects in the default stack, the parameter stack, or any named stack by calling the following digester methods:

push() - push onto the default stack
pop() - pop from the default stack
peek() - peek at the default stack
pushParams() - push onto the parameter stack
popParams() - pop from the parameter stack
peekParams() - peek at the parameter stack
push(stackName) - push onto the named stack
pop(stackName) - pop from the named stack
peek(stackName) - peek at the named stack
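The following model may help picture these methods, and why a rule's own named stack keeps its objects out of the way of rules that use the default stack. It is a stand-alone sketch, not the Digester class: StackHolderSketch is a hypothetical name, the method names merely mirror the list above, and the choice to return null from pop() and peek() on an empty stack belongs to this sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a digester's stacks: a default stack, a parameter stack,
// and any number of named stacks. Like ArrayDeque, it rejects null elements.
final class StackHolderSketch {
    private final Deque<Object> defaultStack = new ArrayDeque<Object>();
    private final Deque<Object[]> paramStack = new ArrayDeque<Object[]>();
    private final Map<String, Deque<Object>> namedStacks = new HashMap<String, Deque<Object>>();

    void push(Object object) { defaultStack.push(object); }
    Object pop() { return defaultStack.poll(); }  // null when empty (a choice of this sketch)
    Object peek() { return defaultStack.peek(); }

    void pushParams(Object... params) { paramStack.push(params); }
    Object[] popParams() { return paramStack.poll(); }
    Object[] peekParams() { return paramStack.peek(); }

    void push(String stackName, Object object) { stackFor(stackName).push(object); }
    Object pop(String stackName) { return stackFor(stackName).poll(); }
    Object peek(String stackName) { return stackFor(stackName).peek(); }

    // Named stacks are created lazily on first use.
    private Deque<Object> stackFor(String stackName) {
        Deque<Object> stack = namedStacks.get(stackName);
        if (stack == null) {
            stack = new ArrayDeque<Object>();
            namedStacks.put(stackName, stack);
        }
        return stack;
    }

    public static void main(String[] args) {
        StackHolderSketch digester = new StackHolderSketch();
        digester.push("a Family");                  // e.g. pushed by an ObjectCreateRule
        digester.push("myRule", "private state");   // a custom rule's own named stack
        System.out.println(digester.peek());        // prints "a Family": the default stack is unaffected
        System.out.println(digester.pop("myRule")); // prints "private state"
    }
}
```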

When you create your own rule class that pushes onto or pops from a stack, it is better to use a named stack specific to that rule class, so that it does not interfere with the built-in rule classes (which use the default stack) or with your other rule classes. Naturally, a rule should pop (in its end() method) only the object it pushed itself (in its begin() method). Keep in mind that the stacks of a digester are shared by all rules associated with it. If two rules for the same element push onto and pop from the same stack, it becomes difficult for other rules on that element, or on elements nested in it, to know which object is on top of the stack when their actions run. The stacks, particularly the default stack, give developers a lot of convenience, but they also introduce tight coupling among rule actions. Treat them as sharp knives that can easily cut your fingers.

Calling the parse() method on a digester returns the root object: the first object pushed onto the default stack, which sits at its bottom throughout parsing. For example, the digester.parse(inputStream) call in Listing 5 returns a Family object, because that object is created when the digester encounters the <family> element and is therefore the first object pushed onto the default stack.
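By the time parsing finishes, every object created by an ObjectCreateRule has been popped again, so a digester-like design has to remember the bottom object separately. The sketch below models that bookkeeping with hypothetical names: push() records the first object pushed onto an empty stack as the root, and replay() walks by hand through the actions for a <family> element containing an <address> element, with trimmed stand-ins for the Family and Address classes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

// Hypothetical sketch (not Digester's code): remember the first object pushed onto
// an empty default stack, so it can be returned after every object has been popped.
final class RootTrackingSketch {
    // Minimal stand-ins for the tutorial's Family and Address classes.
    static final class Family {
        Address address;
        void setAddress(Address address) { this.address = address; }
    }

    static final class Address { }

    private final Deque<Object> stack = new ArrayDeque<Object>();
    private Object root;

    void push(Object object) {
        if (stack.isEmpty()) {
            root = object; // the bottom object becomes the parse result
        }
        stack.push(object);
    }

    Object pop() { return stack.pop(); }

    // peek(0) is the top of the stack, peek(1) the object just below it.
    Object peek(int n) {
        Iterator<Object> it = stack.iterator(); // iterates from top to bottom
        for (int i = 0; i < n; i++) {
            it.next();
        }
        return it.next();
    }

    boolean isEmpty() { return stack.isEmpty(); }

    Object getRoot() { return root; }

    // Replays by hand the actions for <family><address/></family>.
    static RootTrackingSketch replay() {
        RootTrackingSketch digester = new RootTrackingSketch();
        digester.push(new Family());  // begin() on "family": create a Family
        digester.push(new Address()); // begin() on "family/address": create an Address
        // end() actions on "family/address" run in reverse registration order,
        // so the SetNext-style link happens before the Address is popped.
        ((Family) digester.peek(1)).setAddress((Address) digester.peek(0));
        digester.pop();               // end() on "family/address": pop the Address
        digester.pop();               // end() on "family": pop the Family
        return digester;
    }

    public static void main(String[] args) {
        RootTrackingSketch digester = replay();
        System.out.println(digester.isEmpty());                            // prints true
        System.out.println(((Family) digester.getRoot()).address != null); // prints true
    }
}
```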

A Simple Example

Listing 1 - family.xml

<family name='Addison'>
    <address city='New York' state='New York' country='USA'>
        <street>Apt. 3522, 10 West Street</street>
    </address>
    
    <member>
        <firstname>Thomas</firstname>
        <gender>M</gender>
        <age>25</age>
    </member>
    
    <member>    
        <firstname>Linda</firstname>
        <gender>F</gender>
        <age>24</age>
    </member>
    
    <member>    
        <firstname>Alice</firstname>
        <gender>F</gender>
        <age>1</age>
    </member>
</family>

The three classes in Listings 2, 3, and 4 are simple Java classes consisting mostly of getter and setter methods. The only thing worth noting is that the Family class has an addMember() method that adds one member to the family per call.

Listing 2 - The Family class

package commons.digester3.example;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class Family {
    private String name;
    private List<Member> members = new ArrayList<Member>();
    private Address address;
    
    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }    

    public Address getAddress() {
        return address;
    }

    public void setAddress(Address address) {
        this.address = address;
    }

    public List<Member> getMembers() {
        return Collections.unmodifiableList(members);
    }
    
    public void addMember(Member member) {
        members.add(member);
    }
}

Listing 3 - The Address class

package commons.digester3.example;

public class Address {
    private String street;
    private String city;
    private String state;
    private String country;
    
    public String getCity() {
        return city;
    }
    
    public void setCity(String city) {
        this.city = city;
    }
    
    public String getState() {
        return state;
    }
    
    public void setState(String state) {
        this.state = state;
    }
    
    public String getCountry() {
        return country;
    }
    
    public void setCountry(String country) {
        this.country = country;
    }

    public String getStreet() {
        return street;
    }

    public void setStreet(String street) {
        this.street = street;
    }
}

Listing 4 - The Member class

package commons.digester3.example;

public class Member {
    private String firstname;
    private char gender;
    private int age;
    
    public String getFirstname() {
        return firstname;
    }
    
    public void setFirstname(String firstname) {
        this.firstname = firstname;
    }
    
    public char getGender() {
        return gender;
    }
    
    public void setGender(char gender) {
        this.gender = gender;
    }
    
    public int getAge() {
        return age;
    }
    
    public void setAge(int age) {
        this.age = age;
    }
}

Listing 5 contains the code that parses the XML and creates the Family, Address, and Member objects. The only Digester-specific code is the three statements that create a DigesterLoader from a FamilyModule, obtain a Digester from it, and call parse(). The FamilyModule class is a rules module class, which we discuss later in this tutorial. From the code in Listing 5, we can see that calling the parse() method on a Digester object returns a Family object. In fact, the Address and Member objects are also created and associated with the Family object, as the unit tests in Listing 6 verify.

Listing 5 - Parsing XML

package commons.digester3.example;

import java.io.IOException;
import java.io.InputStream;

import org.apache.commons.digester3.Digester;
import org.apache.commons.digester3.binder.DigesterLoader;
import org.xml.sax.SAXException;

public class FamilyCreator {    
    /**
     * Creates a Family object (and Address, Member objects contained by it) based
     * on XML data.
     * 
     * @param source - name of the XML file
     * 
     * @throws SAXException
     * @throws IOException
     */
    public static Family createFamily(String source) throws SAXException, IOException {
        Family result = null;
        InputStream inputStream = FamilyModule.class.getClassLoader().getResourceAsStream(source);
        
        DigesterLoader digesterLoader = DigesterLoader.newLoader(new FamilyModule());
        Digester digester = digesterLoader.newDigester();    
        result = digester.parse(inputStream);
        
        return result;
    }    
}

Listing 6 - JUnit Tests

package commons.digester3.example;

import java.io.IOException;
import java.util.List;

import org.junit.Assert;

import org.junit.BeforeClass;
import org.junit.Test;
import org.xml.sax.SAXException;

public class SimpleDigesterTest {
    private static Family family = null;
    
    @BeforeClass
    public static void setup() throws IOException, SAXException {
        family = FamilyCreator.createFamily("family.xml");
    }
    
    @Test
    public void testFamily() {
        Assert.assertNotNull("Family was not created.", family);
        
        Assert.assertEquals("Incorrect family last name", "Addison", family.getName());
    }
    
    @Test
    public void testAddress() {
        Address address = family.getAddress();
        
        Assert.assertNotNull("Address was not created.", address);
        
        Assert.assertEquals("Incorrect street line", "Apt. 3522, 10 West Street", address.getStreet());
        Assert.assertEquals("Incorrect city", "New York", address.getCity());
        Assert.assertEquals("Incorrect state", "New York", address.getState());
        Assert.assertEquals("Incorrect country", "USA", address.getCountry());
    }
    
    @Test
    public void testMember() {
        List<Member> members = family.getMembers();
        Assert.assertNotNull("Family members were not created.", members);
        
        Assert.assertEquals("Incorrect member count.", 3, members.size());
        
        Member member = members.get(1);
        Assert.assertEquals("Incorrect first name", "Linda",member.getFirstname());
        Assert.assertEquals("Incorrect gender", 'F', member.getGender());
        Assert.assertEquals("Incorrect age", 24, member.getAge());
    }
}

The FamilyModule class in Listing 7 is a rules module class. A rules module is essentially a set of (rule, matching pattern) pairs. The digester is configured from a rules module (see the DigesterLoader.newLoader(new FamilyModule()) call in Listing 5), which tells it which rules to fire for which element. The inline comments explain the rules.

Listing 7 - Registering Rules

package commons.digester3.example;

import org.apache.commons.digester3.binder.AbstractRulesModule;

public class FamilyModule extends AbstractRulesModule {
    
    @Override
    protected void configure() {

        // Register an ObjectCreateRule on matching pattern "family". Later, in the parsing phase,
        // when it encounters a <family> element, the digester will fire this rule to create a Family object.
        // Also register a SetPropertiesRule on the same pattern. Later, in the parsing phase,
        // the digester will fire this rule to set properties of the Family object
        // with the attribute values of the <family> element.
        // For setProperties() to work this way, each property name must be the same as the attribute name.
        forPattern("family").createObject().ofType("commons.digester3.example.Family")
                .then().setProperties();
        

        // ... Also register a SetNextRule on matching pattern "family/address" to establish relationship 
        // between the Family and the Address object by calling the setAddress() method on the Family
        // object (expected to be the object next to top of the default stack) and passing the Address object
        // (expected to be the object on top of the default stack) as argument to it.
        forPattern("family/address").createObject().ofType("commons.digester3.example.Address")
                .then().setProperties()
                .then().setNext("setAddress");
        
        // Register a BeanPropertySetterRule on matching pattern "family/address/street", to
        // set the property of the Address object named street with the content of the <street>
        // element.
        forPattern("family/address/street").setBeanProperty();
        

        // ... to establish relationship between the Family and the Member object by calling
        // the addMember() method on the Family object and passing the Member object as argument to it.
        forPattern("family/member").createObject().ofType("commons.digester3.example.Member")
                .then().setNext("addMember");
        
        forPattern("family/member/firstname").setBeanProperty();
        forPattern("family/member/gender").setBeanProperty();
        forPattern("family/member/age").setBeanProperty();
    }
}

Beyond The Simplest

Mismatch Between Attribute and Property Name

In our example above, all element attribute names match the corresponding JavaBean property names. What if there is a mismatch? For example, suppose the <family> element has an attribute named "name", but the Family class has setLastname() and getLastname() methods. All we have to do is call addAlias(attributeName, propertyName) after calling setProperties(). For example, instead of the registration on the "family" pattern in Listing 7, we would have the following code:

forPattern("family").createObject().ofType("commons.digester3.example.Family")
        .then().setProperties().addAlias("name", "lastname");
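Conceptually, an alias is a rename step applied to each attribute name before the matching setter is looked up. The sketch below illustrates that idea with plain Java reflection; it is not Digester's SetPropertiesRule. AliasSketch and its nested Family class are hypothetical, only String-typed setters are supported, and, as a choice of this sketch, a missing setter raises an IllegalArgumentException. The withName() call discussed next applies the same kind of renaming to nested element content.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Digester's SetPropertiesRule: copy attribute values to
// String-typed setters, translating attribute names through an alias table first.
final class AliasSketch {
    // Stand-in for a Family class that uses "lastname" instead of "name".
    public static final class Family {
        private String lastname;
        public String getLastname() { return lastname; }
        public void setLastname(String lastname) { this.lastname = lastname; }
    }

    static void setProperties(Object bean, Map<String, String> attributes, Map<String, String> aliases) {
        for (Map.Entry<String, String> attribute : attributes.entrySet()) {
            String property = aliases.containsKey(attribute.getKey())
                    ? aliases.get(attribute.getKey()) // e.g. "name" -> "lastname"
                    : attribute.getKey();
            String setterName = "set" + Character.toUpperCase(property.charAt(0)) + property.substring(1);
            try {
                Method setter = bean.getClass().getMethod(setterName, String.class);
                setter.invoke(bean, attribute.getValue());
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("No usable setter " + setterName + "(String)", e);
            }
        }
    }

    public static void main(String[] args) {
        Family family = new Family();
        Map<String, String> attributes = new HashMap<String, String>();
        attributes.put("name", "Addison");     // attribute of <family name='Addison'>
        Map<String, String> aliases = new HashMap<String, String>();
        aliases.put("name", "lastname");       // like addAlias("name", "lastname")
        setProperties(family, attributes, aliases);
        System.out.println(family.getLastname()); // prints Addison
    }
}
```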

Mismatch Between Nested Element and Property Name

In our example above, all nested element names match the corresponding JavaBean property names. For example, a <member> element has the nested elements <firstname>, <age>, and <gender>, and the corresponding Member class has setFirstname(), setAge(), and setGender(). What if there is a mismatch, for example, if the <member> element has a nested <name> element instead of <firstname>? All we have to do is change the matching pattern and call withName(propertyName) after calling setBeanProperty(). For example, instead of the registration on the "family/member/firstname" pattern in Listing 7, we would have the following code:

forPattern("family/member/name")
        .setBeanProperty().withName("firstname");

Using CallMethodRule

Some type mismatches are fine, because Digester has default converters for them. For example, even though all attribute values and element contents are strings, properties of type char or int need no explicit conversion. In our example, the age of a member is of type int, and Digester implicitly converts the string content of the <age> element into an int.
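The kind of conversion involved can be sketched as follows. Digester relies on its default converters for the real work; ConversionSketch and its convert() method are hypothetical, handle only String, int, and char targets, and trim whitespace first as a choice of this sketch.

```java
// Hypothetical sketch of converting element text to a setter's parameter type.
final class ConversionSketch {
    private ConversionSketch() {}

    static Object convert(String text, Class<?> targetType) {
        String value = text.trim(); // this sketch ignores surrounding whitespace
        if (targetType == String.class) {
            return value;
        }
        if (targetType == int.class || targetType == Integer.class) {
            return Integer.valueOf(value); // "24" -> 24; throws NumberFormatException otherwise
        }
        if (targetType == char.class || targetType == Character.class) {
            if (value.length() != 1) {
                throw new IllegalArgumentException("Expected a single character but got \"" + value + "\"");
            }
            return Character.valueOf(value.charAt(0)); // "F" -> 'F'
        }
        throw new IllegalArgumentException("No converter for " + targetType.getName());
    }

    public static void main(String[] args) {
        System.out.println(convert("24", int.class));   // prints 24
        System.out.println(convert(" F ", char.class)); // prints F
    }
}
```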

Other type mismatches are real problems. Suppose our Member class looks like Listing 8: instead of returning a character, the getGender() method returns a Gender enum, which can be either F or M, as defined in Listing 9. Even though the setGender(char) method signature is the same as before, gender is no longer a read/write JavaBean property of Member, because the return type of getGender() differs from the parameter type of setGender(). For this reason, a BeanPropertySetterRule will not work here. To still call setGender() on a Member object with the content of the nested <gender> element, we need a CallMethodRule. Below is the new code that replaces the registration on the "family/member/gender" pattern in Listing 7.

forPattern("family/member/gender")
        .callMethod("setGender").withParamCount(1)
        .withParamTypes("java.lang.Character")
        .then().callParam();

Listing 8 - A new Version of the Member class

package commons.digester3.example;

public class Member {
    private String firstname;
    private Gender gender;
    private int age;
    
    public String getFirstname() {
        return firstname;
    }
    
    public void setFirstname(String firstname) {
        this.firstname = firstname;
    }
    
    public Gender getGender() {
        return this.gender;
    }
    
    public void setGender(char gender) {
        if (gender == 'F') {
            this.gender = Gender.F;
        } else if (gender == 'M') {
            this.gender = Gender.M;
        } else {
            throw new RuntimeException("Invalid gender code " + gender + ". It can only be 'F' or 'M'");
        }
    }
    
    public int getAge() {
        return age;
    }
    
    public void setAge(int age) {
        this.age = age;
    }
}

Listing 9 - The Gender enum

package commons.digester3.example;

public enum Gender {
    F, M
}
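The two points above can be sketched with plain reflection: a setter lookup keyed on the getter's return type (Gender) finds nothing, while a lookup by the declared parameter type (char) finds setGender(). This is an illustrative sketch, not CallMethodRule's implementation; CallMethodSketch and its methods are hypothetical, and its nested Member and Gender types are trimmed copies of Listings 8 and 9.

```java
import java.lang.reflect.Method;

// Hypothetical sketch of the CallMethodRule idea for the Listing 8 Member class.
final class CallMethodSketch {
    public enum Gender { F, M }

    // Trimmed copy of the Listing 8 Member class, nested to keep the sketch self-contained.
    public static final class Member {
        private Gender gender;

        public Gender getGender() { return gender; }

        public void setGender(char gender) {
            if (gender == 'F') {
                this.gender = Gender.F;
            } else if (gender == 'M') {
                this.gender = Gender.M;
            } else {
                throw new IllegalArgumentException("Invalid gender code " + gender);
            }
        }
    }

    // True if the class has a setter whose parameter type equals the getter's return type.
    static boolean hasSetterMatchingGetter(Class<?> type, String property) {
        String suffix = Character.toUpperCase(property.charAt(0)) + property.substring(1);
        try {
            Class<?> propertyType = type.getMethod("get" + suffix).getReturnType();
            type.getMethod("set" + suffix, propertyType);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Calls member.setGender(char) reflectively with the content of a <gender> element.
    static void callSetGender(Member member, String elementContent) {
        try {
            Method setter = Member.class.getMethod("setGender", char.class);
            setter.invoke(member, elementContent.trim().charAt(0));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hasSetterMatchingGetter(Member.class, "gender")); // prints false
        Member member = new Member();
        callSetGender(member, "F");
        System.out.println(member.getGender());                             // prints F
    }
}
```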

Sunday, September 11, 2011

Mixing Scala and Java in a Project, Developed in Eclipse and Built by Maven

(Last updated on February 26, 2012)

Overview

This tutorial shows how to set up a project (aka module) containing both Scala and Java source code and build it with Maven. It also shows how to set up the same project in the Eclipse IDE so that developers can write and compile both Scala and Java source code in one project.

If you only want to develop a mixed Java/Scala project in Eclipse, and you don't need to build the project outside Eclipse with Maven or to manage the Eclipse build path with Maven, you can do so with the Scala IDE for Eclipse out of the box.

Arguably, the largest advantage of the Java programming language over many other languages is the existence of countless Java libraries. Scala fully leverages this advantage. Scala is very attractive because, on one hand, it is a much more powerful programming language, and on the other hand, a Scala method can call almost any Java method as easily as a Java method can. A Java method can also easily call a Scala method, with certain limitations.

When developing a new project (in the sense the term is used in the Eclipse IDE or the Maven build tool), it may be very desirable to have both Java and Scala classes in it. Some developers on the project may only be able to write Java code. We may also have to write some classes in Java to fit into certain frameworks or containers. Under those circumstances, a pure Scala project is not an option. If we still want to take advantage of the power of Scala, we have to mix Scala and Java in the same project. Being able to mix them easily greatly lowers the barrier for organizations to adopt Scala.

This tutorial covers the tricks in a Maven POM file that enable us to:

Develop a project with both Scala and Java classes in Eclipse, where some Java methods call Scala methods, and some Scala methods call Java methods.
Use Maven to manage project build path in Eclipse
Build the project outside Eclipse using Maven

For a discussion on the advantages of using Maven to manage Eclipse project build path, see my other post.

Being able to build the project outside Eclipse using Maven makes it very easy to include that project into a larger project as a sub-project (called a module in Maven) later.

To follow this tutorial, you need:

JDK 1.6 (not JDK 1.7)
Eclipse Classic 3.6.x (not 3.7)
The Scala IDE for Eclipse 2.0, which is an Eclipse plugin. If you need help installing the plugin, see Appendix A at the end of this tutorial.
Maven 2.2.1 or later

In this tutorial, we develop a Hello World program consisting of:

A Java class, GreetingInJava, with a method greet() which simply prints “Hello World!” to the console. It is normally the only class in a Java Hello World program.
A Scala class, GreetingInScala, with a method greet(), which instantiates a GreetingInJava object and calls its greet() method. This shows calling a Java method from a Scala method.
A Java class, Bootstrap, with a main() method, which instantiates a GreetingInScala object and calls its greet() method. This shows calling a Scala method from a Java method.

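Lay Out the Project

We are going to create the project outside Eclipse and import it into Eclipse later. For more details on creating a project outside Eclipse and importing it into Eclipse, see my other post. Under C:\temp, create a project directory named hello that follows the standard Maven layout, with Java sources under src/main/java, Scala sources under src/main/scala, and the corresponding test directories src/test/java and src/test/scala.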

Create Maven pom.xml

Create a pom.xml file under C:\temp\hello with the following content:

<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                      http://maven.apache.org/maven-v4_0_0.xsd">
    
    <modelVersion>4.0.0</modelVersion>
    <groupId>ted-gao</groupId>
    <artifactId>scala-java-mix</artifactId>
    <version>1.0-SNAPSHOT</version>
    <name>Scala-Java mixture</name>
    <description>Showcase mixing Scala and Java</description>
    <packaging>jar</packaging>
    
    <build>
        <plugins>
            <!-- ensure that we use JDK 1.6 -->
            <plugin>
                <inherited>true</inherited>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.6</source>
                    <target>1.6</target>
                </configuration>
            </plugin>
            
            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
                <version>2.15.2</version>
                <executions>
                    <!-- Run scala compiler in the process-resources phase, so that dependencies on
                         scala classes can be resolved later in the (Java) compile phase -->
                    <execution>
                        <id>scala-compile-first</id>
                        <phase>process-resources</phase>
                        <goals>
                            <goal>compile</goal>
                        </goals>
                    </execution>

                    <!-- Run scala compiler in the process-test-resources phase, so that dependencies on
                         scala classes can be resolved later in the (Java) test-compile phase -->
                    <execution>
                        <id>scala-test-compile</id>
                        <phase>process-test-resources</phase>
                        <goals>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            
            <plugin>
                <groupId>org.codehaus.mojo</groupId>
                <artifactId>build-helper-maven-plugin</artifactId>
                <executions>
                    <!-- Add src/main/scala to source path of Eclipse -->
                    <execution>
                        <id>add-source</id>
                        <phase>generate-sources</phase>
                        <goals>
                            <goal>add-source</goal>
                        </goals>
                        <configuration>
                            <sources>
                                <source>src/main/scala</source>
                            </sources>
                        </configuration>
                    </execution>
                      
                    <!-- Add src/test/scala to test source path of Eclipse -->
                    <execution>
                        <id>add-test-source</id>
                        <phase>generate-test-sources</phase>
                        <goals>
                            <goal>add-test-source</goal>
                        </goals>
                        <configuration>
                            <sources>
                                <source>src/test/scala</source>
                            </sources>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            
            <!-- to generate Eclipse artifacts for projects mixing Scala and Java -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-eclipse-plugin</artifactId>
                <version>2.8</version>
                <configuration>
                    <downloadSources>true</downloadSources>
                    <downloadJavadocs>true</downloadJavadocs>
                    <projectnatures>
                        <projectnature>org.scala-ide.sdt.core.scalanature</projectnature>
                        <projectnature>org.eclipse.jdt.core.javanature</projectnature>
                    </projectnatures>
                    <buildcommands>
                        <buildcommand>org.scala-ide.sdt.core.scalabuilder</buildcommand>
                    </buildcommands>
                    <classpathContainers>
                        <classpathContainer>org.scala-ide.sdt.launching.SCALA_CONTAINER</classpathContainer>
                        <classpathContainer>org.eclipse.jdt.launching.JRE_CONTAINER</classpathContainer>
                    </classpathContainers>
                    <excludes>
                        <!-- in Eclipse, use scala-library, scala-compiler from the SCALA_CONTAINER rather than POM <dependency> -->
                        <exclude>org.scala-lang:scala-library</exclude>
                        <exclude>org.scala-lang:scala-compiler</exclude>
                    </excludes>
                    <sourceIncludes>
                        <sourceInclude>**/*.scala</sourceInclude>
                        <sourceInclude>**/*.java</sourceInclude>
                    </sourceIncludes>
                </configuration>
            </plugin>
            
            <!-- When running tests in the test phase, include .java and .scala source files -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>2.8.1</version>
                <configuration>
                    <includes>
                        <include>**/*.java</include>
                        <include>**/*.scala</include>
                    </includes>
                </configuration>
            </plugin>
        </plugins>
    </build>

    <dependencies>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>2.9.0</version>
        </dependency>
    </dependencies>
</project>

It is OK for Java classes and interfaces to depend on Scala classes and objects, because the Scala compiler runs first (in the process-resources phase), so the Scala classes are compiled before the Java ones. It is also OK for Scala classes, objects, and traits to depend on Java classes and interfaces, because the Scala compiler can compile against both Java source code and bytecode.

Create Eclipse Artifacts

Run the Maven Eclipse plugin (mvn eclipse:eclipse) to create the Eclipse project artifacts, and configure the Eclipse workspace with the classpath variable M2_REPO pointing to the local Maven repository. If you need help with those tasks, see my other post.

Create Java and Scala Classes

In Eclipse, create the following two Java classes and one Scala class.

GreetingInJava is a simple Java class.

package mix.java.scala;

public class GreetingInJava {
    public void greet() {
        System.out.println("Hello World!");
    }
}

GreetingInScala is a simple Scala class that calls Java.

package mix.java.scala

class GreetingInScala {
    def greet(): Unit = {
        val delegate = new GreetingInJava
        delegate.greet()
    }
}

Bootstrap is a simple Java class that calls Scala. (If you need help creating new Scala classes in Eclipse, see Appendix B at the end of this tutorial.)

package mix.java.scala;

public class Bootstrap {

    public static void main(String[] args) {
        GreetingInScala scala = new GreetingInScala();
        scala.greet();
    }
}

Run Bootstrap.java in Eclipse

Right click Bootstrap.java in the source editor in Eclipse. From the context menu, select Run As... -> Java Application.

You will see the string "Hello World!" printed to the console.

Build the Project Outside Eclipse, Using Maven

Open a command line window, cd to C:\temp\hello, and execute the following command:

mvn install

You will see Maven compile the three Java and Scala classes and package them into a jar file.

Run Bootstrap.java outside Eclipse

If you have not done so already, open a command line window and cd to C:\temp\hello. Then execute the following command:

mvn exec:java -Dexec.mainClass=mix.java.scala.Bootstrap

You will see "Hello World!" printed in the command line window.

Conclusion

With the pom.xml in this tutorial, we can create projects that mix Java and Scala source code. We can develop and run both the Java and the Scala classes in Eclipse, and we can also build the project and run its classes outside Eclipse using Maven.

Appendix A – Installing the Scala IDE for Eclipse

On the Eclipse workbench, select Help -> Install New Software ...

When the Install dialog box appears, click the Add button. When the Add Repository dialog box appears, fill in a name for the plugin and the URL of the update site for the Scala IDE for Eclipse, and click the OK button.

Back in the Install dialog box, select all available plugins.

If a warning dialog appears, ignore it and click the OK button.

When the installation is finished, you need to restart Eclipse.

Appendix B – Creating New Scala Class in the Scala IDE for Eclipse

On the Package Explorer view, right click the desired package. From the context menu, select New -> Other… When the New dialog box appears, select Scala Wizards -> Scala Class.

[If you are also interested in developing Scala programs using Eclipse and sbt (instead of Maven) together, please read Using Eclipse and sbt in Scala Programming.]
