java boilerplate

Saturday, December 03, 2005

Welcome to java boilerplate!

Each blog article is its own boilerplate problem and usually contains a solution.

If any solution has code attached to it, feel free to use it - it's placed in the public domain.

See this article from Graham Hamilton for some history on why this blog exists.

Thursday, December 01, 2005

'Function' interfaces.

When you need to pass a function as an argument, you have to go through a rather convoluted anonymous inner class definition, which tends to get overly wordy:
    Collections.sort(names, new Comparator<String>() {
        public int compare(String a, String b) {
            return 0; //Some sort of comparison algorithm.
        }
    });


Yet the compile-time checking enforced by, amongst other things, the structure of the Comparator interface and its generic type parameter is clearly something that should be kept if possible. We propose the notion of a 'function' interface. Our example code now changes to:
    Collections.sort(names, Comparator<String>(String a, String b) {
        return 0; //Some sort of comparison algorithm.
    });


The definition of Comparator must also change to reflect that it can be used as a function interface:

    public @Function interface Comparator<T> {
        int compare(T o1, T o2);
    }


The 'Function' annotation is a marker annotation (it has no elements) and is only valid on interfaces. The interface must declare exactly one abstract method. Redeclaring methods that Object already has, in order to document some particulars in the javadoc (as Comparator does with equals), is allowed.

BACKWARDS COMPATIBILITY: No problems here. No new keywords are necessary. All existing 'function interfaces', such as Comparator and Iterable, can be retrofitted with @Function markers.
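To make the rule above concrete, here is a minimal sketch that declares the marker annotation and checks, at runtime via reflection, what the compiler would enforce at compile time. The FunctionCheck class, its method names, and the Adder example interface are hypothetical; the check counts abstract methods while skipping redeclared Object methods such as Comparator.equals.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Marker annotation with no elements; TYPE is the closest @Target to "interfaces only".
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Function {}

// Hypothetical example of an interface that satisfies the rule.
@Function
interface Adder {
    int add(int a, int b);
}

class FunctionCheck {
    // True if 'type' is a (non-annotation) interface with exactly one abstract
    // method, not counting redeclared public Object methods such as equals(Object).
    static boolean isFunctionInterface(Class<?> type) {
        if (!type.isInterface() || type.isAnnotation()) {
            return false;
        }
        int abstractMethods = 0;
        for (Method m : type.getMethods()) { // public methods, including inherited ones
            if (Modifier.isAbstract(m.getModifiers()) && !isObjectMethod(m)) {
                abstractMethods++;
            }
        }
        return abstractMethods == 1;
    }

    // True if java.lang.Object declares a public method with the same signature.
    private static boolean isObjectMethod(Method m) {
        try {
            Object.class.getMethod(m.getName(), m.getParameterTypes());
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }
}
```

Comparator and Runnable pass this check; an interface such as java.util.List, or any class, does not.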

annotations for Getters and Setters

Writing getters and setters that do nothing beyond reading or writing a private field is such a common job that many Java IDEs automate it for you. However, those getters and setters still clutter up the code. Writing them is nevertheless required, so that extra logic can later be added to the getter or setter without breaking the API.

We propose simplifying the process by using annotations.

old code:
    private String field;

    public String getField() {
        return field;
    }

    protected void setField(String field) {
        this.field = field;
    }


new code:
     private @Get @Set(Access.PROTECTED) String field;


Access is an enum with PUBLIC, PROTECTED, PACKAGE, and PRIVATE as valid options. The @Get and @Set annotations each have a single element, value, of type Access, defaulting to Access.PUBLIC.
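The annotation types described above could be declared roughly as follows. This is a sketch under assumptions: RUNTIME retention is used only so that the hypothetical AccessorReport helper can read the annotations and list the accessors a compiler would generate. A compiler-level implementation would not need the annotations at runtime at all.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

enum Access { PUBLIC, PROTECTED, PACKAGE, PRIVATE }

// RUNTIME retention only so the demo below can read the annotations reflectively.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Get {
    Access value() default Access.PUBLIC;
}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Set {
    Access value() default Access.PUBLIC;
}

// The 'new code' example from the text.
class Example {
    private @Get @Set(Access.PROTECTED) String field;
}

class AccessorReport {
    // Lists the accessor signatures the proposal would generate for 'type'.
    static String describe(Class<?> type) {
        StringBuilder out = new StringBuilder();
        for (Field f : type.getDeclaredFields()) {
            String name = f.getName();
            String property = Character.toUpperCase(name.charAt(0)) + name.substring(1);
            String typeName = f.getType().getSimpleName();
            Get get = f.getAnnotation(Get.class);
            if (get != null) {
                out.append(modifier(get.value())).append(typeName)
                   .append(" get").append(property).append("()\n");
            }
            Set set = f.getAnnotation(Set.class);
            if (set != null) {
                out.append(modifier(set.value())).append("void set").append(property)
                   .append('(').append(typeName).append(' ').append(name).append(")\n");
            }
        }
        return out.toString();
    }

    private static String modifier(Access access) {
        switch (access) {
            case PUBLIC:    return "public ";
            case PROTECTED: return "protected ";
            case PRIVATE:   return "private ";
            default:        return ""; // PACKAGE: no modifier keyword
        }
    }
}
```

For the Example class, describe lists exactly the two accessors from the 'old code' listing: a public getField() and a protected setField(String field).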

A java.util.Iterators utility class

Analogous to java.util.Arrays and java.util.Collections, introduce a java.util.Iterators class containing useful utility methods for iterators. Some candidates:

  • A method that turns an iterator into an iterable

  • A 'filter' (adapter) iterator that takes as arguments another iterator and a mapping function, and returns an iterator over the mapped elements. For example:
        Iterator<String> i = //Something
        Iterator<Integer> i2 = Iterators.unmodifiableIteratorAdapter(i,
            Adapter<String, Integer>(String e) {return Integer.valueOf(e);});


    The Adapter interface is new and looks like this:
        public @Function interface Adapter<I, O> {
            O filter(I input);
        }

  • A method that turns an array into an iterator

  • A method that returns a properly generics-typed empty iterator


The code examples here use the @Function interface idea described elsewhere in this blog.

We have a fully documented example Iterators implementation. It does not use the @Function suggestion, so it can be used with Java 1.5:

Iterators.java.
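Iterators.java itself is not reproduced here. As a rough idea of what such a class could contain, the following sketch implements the four methods from the list in plain Java without the @Function syntax, using the Adapter interface from the text. Only unmodifiableIteratorAdapter and Adapter come from the text; the other method names are hypothetical.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.NoSuchElementException;

// The Adapter interface from the text, without the @Function marker.
interface Adapter<I, O> {
    O filter(I input);
}

final class Iterators {
    private Iterators() {} // static utility class, like Arrays and Collections

    // Wraps an iterator so it can be used in a for-each loop. Single use only:
    // every call to iterator() returns the same underlying iterator.
    static <T> Iterable<T> iterable(final Iterator<T> source) {
        return new Iterable<T>() {
            public Iterator<T> iterator() {
                return source;
            }
        };
    }

    // Lazily applies 'adapter' to each element of 'source'; remove() is unsupported.
    static <I, O> Iterator<O> unmodifiableIteratorAdapter(final Iterator<I> source,
            final Adapter<? super I, ? extends O> adapter) {
        return new Iterator<O>() {
            public boolean hasNext() {
                return source.hasNext();
            }
            public O next() {
                return adapter.filter(source.next());
            }
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }

    // Iterates over an array without copying it.
    static <T> Iterator<T> fromArray(final T[] array) {
        return new Iterator<T>() {
            private int index = 0;
            public boolean hasNext() {
                return index < array.length;
            }
            public T next() {
                if (index >= array.length) {
                    throw new NoSuchElementException();
                }
                return array[index++];
            }
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }

    // A properly typed empty iterator.
    static <T> Iterator<T> empty() {
        return Collections.<T>emptyList().iterator();
    }
}
```

Without @Function, the caller still has to pass an anonymous Adapter class to unmodifiableIteratorAdapter; the proposal would reduce that to the one-line form from the list.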

String Literals

Entering HTML, XML, or especially regular expressions as Java string literals tends to be difficult because of the many escapes required.

We propose a string literal operator: a string enclosed in a matching pair of triple quotes (""") is read literally, so quote characters, backslashes, and newlines inside it have no special meaning. Only the \u escape is processed, because unicode escapes are translated at a different level, before the source is tokenized.

old code:
     return "Usage:   \nmyapp \"filename\" -options\n";


new code:
     return """Usage:
myapp "filename" -options
""";

compile regular expressions at compile-time.

We propose a new string-like literal that produces java.util.regex.Pattern objects. The compiler will actually compile these static regular expressions itself, so a malformed pattern becomes a compile-time error. A few APIs should also be extended to accept Patterns as arguments, such as String.replaceAll and String.matches.

old code:
    //This throws a PatternSyntaxException at runtime.
    Pattern p = Pattern.compile("*");


new code:
    //This will produce a compile-time error.
    Pattern p = R"*";

    //This is fine
    Pattern k = R"(foo|bar)?\"(.*)\"\\s+";

    //This uses the proposed string literal notation.
    Pattern r = R"""(foo|bar)?"(.*)"\s+""";
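Until something like R"..." exists, the closest one can get with the standard library is sketched below. Pattern.compile("*") fails only at runtime, with a PatternSyntaxException; keeping a pattern in a static final field at least surfaces a typo when the class is initialized; and Matcher.replaceAll stands in for the proposed String.replaceAll(Pattern, String). The RegexToday class and its method names are hypothetical.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexToday {
    // Compiled once, when the class is initialized; a typo here surfaces as an
    // ExceptionInInitializerError on first use, not when the source is compiled.
    static final Pattern QUOTED = Pattern.compile("(foo|bar)?\"(.*)\"\\s+");

    // True if 'regex' compiles; Pattern.compile("*") fails here at runtime.
    static boolean isValid(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // What String.replaceAll(Pattern, String) would do under the proposal.
    static String replaceAll(String input, Pattern pattern, String replacement) {
        return pattern.matcher(input).replaceAll(replacement);
    }
}
```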

null-checking reference operator.

An oft-recurring Java situation: you intend to return null if something is not found, and some sort of result otherwise. Unfortunately, to get to the result, you need to dereference (the dot ('.') operator) one or more times, and at any step the value could be null. Writing this hypothetical method as a one-liner does not work, as it could throw a NullPointerException instead of returning null.

We propose a new form of the reference operator which returns null if the left side is null, and returns the result of the operation (on the right side) if not.

old code:
    if ( map == null ) {
        return null;
    } else {
        return map.get(param);
    }


new code:
     return map.?get(param);


The two code examples above are equivalent; in fact, the compiled class files will probably be identical. It's a compiler-only effect.
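Since the operator is a compiler-only effect, a sketch of its translation into ordinary Java may help. The method names are hypothetical, and the chained form assumes that each .? evaluates its left-hand side exactly once, keeps it in a temporary, and short-circuits to null; the text above does not spell out chaining, so treat that part as an assumption.

```java
import java.util.Map;

class NullSafeDesugar {
    // Translation of:  return map.?get(param);
    static String lookup(Map<String, String> map, String param) {
        return map == null ? null : map.get(param);
    }

    // Translation of:  return map.?get(param).?trim();
    // Each left-hand side is evaluated once and kept in a temporary,
    // so get() is not called twice.
    static String lookupTrimmed(Map<String, String> map, String param) {
        if (map == null) {
            return null;
        }
        String value = map.get(param);
        return value == null ? null : value.trim();
    }
}
```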

Unnecessary UnsupportedEncodingException

A number of charsets, such as UTF-8 and ISO-8859-1, are guaranteed to be supported by every JVM implementation. However, when explicitly specifying a character encoding, even one of these guaranteed ones, you still have to deal with an UnsupportedEncodingException that can't actually happen. Examples include the String(byte[] data, String encoding) constructor and the InputStreamReader(InputStream stream, String encoding) constructor.

We propose adding an overload of every constructor and method that has a String encoding parameter, with that parameter replaced by a BasicCharset enum containing one constant for each guaranteed encoding.

old code:
    String string;
    try {
        //data is a byte array.
        string = new String(data, "UTF-8");
    } catch ( UnsupportedEncodingException e ) {
        throw new InternalError("Can't happen.");
    }


new code:
     String string = new String(data, BasicCharset.UTF8);


The new enum based methods and constructors will of course no longer throw UnsupportedEncodingException.

BACKWARDS COMPATIBILITY: No problems at all. The only change is to the API, and it consists solely of new classes and methods. Old code will continue to work.
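A minimal sketch of the proposed enum, assuming the constant names shown (only UTF8 comes from the text). It lists the six charsets that the java.nio.charset.Charset documentation requires every Java platform implementation to support, and relies on the Charset-based String constructor and String.getBytes overload (available since Java 6), neither of which throws UnsupportedEncodingException. Because a library cannot add constructors to String, the hypothetical decode method stands in for the proposed new String(data, BasicCharset.UTF8).

```java
import java.nio.charset.Charset;

// The six charsets every Java platform implementation must support.
// Constant names other than UTF8 are hypothetical.
enum BasicCharset {
    US_ASCII("US-ASCII"),
    ISO_8859_1("ISO-8859-1"),
    UTF8("UTF-8"),
    UTF16BE("UTF-16BE"),
    UTF16LE("UTF-16LE"),
    UTF16("UTF-16");

    private final Charset charset;

    BasicCharset(String name) {
        // Support for these names is mandatory, so this lookup cannot fail on a conforming JVM.
        this.charset = Charset.forName(name);
    }

    Charset charset() {
        return charset;
    }

    // Stand-in for the proposed new String(byte[] data, BasicCharset charset).
    String decode(byte[] data) {
        return new String(data, charset);
    }

    // Stand-in for a proposed String.getBytes(BasicCharset) overload.
    byte[] encode(String text) {
        return text.getBytes(charset);
    }
}
```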

Ignoring exceptions.

The compiler error 'exception must be caught or declared to be thrown' is just a safety measure; a hacked version of javac that silently ignores these problems will produce perfectly valid class files. It is in fact already possible to throw any exception 'on the sly' (see the book Java Puzzlers; the first comment on this entry also explains some ways in which this can be accomplished).

Sometimes one is forced to catch an exception that doesn't really need catching. In some cases, the exception in question can never actually happen (such as UnsupportedEncodingException when passing a literal "UTF-8", or IllegalAccessException when reflectively accessing fields or methods on which you just explicitly enabled access with setAccessible). At other times the exception is extremely unlikely, and the rather catastrophic failure of silently passing it up the stack is in fact acceptable. At still other times the code is only used by programmers themselves, as with utility software or test code. In many such circumstances, being forced to set up try/catch blocks (e.g. because implementing an interface or overriding a method restricts which exceptions may be thrown) is annoying, hinders readability, and, according to various books, including 'java puzzlers', introduces significant performance penalties.

We suggest a way to manually override the safety check that ensures all checked exceptions (those not based on RuntimeException or Error) are handled one way or another.

We offer two solutions for this: an annotation-based solution and a keyword-based solution.

old code:
    public Reader foobar() {
        try {
            return new InputStreamReader(System.in, "UTF-8");
        } catch ( UnsupportedEncodingException e ) {
            throw new Error();
        }
    }


new code - annotation based method:

    @IgnoreExceptions(UnsupportedEncodingException.class)
    public Reader foobar() {
        return new InputStreamReader(System.in, "UTF-8");
    }


where the IgnoreExceptions annotation looks like this:

    @Retention(RetentionPolicy.CLASS) @Target(ElementType.METHOD)
    public @interface IgnoreExceptions {
        Class<? extends Throwable>[] value() default {Throwable.class};
    }


the other keyword based method:
(This approach requires a new keyword, 'ignores', which could pose backwards compatibility problems.)
    public Reader foobar() ignores UnsupportedEncodingException {
        return new InputStreamReader(System.in, "UTF-8");
    }


A method may carry both ignores and throws clauses, in either order, much like a class declaration may carry both extends and implements clauses (although those must appear in a fixed order).
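The 'on the sly' technique mentioned earlier can approximate @IgnoreExceptions today. The sketch below uses one well-known variant, which relies on the erasure of generic type parameters: the unchecked cast to T disappears at runtime, so the checked exception passes through while the compiler believes only a RuntimeException can be thrown. The Sneaky class and its method names are hypothetical.

```java
import java.io.IOException;

class Sneaky {
    // Throws 't' without declaring it. The RuntimeException return type lets
    // callers write 'throw sneakyThrow(e);' so flow analysis sees the method end.
    static RuntimeException sneakyThrow(Throwable t) {
        throw Sneaky.<RuntimeException>sneakyThrow0(t);
    }

    // T is explicitly RuntimeException above, so no checked exception is declared;
    // the cast to T is erased and performs no check at runtime.
    @SuppressWarnings("unchecked")
    private static <T extends Throwable> T sneakyThrow0(Throwable t) throws T {
        throw (T) t;
    }

    // Compiles without 'throws IOException', yet an IOException escapes.
    static void readConfig() {
        throw sneakyThrow(new IOException("config missing"));
    }
}
```

One difference from the proposal: because the compiler does not know that readConfig can throw IOException, callers cannot write catch (IOException e) around it (javac rejects catching a checked exception that the try block supposedly never throws); they have to catch Exception and inspect it instead.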

Quicker way of handling empty for loops.

Oftentimes you need to do something special if a for loop never runs, that is, if the loop condition is false before the first iteration. We propose a new keyword, 'empty', which can be appended to a for loop. The empty block runs if the for loop's condition is false right away.

old code:
    int length = someObject.size();
    if ( length == 0 ) {
        throw new EmptyCollectionException();
    }
    for ( int i = 0 ; i < length ; i++ ) {
        // ... whatever
    }

new code:
    for ( int i = 0 ; i < someObject.size() ; i++ ) {
        // ... whatever
    } empty {
        throw new EmptyCollectionException();
    }


BONUS: Add a pythonesque 'else' block as well, which runs when the for loop exits normally, but not when it ends abruptly through a return or throw statement inside the loop, a break statement, or a continue jumping to an outer label. I don't think 'else' is the proper term for it, though.

BACKWARDS COMPATIBILITY PROBLEMS: 'empty' is not currently a reserved keyword. An existing reserved keyword such as 'else' could be used instead, but 'else' is a really bad idea: Python supports 'else' on a for loop, with a completely different meaning.
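One way a compiler could translate the empty block is with a flag that records whether the loop body ran at least once. The sketch below hand-writes that translation for a simple summing loop; EmptyCollectionException is the hypothetical exception from the example, declared here so the code compiles. Because the flag leaves the loop itself untouched, break and continue keep their usual meaning.

```java
import java.util.List;

class EmptyLoopDesugar {
    // Hypothetical exception from the example above, declared so this compiles.
    static class EmptyCollectionException extends RuntimeException {
        private static final long serialVersionUID = 1L;
    }

    // Hand-written translation of:
    //   for ( int i = 0 ; i < list.size() ; i++ ) { total += list.get(i); }
    //   empty { throw new EmptyCollectionException(); }
    static int sum(List<Integer> list) {
        int total = 0;
        boolean entered = false; // set on the first pass through the body
        for (int i = 0; i < list.size(); i++) {
            entered = true;
            total += list.get(i);
        }
        if (!entered) {
            throw new EmptyCollectionException();
        }
        return total;
    }
}
```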

Xml Parsing

DOM in particular seems much better suited to writing generic XML tools than to the far more common day-to-day business of parsing specific XML, that is, XML whose structure you already know. Even SAX is generally considered quite unwieldy.

We suggest a new parser, based on direct-to-memory principles, which works by 'mapping' the structure of the XML onto classes directly. The java code of these POJO (Plain Old Java Object) classes can be explicitly written, or it can be generated from an XML Schema.

This XML system should be simple, self-contained, and small, so that it can be included in any project, or better yet (as we're proposing java boilerplate), simply ship with Mustang out of the box.

Annotations are used to identify the mappings. Example:

    @Tag(namespace = "http://foo/bar")
    public class RootElement {
        @Attribute String id;
        @Map String author;
        @Map(tagName="published", dateFormat="yyyy-MM-dd HH:mm")
        long publishedInMillis;
        @Map(mapType=MapType.MULTIPLE_TAGS,
             targetClass = IconRef.class, tagName="icon")
        List<IconRef> icons;
    }

    @Tag
    class IconRef {
        @Attribute String id;
        @SimpleContent String url;
    }


The above class structure is essentially all that's required: register the classes, and you can then read, for example, the following XML, ending up with an instance of RootElement:

<?xml version="1.0" encoding="UTF-8"?>
<rootelement id="something">
    <author>Reinier Zwitserloot</author>
    <published>2005-11-30 08:01</published>
    <icon id="icon1">http://example/icon1.jpg</icon>
    <icon id="icon2">http://example/icon2.jpg</icon>
</rootelement>


The same system can be used to write XML.

Other features:
  • @AttributeMap - can be applied to a java.util.Map. Stores all attributes as key/value pairs in this map.

  • getters and setters - the getters and setters are called if they exist, instead of directly writing/reading the field

  • Aside from mapType=MapType.SINGLE_TAG (the default) and MapType.MULTIPLE_TAGS (shown in the example), there's also MapType.COLLECTION_TAG, which assumes the tagName refers to a tag which has no (important) attributes. The contents of that tag are loaded into the list.

  • The entire system uses sane defaults. If you don't specify an explicit tagName (on either a @Map or a @Tag annotation), the name of the class (or field) is matched case-insensitively against tag names when reading, and converted to lowercase when writing. COLLECTION_TAG and MULTIPLE_TAGS work on any standard Java collection (Lists, Sets, even concurrent collections and such), and also on arrays.

  • Whitespace is parsed and normalized automatically, unless this feature is explicitly turned off (by setting whitespaceNormalization = false on one of the annotations) or the data is contained inside a CDATA section. CDATA sections and regular text are merged into a single string automatically.

  • The system consists of a reader, a writer, and an XML Schema to class generator.

  • Attributes can have defaults: @Attribute(defaultValue = "something") int something; which will be used if the attribute does not exist in the XML data. When writing, the attribute is omitted if the data in the object structure matches the default.



I call it MOX (Mappable Object Xml), and it is intended for everyday, basic XML parsing. Extremely complicated XML should probably be tackled with DOM or SAX, not this system. As such, it's entirely reasonable that various 'advanced' XML things can't be done using MOX; keeping MOX simple to write, use, and understand is more important than making sure it can handle every possible type of XML. As a simple example, DTDs and processing instructions are completely ignored: the parser just skips right over them.

Here's a sample beta implementation. Most importantly, it only reads and writes fields directly; setters and getters are ignored. Examples are included: MOX download page.

NB: This example implementation is public domain code. Feel free to do whatever you like with it.
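For a feel of the mapping idea without downloading anything, here is a much smaller sketch (not the MOX implementation linked above). It uses the JDK's DOM parser and reflection, and handles only @Attribute and single-tag @Map on String and int fields, with case-insensitive tag matching and trimmed text instead of full whitespace normalization. The annotation declarations, MiniMox, and the Book example class are hypothetical.

```java
import java.io.StringReader;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
@interface Attribute {}

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
@interface Map {
    String tagName() default ""; // empty: use the field name
}

// Hypothetical example target class.
class Book {
    @Attribute String id;
    @Attribute int version;
    @Map String author;
    @Map(tagName = "published") String publishedAt;
}

class MiniMox {
    // Parses 'xml' and copies root attributes and child-tag text into a new 'type'.
    static <T> T read(String xml, Class<T> type) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            T result = type.getDeclaredConstructor().newInstance();
            for (Field field : type.getDeclaredFields()) {
                String text = null;
                if (field.isAnnotationPresent(Attribute.class)) {
                    if (root.hasAttribute(field.getName())) {
                        text = root.getAttribute(field.getName());
                    }
                } else if (field.isAnnotationPresent(Map.class)) {
                    String tag = field.getAnnotation(Map.class).tagName();
                    Element child = firstChild(root, tag.isEmpty() ? field.getName() : tag);
                    if (child != null) {
                        text = child.getTextContent().trim();
                    }
                }
                if (text == null) {
                    continue; // not mapped or missing in the XML: keep the Java default
                }
                field.setAccessible(true);
                if (field.getType() == int.class) {
                    field.setInt(result, Integer.parseInt(text));
                } else {
                    field.set(result, text);
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot map XML onto " + type.getName(), e);
        }
    }

    // First direct child element whose name matches 'name', ignoring case.
    private static Element firstChild(Element parent, String name) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() == Node.ELEMENT_NODE
                    && node.getNodeName().equalsIgnoreCase(name)) {
                return (Element) node;
            }
        }
        return null;
    }
}
```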

catch-all

Tack a comment onto this post with your own favourite boilerplate (with solution!) and I'll look into giving it its own article.