Automatic generation of code
Introduction – who is it for?
There are two fundamental cases in which automatic generation of code can be used: when a developer wants to quickly write a large portion of repetitive code, or when several people are implementing a big system that is intended to be developed and maintained for many years to come.
Development without automatic generation of code – an example
In order to inspire the reader's imagination a bit, I am going to present a scenario that we might encounter when we are not using automatic code generation in the area where it is most commonly applied, database access ( typically called O/R mapping ). Let us assume that we are doing it using plain JDBC. We have a fragment of an SQL query that extracts data from a table called "client", including a field called "address":
q = "SELECT ..., address, ... FROM client WHERE id = 1";
The results of the query are then saved in a local variable:
String address = result.getString( "address" );
Furthermore, the address field is also used in several other tables, e.g. ORDER, SUPPLIER, COLLECTION_POINT etc.
Let us assume now that after our application has been used for a year we have to roll it out in another country, where an address is customarily stored in two lines instead of one. The person responsible for database maintenance changes all occurrences of the field "address" to "address_line_1" and "address_line_2" in the database model, prepares the appropriate migration scripts and then lets a programmer take over from that point. In a large team this is most likely going to be a different person from the one who wrote the original code, so the best he or she can do is to methodically search the code base for every occurrence of the string "address" and patiently adjust each one so that it can deal with the two new fields.
Unfortunately this approach carries two significant dangers with it:
1) incompleteness: a developer, being only human and having better and worse days, may not find all occurrences of the string in question, especially if his or her predecessor left something like the following behind:
q = "SELECT …, addr" + "ess, … FROM client WHERE id = 1";
2) unnecessary changes: a developer overdoes it and alters code in places that are not affected by the changes in the database model.
We are not going to find out about a potential mistake when the code is compiled. We will usually only discover it later, either in the functional tests phase or ( quite often ) after our code has gone into production, courtesy of an angry client.
Development with automatic generation of code – an example
Automatic code generation lets us exploit an advantage coming from being able to compile code before it is run. It allows us to spot mistakes early and lets us avoid embarrassment in front of a client.
A tool for automatic generation of database access code should create objects representing database records as well as classes that allow for easy access to them. A code fragment analogous to the one presented above ( before the change in the model ) will look like this:
Client c = clientDAO.getByPK(1);
String address = c.getAddress();
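To give an idea of what stands behind these two statements, here is a minimal sketch of the kind of classes such a tool might generate for the "client" table. It is an illustration under assumptions, not the output of any particular library: the class shapes are hypothetical, and an in-memory map stands in for the JDBC code that a real generator would emit.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical generator output for the "client" table:
// one field and one getter per column.
class Client {
    private final int id;
    private final String address;

    Client(int id, String address) {
        this.id = id;
        this.address = address;
    }

    int getId() { return id; }

    String getAddress() { return address; }
}

// Hypothetical generated DAO. A real generator would emit JDBC code here;
// this sketch keeps the rows in a map so that it runs on its own.
class ClientDAO {
    private final Map<Integer, Client> rows = new HashMap<>();

    void insert(Client c) { rows.put(c.getId(), c); }

    // Returns null when no row with the given primary key exists.
    Client getByPK(int id) { return rows.get(id); }
}

class GeneratedAccessDemo {
    public static void main(String[] args) {
        ClientDAO clientDAO = new ClientDAO();
        clientDAO.insert(new Client(1, "Marszalkowska 1"));

        Client c = clientDAO.getByPK(1);
        String address = c.getAddress();
        System.out.println(address);
    }
}
```

The important property is that the getter names are derived from the column names, so the application code is tied to the database model through identifiers the compiler can check rather than through strings.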
After changing the field "address" to "address_line_1", adding a new field "address_line_2" and regenerating the database access code, every place that references the field "address" ( i.e. the second statement of the code sample above, the call to c.getAddress() ) will no longer compile. Even a not particularly experienced developer who has just stayed up all night will cope, because everything is served up for him or her, especially as contemporary development environments ( e.g. Eclipse ) mark such an error as clearly as if we wrote "thouhg" instead of "though" in a text editor.
Work ergonomics
Writing about development environments, one should mention another convenience that significantly increases a programmer's output: automatic code completion, available for example in Eclipse by pressing Ctrl and Space. In the last example it is sufficient for the programmer to type "c.getA" and activate code completion, and the editor adds the missing fragment "ddress()" on its own. In this case the immediate gain is not remarkable, but if we have a database column called "care_of_edit_for_one_time_addr" ( this is not a joke ), retyping its name correctly from a printed database schema at the first attempt is almost impossible.
Implementation
The easiest way to start generating code is to find an existing library that is going to do it for us. When looking for an appropriate tool, one should take the following criteria into consideration:
- is it popular, does it have a broad user base and optional support: forum, discussion groups?
- is it compatible with the build system that we use – for example ant or maven?
- is it constantly improved and developed?
When choosing an existing library, one has to realize that it is going to be a lengthy acquaintance, almost like a marriage... In areas as critical as O/R mapping it will be necessary to get to know and understand it well, because a potential divorce and swapping it for another ( younger ) model is going to be very difficult if not impossible.
When introducing a library for automatic generation of code, one has to follow these basic rules:
- generated code must not be added to a repository ( for example CVS, SVN ) but should be recreated from the sources ( database model, configuration files ) with every build – otherwise we would be dealing with "two sources of truth" that would sooner or later start contradicting each other,
- generated code must not be edited manually – this should be obvious, as our changes would be overwritten with every new generation of code.
Areas of usage
Code generation should be used wherever it is possible and appropriate tools are available. A good indication that code generation is worthwhile is the presence in the application of a contract written down in the form of a file. Such a contract could be:
- a database model ( a contract with the architect of the system ),
- a data exchange model via WebService written as WSDL ( a contract with an outside system ),
- configuration files ( a contract with the application's designer ),
- localization bundles ( contracts with translators ).
Database ( O/R mapping )
This particular application of code generation has been partially described in the introductory example. It is the area where it is used most often because:
- most of the applications written in Java use databases,
- a database has lots of tables and columns, so manual creation of access code ( for example using JDBC ) is very time consuming,
- a database of an application under development is frequently altered,
- in bigger projects usually one person is responsible for changes in the database model ( an architect ) and another for changes in the application sources ( a programmer ) which can cause misunderstandings and errors when changes are introduced,
- a database is critical for the functioning of an application, so even a tiny misspelling in its code causes an exception to occur ( a well known SQLException ) and as a consequence a business error of the application.
Many libraries have been created to take care of communication with a database by mapping database records to objects ( O/R mapping ), but it has to be observed that few of them actually generate Java code. Moreover, if we are considering such a library we have to carefully check the list of supported databases, especially if we already use or want to use a less popular product.
Hibernate
In order to demonstrate that not every O/R mapping library fulfills all the criteria set for a tool for automatic code generation I will start by describing the possibilities of Hibernate, possibly the most widely used O/R mapping library in Java. Hibernate is no doubt a very refined product that is very popular, intensely developed and offers almost perfect customer support. Unfortunately, there is one problem: the central idea while creating Hibernate was to offer mapping from already written classes in Java to database records, not their automatic generation. It is true that generating POJOs ( Plain Old Java Objects – simple objects corresponding to the structure of database records ) is possible using an additional library from the Hibernate "family" called "hibernate-tools", but it is hard not to get an impression that this subject has a low priority. Development of this library is not keeping up with the main library hibernate-core ( which was clearly visible when Java 1.5 was introduced ). Furthermore, Hibernate does not allow its user to create queries based on generated constants corresponding to table and column names.
Below is an example of a query retrieving a record by its primary key, followed by a query for records fulfilling two criteria ( name of the street and active status ):
Session session = HibernateUtil.getSessionFactory().getCurrentSession();
Client c = ( Client )session.get( Client.class, 1 );
String address = c.getAddress();

String q = " FROM " + Client.class.getName() + " AS client "
    + " WHERE client.address = :address AND client.active = :active ";
List<Client> result = ( List<Client> ) session.createQuery(q)
    .setString("address","Marszalkowska")
    .setBoolean("active", true).list();
As you can see, the problem caused by changing the field "address" to "address_line_1" and "address_line_2" would be only partially recognized by the compiler – i.e. line 3 ( containing the expression c.getAddress() ) would stop compiling ( because such a method would not exist anymore ), but the query stored in the variable q would still compile – it would be up to the programmer to find and manually correct all such occurrences.
Torque
Torque is a product that comes closer to fulfilling our expectations: the main goal of its creators was to enable generating code of POJO and DAO ( Data Access Object ) classes – they have a not especially well chosen suffix "Peer".
Torque consists of two parts: the first one is used to generate objects, the second enables using them in the application. This library allows its users to create queries based on generated constants corresponding to the names of tables and columns which in turn guarantees that changes done to them will cause compilation errors in code, making adjusting to changes easier.
The disadvantage of Torque is the fact that it is a very old library, created originally as part of a bigger framework called Turbine. The quality of its code is not particularly high and some of the patterns it uses are not very comfortable ( for example, static methods form the backbone of the generated Peer classes, which makes mocking and unit testing practically impossible ).
An example of a query retrieving a record using a primary key and records fulfilling two criteria ( name of the street and active status ):
Client c = ClientPeer.retrieveByPK(1);
String address = c.getAddress();

Criteria crit = new Criteria();
crit.add( ClientPeer.ADDRESS, "Marszalkowska").and( ClientPeer.ACTIVE, true );
List<Client> result = ClientPeer.doSelect( crit );
As you can see in this example, changes done to the address field would be found by the compiler in all places, i.e. we wouldn't be able to compile line 2 ( c.getAddress() – because there is no such method anymore ) and line 5 ( ClientPeer.ADDRESS – because the constant does not exist any more ). If a programmer takes care of the compilation errors, he or she can be sure that the new SQL queries will be correct.
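The underlying idea of generated column constants can be sketched independently of any library. The code below is not Torque's API: ClientColumns stands for a hypothetical generated class ( shown here after the model change ) and SelectBuilder is a tiny hand-written helper invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical generated constants; a generator would recreate this class
// from the database model on every build.
final class ClientColumns {
    static final String TABLE = "client";
    static final String ADDRESS_LINE_1 = "address_line_1";
    static final String ADDRESS_LINE_2 = "address_line_2";
    static final String ACTIVE = "active";

    private ClientColumns() {}
}

// Minimal query builder for this sketch. Column names are expected to come
// from generated constants, values are passed as bind parameters.
final class SelectBuilder {
    private final String table;
    private final List<String> conditions = new ArrayList<>();
    private final List<Object> parameters = new ArrayList<>();

    SelectBuilder(String table) {
        this.table = table;
    }

    SelectBuilder whereEquals(String column, Object value) {
        conditions.add(column + " = ?");
        parameters.add(value);
        return this;
    }

    String toSql() {
        StringBuilder sql = new StringBuilder("SELECT * FROM ").append(table);
        if (!conditions.isEmpty()) {
            sql.append(" WHERE ").append(String.join(" AND ", conditions));
        }
        return sql.toString();
    }

    List<Object> parameters() {
        return parameters;
    }
}

class ColumnConstantsDemo {
    public static void main(String[] args) {
        // A reference to ClientColumns.ADDRESS would not compile here,
        // because the regenerated class no longer declares that constant.
        SelectBuilder query = new SelectBuilder(ClientColumns.TABLE)
            .whereEquals(ClientColumns.ADDRESS_LINE_1, "Marszalkowska")
            .whereEquals(ClientColumns.ACTIVE, true);
        System.out.println(query.toSql());
    }
}
```

Because the query text is assembled only from constants, a renamed or removed column surfaces as a missing symbol at compile time instead of an SQL error at run time.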
XML
We have a similar situation in the case of XML. Let us assume that we are integrating with some external system and we agree that data is going to be exchanged as XML files. There are many ways in which an XML file can be generated, from using System.out.println() to creating a representation of DOM ( Document Object Model ) objects in memory. Likewise, reading an XML file can be done in several ways: again, we have the DOM representation at our disposal, or the SAX ( Simple API for XML ) interface, which is especially useful for large files that may not fit into available memory. Unfortunately, once the contract ( written in a DTD document or a schema ) changes, our application will still compile without a problem and we will not be able to catch a potential error until it actually runs.
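A minimal sketch of the hand-written DOM approach shows where the weakness lies. The class and method names are hypothetical; the DOM calls themselves come from the standard JDK.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class HandWrittenXmlDemo {

    // Creates an empty DOM document using the parser shipped with the JDK.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No XML parser available", e);
        }
    }

    static Element buildClient(Document doc, String id, String address) {
        Element client = doc.createElement("client");
        client.setAttribute("id", id);
        // The attribute name is a plain string. If the contract renames it,
        // this line still compiles and silently produces invalid XML.
        client.setAttribute("address", address);
        return client;
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element clients = doc.createElement("clients");
        doc.appendChild(clients);
        clients.appendChild(buildClient(doc, "1", "Marszalkowska"));
        System.out.println(clients.getChildNodes().getLength() + " client element(s) created");
    }
}
```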
In order to protect ourselves from such mistakes we should use a library generating Java code based on the contract. An example of such a library is JAXB ( Java Architecture for XML Binding ). Based on the contract written in a schema file it generates Java files that are later filled with data and transformed into an XML stream.
Part of a definition in the schema file could look like this:
<xsd:element name="clients" type="t:clients_type" />

<xsd:complexType name="clients_type">
  <xsd:sequence>
    <xsd:element name="client" type="t:client_type" minOccurs="0" maxOccurs="unbounded" />
  </xsd:sequence>
</xsd:complexType>

<xsd:complexType name="client_type">
  <xsd:attribute name="id" type="t:positive_integer" use="required" />
  <xsd:attribute name="firstName" type="t:text255" use="required" />
  <xsd:attribute name="lastName" type="t:text255" use="required" />
  <xsd:attribute name="status" type="t:statusKlienta" use="required" />
  <xsd:attribute name="address" type="t:text255" use="required" />
  <xsd:attribute name="birthdate" type="t:dateYYYYMMDD" use="required" />
</xsd:complexType>
Based on such a definition, the JAXB library will generate POJO classes for us that we can use in the following way:
ClientType client1 = factory.createClientType();
[...]
client1.setStatus(ClientStatusEnum.T);
client1.setAddress("Marszałkowska");

ClientType client2 = factory.createClientType();
[...]
client2.setStatus(ClientStatusEnum.N);
client2.setAddress("Al. Niepodległości");

Clients clients = factory.createClients();
clients.getClient().add(client1);
clients.getClient().add(client2);

FileOutputStream out = new FileOutputStream(new File("/tmp/clients.xml"));
marshaller.marshal(clients, out);
Once the program has finished running, we will find the following in the clients.xml file:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<clients>
  <client address="Marszałkowska" birthdate="20090730" id="1" firstName="Marek" lastName="Abacki" status="T" />
  <client address="Al. Niepodległości" birthdate="20090730" id="2" firstName="Krzysztof" lastName="Babacki" status="N" />
</clients>
As we can see in this example, the generated POJO objects will make sure for us that the file structure is correct, that argument types agree ( for example date format ), and even that enum types are correct ( class ClientStatusEnum contains constants from the correct namespace ). It can be easily worked out that changing the field "address" to "address_line_1" and "address_line_2" in the definition of the client in the schema file is going to quickly cause a compilation error.
Configuration files
Most frameworks and applications use configuration files – as an example we can take Struts with its struts-config.xml file containing form definitions. A fragment of such a file could look something like this:
<form-bean name="clientForm" type="org.apache.struts.action.DynaActionForm">
  <form-property name="firstName" type="java.lang.String" />
  <form-property name="lastName" type="java.lang.String" />
  <form-property name="address" type="java.lang.String" />
  <form-property name="active" type="java.lang.Boolean" />
</form-bean>
Standard code for accessing data from such a form would be similar to this:
Client client = new Client();
client.setFirstName(form.getString("firstName"));
client.setLastName(form.getString("lastName"));
client.setAddress(form.getString("address"));
client.setActive((Boolean)form.get("active"));
As we can see in this example, we reference the names of the form fields defined in the XML file by typing the strings manually. A change in the form definition ( for example renaming the address field to "address_line_1" ) is not going to cause a compilation error, but the whole application is going to stop working.
Using a simple code generation tool we can create interfaces ( based on the XML configuration file ) containing constants corresponding to the names of the fields in the form. As a result, the code for accessing data will now look like this:
Client client = new Client();
client.setFirstName(form.getString(clientFormC.firstName));
client.setLastName(form.getString(clientFormC.lastName));
client.setAddress(form.getString(clientFormC.address));
client.setActive((Boolean)form.get(clientFormC.active));
Once we carry out the changes in the XML definition file and rebuild the application, the code above will fail to compile, forcing the programmer to correct the errors. Furthermore, it will be much easier to find all references to the same element: let us assume that we are looking for all occurrences of the field "address" in the client form. If we carry out a simple textual search for the string "address", we are going to find references to an address field in a contractor form, in a supplier form and also many other accidental references that have nothing to do with forms. However, if we search for all occurrences of the constant clientFormC.address, we are going to get precisely what we have been looking for.
So far I have not seen an existing library that would generate such constants for the Struts framework. Moreover, each framework and application has its own configuration file format, so it is probably better to write your own generators. Writing the first such generator ( in the form of an Ant task or a Maven plugin ) will most likely take no more than a day; each subsequent one will be a matter of hours.
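The core of such a generator is small. The following is a minimal sketch, assuming a configuration fragment like the one shown earlier; FormConstantsGenerator is a hypothetical name, the wrapper element in the sample input is invented, and the Ant or Maven integration ( writing the result to a generated-sources directory on every build ) is left out. The sample input has no DOCTYPE declaration; a real struts-config.xml has one, so the parser would additionally need access to the DTD or an entity resolver.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class FormConstantsGenerator {

    // Returns Java source with one interface per <form-bean>,
    // holding one String constant per <form-property>.
    static String generate(String xml) {
        Document doc;
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse form definitions", e);
        }
        StringBuilder out = new StringBuilder();
        NodeList beans = doc.getElementsByTagName("form-bean");
        for (int i = 0; i < beans.getLength(); i++) {
            Element bean = (Element) beans.item(i);
            out.append("public interface ").append(bean.getAttribute("name")).append("C {\n");
            NodeList props = bean.getElementsByTagName("form-property");
            for (int j = 0; j < props.getLength(); j++) {
                // Property names are used as identifiers unchanged; names that
                // are not valid Java identifiers are not handled in this sketch.
                String name = ((Element) props.item(j)).getAttribute("name");
                out.append("    String ").append(name)
                   .append(" = \"").append(name).append("\";\n");
            }
            out.append("}\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String xml = "<form-beans>"
            + "<form-bean name=\"clientForm\" type=\"org.apache.struts.action.DynaActionForm\">"
            + "<form-property name=\"address\" type=\"java.lang.String\" />"
            + "<form-property name=\"active\" type=\"java.lang.Boolean\" />"
            + "</form-bean>"
            + "</form-beans>";
        System.out.print(generate(xml));
    }
}
```

For the sample input the generator emits an interface named clientFormC with the constants address and active, which is the shape used in the form access code above.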
Localization files
The next scenario in which code generation can come in useful is the keys of localization files. Let us assume we have a file called validations.properties containing the following lines:
client_form.address.required=Field "Address" is mandatory
client_form.address.maxlength=Field "Address" may not be longer than {0}.
In the application code validating entries in the form we have the following references to the localization keys:
String address = form.getString("address");
if("".equals(address.trim())) {
    errors.add("validations", "client_form.address.required");
}
if(address.trim().length() > 255) {
    errors.add("validations", "client_form.address.maxlength", 255);
}
If someone removes the above mentioned entries in the localization file or changes their keys, the application is going to generate an error instead of an appropriate message for the client.
Using a simple generator that produces interfaces with constants based on the properties file will allow us to alter the above code in the following way:
String address = form.getString("address");
if("".equals(address.trim())) {
    errors.add(validationsC.client_form.address.required);
}
if(address.trim().length() > 255) {
    errors.add(validationsC.client_form.address.maxlength, 255);
}
Now removing an entry from a localization file will cause compilation to fail, and an IDE will show us the keys that we reference in our code but that no longer exist in the file. This solution also makes refactoring easier, for example the previously mentioned change of the address field to "address_line_1" and "address_line_2".
I have not seen an existing library that would generate such constants yet, but – as in the case of the configuration files – writing them from scratch as an Ant task or Maven plugin is not going to take more than a day.
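As a rough illustration, the heart of such a generator could look like the sketch below. BundleKeysGenerator is a hypothetical name, and the naming scheme is a simplification: instead of the nested form validationsC.client_form.address.required used above, it emits flat upper-case constants such as CLIENT_FORM_ADDRESS_REQUIRED. Keys that start with a digit or contain quotes or backslashes are not handled.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;
import java.util.Properties;
import java.util.TreeSet;

class BundleKeysGenerator {

    // Returns Java source for an interface with one String constant per key.
    static String generate(String interfaceName, String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new IllegalArgumentException("Cannot read properties", e);
        }
        StringBuilder out = new StringBuilder();
        out.append("public interface ").append(interfaceName).append(" {\n");
        // A TreeSet keeps the output order stable between builds.
        for (String key : new TreeSet<>(props.stringPropertyNames())) {
            // Simplified naming: every non-alphanumeric character becomes '_'.
            String constant = key.replaceAll("[^A-Za-z0-9]", "_").toUpperCase(Locale.ROOT);
            out.append("    String ").append(constant)
               .append(" = \"").append(key).append("\";\n");
        }
        out.append("}\n");
        return out.toString();
    }

    public static void main(String[] args) {
        String bundle =
            "client_form.address.required=Field \"Address\" is mandatory\n"
            + "client_form.address.maxlength=Field \"Address\" may not be longer than {0}.\n";
        System.out.print(generate("validationsC", bundle));
    }
}
```

The generated constants hold the keys, not the translated messages, so the bundle can still be edited by translators without touching the code as long as the keys stay in place.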
Summary
Code generation significantly eases writing and maintaining an application. Generated code is also invaluable during bigger refactorings or when searching the code for all references to a single element.
I encourage everyone to have a look at their own application to see whether any string constants are present in the code, taking a moment to reflect upon what they represent and whether they should not be replaced with generated constants. Ideally, an application should contain only references to generated constants – without having to define them manually in the code.
Advantages
- it is not necessary to write obvious and repeatable code ( for example POJO objects representing database records );
- consistent code and repeatable patterns;
- errors discoverable in the compilation phase;
- using auto-completion functions of an IDE ( Ctrl-Space in Eclipse );
- easier refactorings;
- searching for many references to the same element is made easier ( Ctrl-Shift-G in Eclipse );
Disadvantages
- dependency on additional external libraries, necessity to update dependencies, possibility of errors occurring in them;
- introducing a new developer to the project is more difficult;
- additional complexity of the build process of an application;
- having to create own generators for particular elements ( for example configuration files );