Machine Learning with Deeplearning4j and Eclipse Scout
Machine learning, and deep learning in particular, is developing at an amazing speed. Today, machine learning can be used to solve ever more complex tasks that were considered impractical just a few years ago. Examples include autonomous cars, AlphaGo’s win against the world’s Go champion, photorealistic transformation of pictures, and neural machine translation systems.
In this blog post we describe a simple system to recognize monetary amounts on Swiss payment slips. The user interface is implemented using Eclipse Scout and we build, train and run the deep neural net using Deeplearning4j.
The screenshot above shows an image of the scanned payment slip in the upper part. In the lower part of the form the output of the neural network is shown. The form shown above is implemented in class HcrForm.
Although all handwritten numerals are correctly recognized, the network assigns a low confidence score to the numeral six. This is indicated by the orange background, which prompts the operator to manually check the result and, if necessary, correct the output of the neural network in the user interface.
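The idea of flagging uncertain digits for review can be sketched in a few lines of plain Java. This is a minimal illustration and not the code of the demo application: the class ConfidenceCheck and the threshold value of 0.9 are assumptions made for this example, as the post does not state which threshold triggers the orange background.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: find recognized digits whose confidence is too low to trust. */
final class ConfidenceCheck {

  /** Hypothetical threshold; the value used by the demo is not stated in this post. */
  static final double REVIEW_THRESHOLD = 0.9;

  /**
   * Returns the positions (0-based) of all digits whose confidence lies
   * below the given threshold and should be checked by an operator.
   */
  static List<Integer> positionsToReview(double[] confidences, double threshold) {
    List<Integer> positions = new ArrayList<>();
    for (int i = 0; i < confidences.length; i++) {
      if (confidences[i] < threshold) {
        positions.add(i);
      }
    }
    return positions;
  }
}
```

A user interface could then highlight every field whose position is contained in the returned list.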
The open source framework Eclipse Scout has been specifically built for enterprise applications with the following goals in mind:
- Enterprise users deserve simple and powerful user interfaces.
- Implementing and maintaining business applications must be efficient.
- Business applications should be independent of specific technologies.
- Learning the framework should be painless.
Scout may be used for any type of business application such as ERP, CRM or medical data storage systems. As shown with the demo application described in this blog post, innovative technologies such as machine learning are straightforward to integrate with Scout applications.
The framework has been proven in production for over a decade and is currently based on Java and HTML5. Since 2010, the Scout open source project has been hosted by the Eclipse Foundation.
The latest Scout release will be shipped as part of the Eclipse Oxygen release train on June 28, 2017.
Deeplearning4j is a toolkit for building, training and deploying neural networks. Currently it is the most complete and mature deep learning library in the Java domain. The library comes with good documentation and is easy to integrate into Scout applications.
For the example application described in this blog post it is enough to add some Maven dependencies.
Machine learning is always about models that you train on some data and then apply to other data. We want to illustrate these steps using the Deeplearning4j library. Let’s start by constructing a new multi-layer network, as done in the class NeuralNetwork of the Anagnostes demo application.
For now we skip the description of the network configuration object, as this is covered in more detail in the Network Architecture section below. We can then train this neural network model as follows.
The above method trains the neural network over several epochs (an epoch corresponds to cycling through the complete training data once). In each epoch the network’s parameters are updated to improve the network’s performance on the training data with the line m_network.fit(trainData). To verify the performance on data not seen during training, the network is evaluated after each epoch using separate validation data.
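The control flow of such a training method can be sketched in plain Java, independent of any library. The class EpochLoop and its parameters are hypothetical names introduced for this illustration and are not taken from the Anagnostes code. In the demo, the training step would correspond to the call m_network.fit(trainData), and the validation step to an evaluation of the network on the validation data.

```java
import java.util.function.DoubleSupplier;

/** Illustrative sketch of a per-epoch train/validate loop (hypothetical names). */
final class EpochLoop {

  /**
   * Runs the training step once per epoch and records the validation
   * score measured right after each epoch.
   */
  static double[] run(int epochs, Runnable trainOneEpoch, DoubleSupplier validate) {
    double[] scores = new double[epochs];
    for (int epoch = 0; epoch < epochs; epoch++) {
      trainOneEpoch.run();                    // one full pass over the training data
      scores[epoch] = validate.getAsDouble(); // score on data not seen during training
    }
    return scores;
  }

  /** Returns the index of the epoch with the highest score (expects at least one epoch). */
  static int bestEpoch(double[] scores) {
    int best = 0;
    for (int i = 1; i < scores.length; i++) {
      if (scores[i] > scores[best]) {
        best = i;
      }
    }
    return best;
  }
}
```

Keeping the validation scores per epoch makes it easy to see when additional epochs stop improving the results on unseen data.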
The trained model can then be used to classify new data. In our demo application we want to recognize handwritten numerals. The code below takes an image as input and transforms the normalized image into an input vector for the network using Nd4j.create(normalizedImage). The network then classifies this input with the statement m_network.output(input) by assigning confidence values to each numeral class ‘0’, ‘1’ … ‘9’. The confidence value for class ‘4’ can then be accessed with output.getDouble(4).
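To turn the ten confidence values into a single recognized numeral, one typically picks the class with the highest confidence. The following plain Java sketch shows this step on an ordinary double array. The class DigitPrediction is a hypothetical helper for illustration and does not use the Deeplearning4j API; with the network output described above, the array entries would correspond to the values returned by output.getDouble(0) to output.getDouble(9).

```java
/** Sketch: pick the most likely digit from per-class confidence values. */
final class DigitPrediction {

  /**
   * Returns the index of the largest confidence value.
   * For ten classes '0' to '9' the index is the recognized digit.
   */
  static int mostLikelyDigit(double[] confidences) {
    int best = 0;
    for (int i = 1; i < confidences.length; i++) {
      if (confidences[i] > confidences[best]) {
        best = i;
      }
    }
    return best;
  }
}
```

The confidence of the winning class, confidences[mostLikelyDigit(confidences)], is the value that can be shown to the operator next to the recognized numeral.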
Good data is always of central importance whenever we apply machine learning to a specific domain. For the sake of simplicity and comparability we decided to go for the best known task in the domain of machine learning: the classification of handwritten numerals. By far the most frequently used data collection is the MNIST database. It contains 60,000 images of numerals for training and 10,000 images for testing.
The individual numerals in the MNIST database are normalized to gray-level images of 28 by 28 pixels. The picture above provides some examples.
For our demo application we also wanted to experiment with our own data in addition to publicly available MNIST data. For the data collection we asked people to fill in a simple form with their everyday writing style. See below for a picture of such a collection form.
In a simple semi-manual process the scanned form is then converted into individual image files holding a single isolated numeral. In contrast to the MNIST data the images of our numbers database are normalized for training and testing at runtime. For our experiments we now have 10,000 digit images written by 20 individuals. As in the case of MNIST our data is publicly available. In contrast to the MNIST data our scanned images are available in their original format (color or grayscale, whatever we received as contributions).
Side note: Please consider contributing to this collection! Our next goal is to reach 20,000 images. We gladly accept pull requests containing at least the scan of your filled-in form (using the template).
Before we can use the images of our handwritten numerals for training and/or recognition we perform an image normalization step that converts the scanned numeral into the 28 by 28 gray-level pixel format used by the MNIST database. This normalization step is illustrated below.
This normalization has the advantage that we can work with existing network architectures that have been extensively tested by the machine learning community. At the same time, it allows us to use the existing MNIST data to supplement our own data collection.
To match the MNIST image format, the normalization process consists of the following steps:
- Binarize the color or gray-level image. This results in a black and white image.
- Resize the cropped numeral to a 20 by 20 pixel box while preserving the aspect ratio.
- Calculate the center of gravity for the resized image.
- Center the image in a 28 by 28 pixel frame using the center of gravity calculated above.
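Some of the steps above can be sketched in plain Java on simple integer arrays. The class DigitNormalizer below is a hypothetical illustration and not the normalization code of the demo application. It covers the binarization, the center of gravity calculation and the centering in the 28 by 28 frame. Cropping and resizing are left out to keep the example short, and representing ink pixels as 1 and background pixels as 0 is an assumption made for this sketch.

```java
/** Sketch of three normalization steps on int arrays indexed as [row][column]. */
final class DigitNormalizer {

  static final int FRAME = 28;

  /** Binarize: pixels darker than the threshold become ink (1), all others background (0). */
  static int[][] binarize(int[][] gray, int threshold) {
    int[][] out = new int[gray.length][gray[0].length];
    for (int y = 0; y < gray.length; y++) {
      for (int x = 0; x < gray[y].length; x++) {
        out[y][x] = gray[y][x] < threshold ? 1 : 0;
      }
    }
    return out;
  }

  /** Center of gravity {x, y} of the ink pixels; the geometric center if there is no ink. */
  static double[] centerOfGravity(int[][] img) {
    double sumX = 0, sumY = 0, mass = 0;
    for (int y = 0; y < img.length; y++) {
      for (int x = 0; x < img[y].length; x++) {
        mass += img[y][x];
        sumX += x * img[y][x];
        sumY += y * img[y][x];
      }
    }
    if (mass == 0) {
      return new double[] {(img[0].length - 1) / 2.0, (img.length - 1) / 2.0};
    }
    return new double[] {sumX / mass, sumY / mass};
  }

  /** Copies the glyph into a 28 by 28 frame so that its center of gravity lands at the frame center. */
  static int[][] centerInFrame(int[][] glyph) {
    double[] cog = centerOfGravity(glyph);
    int offsetX = (int) Math.round((FRAME - 1) / 2.0 - cog[0]);
    int offsetY = (int) Math.round((FRAME - 1) / 2.0 - cog[1]);
    int[][] frame = new int[FRAME][FRAME];
    for (int y = 0; y < glyph.length; y++) {
      for (int x = 0; x < glyph[y].length; x++) {
        int fy = y + offsetY;
        int fx = x + offsetX;
        // Pixels that would fall outside the frame are dropped.
        if (fy >= 0 && fy < FRAME && fx >= 0 && fx < FRAME) {
          frame[fy][fx] = glyph[y][x];
        }
      }
    }
    return frame;
  }
}
```

Centering by the center of gravity instead of the bounding box means that a numeral with a long stroke on one side, such as a 7 or a 9, is placed according to where most of its ink lies.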
For the neural network architecture we use a convolutional neural network very similar to the one proposed by Yann LeCun et al. in 1998. This architecture is illustrated in the diagram below, taken from LeCun’s publication.
The architecture can be divided into a feature extraction stage (convolutional and subsampling layers) and a classification stage (the fully connected layers at the right end). The planes in the convolutional layers implement different filters that are applied to the input image. By applying subsampling and adding more convolutional layers the network is capable of learning a set of filter combinations that prove to be highly effective for image classification. To learn more about convolutional network architectures check out Denny Britz’s blog post.
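What a single filter plane and a subsampling step compute can be shown in plain Java. The class ConvSketch below is a hypothetical illustration and not the Deeplearning4j implementation. It assumes a stride of 1 without padding for the filter and uses 2 by 2 max pooling as the subsampling operation, which is one common choice among several.

```java
/** Sketch of one convolution filter and one subsampling step on 2D arrays. */
final class ConvSketch {

  /**
   * Slides the kernel over the input (stride 1, no padding) and sums the
   * element-wise products. An n by n input and a k by k kernel yield an
   * (n - k + 1) by (n - k + 1) output.
   */
  static double[][] convolve(double[][] in, double[][] kernel) {
    int outH = in.length - kernel.length + 1;
    int outW = in[0].length - kernel[0].length + 1;
    double[][] out = new double[outH][outW];
    for (int y = 0; y < outH; y++) {
      for (int x = 0; x < outW; x++) {
        double sum = 0;
        for (int ky = 0; ky < kernel.length; ky++) {
          for (int kx = 0; kx < kernel[0].length; kx++) {
            sum += in[y + ky][x + kx] * kernel[ky][kx];
          }
        }
        out[y][x] = sum;
      }
    }
    return out;
  }

  /** Subsampling by 2 by 2 max pooling with stride 2: halves width and height. */
  static double[][] maxPool2x2(double[][] in) {
    int outH = in.length / 2;
    int outW = in[0].length / 2;
    double[][] out = new double[outH][outW];
    for (int y = 0; y < outH; y++) {
      for (int x = 0; x < outW; x++) {
        double top = Math.max(in[2 * y][2 * x], in[2 * y][2 * x + 1]);
        double bottom = Math.max(in[2 * y + 1][2 * x], in[2 * y + 1][2 * x + 1]);
        out[y][x] = Math.max(top, bottom);
      }
    }
    return out;
  }
}
```

Applied to a 28 by 28 input image, a 5 by 5 filter produces a 24 by 24 plane, and the pooling step reduces it to 12 by 12. In a trained network the kernel values are not chosen by hand but learned from the training data.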
The classification stage corresponds to the classical neural network architecture that has been around for over 30 years. Any neural network tutorial covering multilayer perceptrons will do to learn more.
Based on the diagram of the network architecture of our demo application below, it should become clear that this implementation is very close to the LeNet architecture proposed in 1998.
This might seem somewhat intimidating at first sight. But then again, this corresponds to the result of years of research. Luckily, the Deeplearning4j library comes with an extensive set of examples that provide valuable starting points for many different machine learning use cases.
This blog post describes a simple demo application to recognize monetary amounts on payment slips. The application has a user interface part implemented with the Eclipse Scout framework and a machine learning part implemented using the Deeplearning4j library.
A task for which only 6 layers are sufficient roughly corresponds to a deep learning “Hello World” exercise. At the same time, the described use case covers many of the recurring topics of machine learning problems. For many more complex problems it is not unusual to work with dozens or even over a hundred layers, as in the case of the ImageNet challenges.
In our experience, integrating Deeplearning4j with Eclipse Scout applications proved to be straightforward. If you’d like to play around with the demo application, clone the Anagnostes repository and import the project as an existing Maven project in your Eclipse IDE (please use the Scout package as described on the Scout homepage).