Thursday, October 24, 2013

Java Cryptography Architecture and common encryption usages

JCA can be really tricky to use for simple, common tasks. After my latest round of work with Java cryptography I'd like to present below the simplest usage of its cryptographic methods, covering mainly symmetric (AES) and asymmetric (RSA/DSA) encryption plus some helper methods.

I don't use a specific JCA provider, but let the JCA choose an appropriate one. If you want to select a specific provider, you need to pass additional parameters, usually to the getInstance() methods of the various algorithm elements. In the examples I use RSA 1024 (you might change it to DSA) and AES 256.
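
For example (just an illustration, not part of the util below), explicit provider selection looks like this; the provider names here assume a standard Oracle/OpenJDK runtime:

// explicit provider selection - "SunJCE" and "SunRsaSign" are providers shipped
// with Oracle/OpenJDK; adjust the names if you use e.g. BouncyCastle
Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding", "SunJCE");
KeyPairGenerator keyGen = KeyPairGenerator.getInstance("RSA", "SunRsaSign");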

A note about AES 256: to enable it on your VM you need the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files. With the default policy you can only use AES 128.
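
If you're not sure whether the unlimited policy files are installed, a quick check like this one (just a helper sketch, not part of the original util) tells you; with the default policy it returns false, because the maximum allowed AES key length is 128 bits:

public static boolean isUnlimitedCryptoAvailable() throws NoSuchAlgorithmException {
 // with the default JCE policy this returns 128 for AES
 return Cipher.getMaxAllowedKeyLength("AES") >= 256;
}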

The number of things that can go wrong is pretty big, so if you write your own cryptography util, it's a good idea to wrap the JCA exceptions in something that can be easily caught in the user code:

public class EncryptionException extends Exception {
 
 public EncryptionException() {
 }
 
 public EncryptionException(String message) {
  super(message);
 }
 
 public EncryptionException(String message, Throwable cause) {
  super(message, cause);
 }
 
 public EncryptionException(Throwable cause) {
  super(cause);
 }
 
}

First, a simple thing: BASE64 encoding wrappers (to be able to switch the implementation, which of course will never happen):

public static String encodeBase64(byte[] data) {
 return new BASE64Encoder().encode(data);
}
 
public static byte[] decodeBase64(String data) throws EncryptionException {
 try {
  return new BASE64Decoder().decodeBuffer(data);
 } catch (IOException e) {
  throw new EncryptionException("Error decoding base64", e);
 }
}
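
Should the switch ever actually happen: on Java 8+ the same wrappers could be backed by java.util.Base64 instead (a sketch, assuming Java 8; the snippets above rely on the older sun.misc classes):

public static String encodeBase64(byte[] data) {
 return java.util.Base64.getEncoder().encodeToString(data);
}
 
public static byte[] decodeBase64(String data) throws EncryptionException {
 try {
  // java.util.Base64 signals malformed input with an unchecked exception
  return java.util.Base64.getDecoder().decode(data);
 } catch (IllegalArgumentException e) {
  throw new EncryptionException("Error decoding base64", e);
 }
}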

How to generate RSA 1024 keys with SecureRandom:

/**
 * Generates new keypair
 */
public static KeyPair generateKeys() throws EncryptionException {
 try {
  KeyPairGenerator keyGen = KeyPairGenerator.getInstance("RSA");
  SecureRandom random = SecureRandom.getInstance("SHA1PRNG");
  keyGen.initialize(1024, random);
  return keyGen.generateKeyPair();
 } catch (Exception e) {
  throw new EncryptionException("Error generating keypair", e);
 }
}

How to encrypt with an asymmetric key (public or private):

/**
 * Encrypts data with key
 */
public static byte[] encryptAsymmetric(Key key, byte[] data)
throws EncryptionException {
 try {
  Cipher cipher = Cipher.getInstance("RSA");
  cipher.init(Cipher.ENCRYPT_MODE, key);
  return cipher.doFinal(data);
 } catch (Exception e) {
  throw new EncryptionException("Error key encrypting", e);
 }
}

And for the above, how to decrypt with an asymmetric key (public or private, the opposite one to that used for encryption):

/**
 * Decrypts data with key
 */
public static byte[] decryptAsymmetric(Key key, byte[] data)
throws EncryptionException {
 try {
  Cipher cipher = Cipher.getInstance("RSA");
  cipher.init(Cipher.DECRYPT_MODE, key);
  return cipher.doFinal(data);
 } catch (Exception e) {
  throw new EncryptionException("Error key decrypting", e);
 }
}

How to build a symmetric key for AES 256 encryption:

/**
 * Builds a random secret key for symmetric algorithm
 */
public static Key buildSymmetricKey() throws EncryptionException {
 try {
  KeyGenerator keyGen = KeyGenerator.getInstance("AES");
  keyGen.init(256, SecureRandom.getInstance("SHA1PRNG"));
  return keyGen.generateKey();
 } catch (Exception e) {
  throw new EncryptionException("Error generating secret key", e);
 }
}

Building the AES key from a user password is a little tricky. You need a salt (an N-byte array) that is used to derive a proper AES key (of the proper length). If you want to be able to recover this key in the future, using just the same user password, you need to use exactly the same salt, so it's probably best to hardcode it somewhere (here, a random 8 bytes) and use it for all further key derivations:

private static byte[] SALT = new byte[]{
 (byte) 0xa1, (byte) 0x22, (byte) 0x33, (byte) 0xa4,
 (byte) 0x11, (byte) 0x22, (byte) 0x12, (byte) 0x22};
 
/**
 * Builds a secret key for symmetric algorithm recoverable by password
 */
public static Key buildSymmetricKey(String password)
throws EncryptionException {
 try {
  SecretKeyFactory factory =
   SecretKeyFactory.getInstance("PBKDF2WithHmacSHA1");
  // parameters: password, salt, iteration count (256) and key length in bits (256)
  KeySpec spec = new PBEKeySpec(password.toCharArray(), SALT, 256,
   256);
  SecretKey tmp = factory.generateSecret(spec);
  return new SecretKeySpec(tmp.getEncoded(), "AES");
 } catch (Exception e) {
  throw new EncryptionException("Error encoding secret key", e);
 }
}

Now, how to encrypt an arbitrary-length data block with AES:

/**
 * Encrypts data with symmetric algorithm and password
 */
public static byte[] encryptSymmetric(Key key, byte[] data)
throws EncryptionException {
 try {
  Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
  cipher.init(Cipher.ENCRYPT_MODE, key);
  return cipher.doFinal(data);
 } catch (Exception e) {
  throw new EncryptionException("Error symmetric encrypting", e);
 }
}

Note that I use ECB as the block cipher mode of operation. This is not the safest one; CBC would be better (see Wikipedia). But then you can't recover your data with the AES key alone: to decrypt CBC you also need to store the initialization vector. So I use ECB in this simple example, to keep everything working with keys only.
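
If you do want CBC anyway, a common approach (just a sketch, not part of the util above) is to let the cipher generate a random IV, prepend it to the ciphertext, and read it back before decryption - so only the key is still needed to recover the data:

public static byte[] encryptSymmetricCbc(Key key, byte[] data)
throws EncryptionException {
 try {
  Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
  // no IV given - the provider generates a random one
  cipher.init(Cipher.ENCRYPT_MODE, key);
  byte[] iv = cipher.getIV(); // 16 bytes for AES
  byte[] encrypted = cipher.doFinal(data);
  // store the IV together with the ciphertext: [iv][encrypted]
  byte[] out = new byte[iv.length + encrypted.length];
  System.arraycopy(iv, 0, out, 0, iv.length);
  System.arraycopy(encrypted, 0, out, iv.length, encrypted.length);
  return out;
 } catch (Exception e) {
  throw new EncryptionException("Error symmetric encrypting", e);
 }
}
 
public static byte[] decryptSymmetricCbc(Key key, byte[] data)
throws EncryptionException {
 try {
  Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
  // the first 16 bytes are the IV written by encryptSymmetricCbc()
  cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(data, 0, 16));
  return cipher.doFinal(data, 16, data.length - 16);
 } catch (Exception e) {
  throw new EncryptionException("Error symmetric decrypting", e);
 }
}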

The decryption for the above symmetric encryption is similar:

/**
 * Decrypts data with symmetric algorithm and password
 */
public static byte[] decryptSymmetric(Key key, byte[] data)
throws EncryptionException {
 try {
  Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
  cipher.init(Cipher.DECRYPT_MODE, key);
  return cipher.doFinal(data);
 } catch (Exception e) {
  throw new EncryptionException("Error symmetric descrypting", e);
 }
}

That's all for the actual encryption/decryption. Now, about storing the keys in a database: whether you'd like to use BASE64 encoding or hold everything in BLOBs, the methods below will be useful.

Converting the key to BASE64 is easy:

/**
 * Converts key to base64 encoded string
 */
public static String keyToString(Key key) {
 return encodeBase64(key.getEncoded());
}

But recovering the key from BASE64 or byte[] is another tricky part:

/**
 * Converts base64 encoded string to asymmetric key
 *
 * @param publicKey if true returns public key, private key otherwise
 */
public static Key asymmetricKeyFromString(String s, boolean publicKey)
throws EncryptionException {
 return asymmetricKeyFromBytes(decodeBase64(s), publicKey);
}
 
/**
 * Converts bytes to asymmetric key
 *
 * @param publicKey if true returns public key, private key otherwise
 */
public static Key asymmetricKeyFromBytes(byte[] bytes, boolean publicKey)
throws EncryptionException {
 try {
  if (publicKey) {
   return KeyFactory.getInstance("RSA").generatePublic(
    new X509EncodedKeySpec(bytes));
  } else {
   return KeyFactory.getInstance("RSA").generatePrivate(
    new PKCS8EncodedKeySpec(bytes));
  }
 } catch (Exception e) {
  throw new EncryptionException("Can't decode assymetric key", e);
 }
}

The same for a symmetric key looks much easier:

/**
 * Converts base64 encoded string to symmetric key
 */
public static Key symmetricKeyFromString(String s) throws
EncryptionException {
 return symmetricKeyFromBytes(decodeBase64(s));
}
 
/**
 * Converts bytes to symmetric key
 */
public static Key symmetricKeyFromBytes(byte[] bytes)
throws EncryptionException {
 return new SecretKeySpec(bytes, "AES");
}

Sometimes you also need to convert your private/public keys to PEM format for export. Without BouncyCastle you need your own method for this:

public static String getPem(Key key) {
 StringBuilder sb = new StringBuilder();
 if (key instanceof PrivateKey || key instanceof PublicKey)
  sb.append(String.format("-----BEGIN %s %s KEY-----\n", "RSA",
   key instanceof PublicKey ? "PUBLIC" : "PRIVATE"));
 else
  sb.append("-----BEGIN KEY-----");
 sb.append(encodeBase64(key.getEncoded()));
 if (key instanceof PrivateKey || key instanceof PublicKey)
  sb.append(String.format("\n-----END %s %s KEY-----", "RSA",
   key instanceof PublicKey ? "PUBLIC" : "PRIVATE"));
 else
  sb.append("\n-----END KEY-----");
 return sb.toString();
}

And that was the last example of simple Java cryptography API usage.

Friday, October 18, 2013

Convert document to HTML with Apache Tika

Apache Tika has a wonderful feature that can transform a source document (PDF, MS Office, OpenOffice etc.) into HTML during content extraction. Sounds pretty simple, but I dug through a lot of Google search results and couldn't find a simple working example anywhere.

But, here is a working snippet I extracted from tika-app:

// tikaParser is a Tika Parser instance (e.g. AutoDetectParser), file is the source document as byte[]
ByteArrayOutputStream out = new ByteArrayOutputStream();
SAXTransformerFactory factory = (SAXTransformerFactory)
 SAXTransformerFactory.newInstance();
TransformerHandler handler = factory.newTransformerHandler();
handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "html");
handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
handler.getTransformer().setOutputProperty(OutputKeys.ENCODING, "UTF-8");
handler.setResult(new StreamResult(out));
ExpandedTitleContentHandler handler1 = new ExpandedTitleContentHandler(handler);
 
tikaParser.parse(new ByteArrayInputStream(file), handler1, new Metadata());
return new String(out.toByteArray(), "UTF-8");
 
It works pretty nicely. Here is an example of the original MS Office document:

And here is how the above looks in my webapp as an HTML preview:


Wednesday, October 16, 2013

Play Framework on Heroku and custom dependencies

Today I was playing a little with the Play Framework. It is a very nice, lightweight application framework for Java and Scala. I was trying to make an app and deploy it to Heroku, a PaaS platform where you can host your Play applications for free (with some limitations, of course).

And here the problem appeared. In the project I use my own lib, managed by Maven, let's say:

<groupId>com.blogspot.lifeinide</groupId>
<artifactId>mylib</artifactId>
<version>1.0-SNAPSHOT</version>
 
Now, how to use this lib in the local Play project and then deploy it to Heroku with this dependency too? I don't want to copy each new version into the Play lib/ folder (which can anyway be cleaned up by "play dependencies"), because the lib is still under development too. How to automate this? It is a bit tricky.

Play Framework uses its own dependency resolution mechanism based on the dependencies.yml file, but it can fetch artifacts from Maven repositories too. The key here is to configure dependencies.yml appropriately to use a different Maven repository, for example:

require:
    - play 1.2.5.3
    - com.blogspot.lifeinide -> mylib 1.0-SNAPSHOT
 
repositories:
    - jboss:
        type: iBiblio
        root: "file://${user.home}/.m2/repository/"
        contains:
            - com.blogspot.lifeinide -> *
 

Now you can "mvn install" your project to your local maven repository, and Play dependency system can find it.

But Heroku can't...

But here is the trick for Heroku. In my Heroku project I have the following structure:

|- heroku    // my heroku project
  |- .git    // git repository for heroku deployment
  |- repo    // this is my local maven repository

Now in the mylib project's pom.xml I can configure the appropriate deployment:

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-deploy-plugin</artifactId>
      <version>2.7</version>
    </plugin>
  </plugins>
</build>
 
<distributionManagement>
  <repository>
    <id>local-lib-unmanaged</id>
    <name>local-lib-unmanaged</name>
    <url>file:///path/to/heroku/repo</url>
  </repository>
</distributionManagement>

So on "mvn deploy" it basically deploys the artifact to my local "repo" repository under the Heroku project. Now the changes in dependencies.yml:

require:
    - play 1.2.5.3
    - com.blogspot.lifeinide -> mylib 1.0-SNAPSHOT
 
repositories:
    - jboss:
        type: iBiblio
        root: "file://${application.path}/repo/"
        contains:
            - com.blogspot.lifeinide -> *

And there you are. After adding "repo" to git and pushing all this stuff to Heroku, it can resolve the dependencies.

Monday, October 7, 2013

SOA vs. Domain design and OpenSessionInThreadExecution pattern

This article describes a complement to the Open Session In View pattern for Java/Spring. But first I want to present my point of view on the pattern itself.

Many people consider it an anti-pattern and say it should never be used. I think exactly the opposite, and here is some explanation.

I've been coding Java (JEE) for almost 10 years, and after this experience I can say that there are two architectures for Java webapps I have worked in (with many variations, of course): service oriented architecture (SOA/service design) and domain driven design (DDD).

Service oriented architecture is a perfectly layered application, with each layer having its own single responsibility: we mostly have views, controllers, services and DAOs. A lot of people advocate this design as the only valid one. It indeed looks very good on paper and in diagrams - it is the cleanest one. But here are my experiences from working with such applications.

I'm talking about a standard design with the following (most common) guidelines:
  • DAOs handle the persistence layer, using Hibernate/JPA as the low-level DB access API
  • services provide business interfaces and use DAOs internally
  • the usual session and transaction model is session-per-transaction, with transactions established by AOP when service methods are invoked
The second thing is that I'm talking about standard small-to-medium size webapps; they can be web portals or business applications, and they can be clustered using e.g. standard Tomcat clustering with a load balancer. I'm not talking about very big systems, e.g. banking systems with a team of 30 developers and 2 million lines of code, running on a server farm and integrating 15 standalone systems for various purposes. For such requirements I think SOA is the right choice.

Now, what is wrong with this design for me? The projects where we used it were unnecessarily HUGE. The development team had to be much bigger than in DDD, simple modifications to the application took a lot of hours, and people spent 80% of their time doing monkey jobs, like moving data from one type of objects to another or struggling with LazyInitializationException. Honestly, if I had chosen this design a few years ago when I created my own application, I would already be out of the market. I know people who have given up Java and switched to different technologies (like Python or Ruby) because "Java became the technology where you cannot do anything efficiently". I believe this is not true; it is only a design problem.

Where does all this bloat come from? The main problem here is precisely the Hibernate session management design. I estimate that almost 90% of this bloat code comes from this way of handling Hibernate. I think this design was created when there was no sensible ORM yet, and when one appeared, people just didn't understand how it could help to create applications really rapidly, using all the benefits of an ORM. And to create them with 20% of the original team, and to have a product that can respond really quickly to the constantly changing requirements from the market.

So, what are my problems with SOA? Here they are:
  1. Hibernate gives you a database abstraction, which is an implementation of the DAO pattern itself. I cannot understand why you would still write yet another DAO layer, doing the same things Hibernate already does (a DAO over a DAO). If you give this up, you have one pretty big layer less, which reduces your code significantly. And the DAO is still preserved - you can always switch to another database implementation by changing the Hibernate engine (this is what the DAO is for).
  2. If you spent a lot of time designing your beautiful domain model, why give up using those objects in your whole application and develop another structure of objects? This problem comes from the session management design. If you close your session after leaving the service layer, you can't rely on your domain objects anymore. You need to develop another structure of DTO objects and use the DTOs in the higher layers. You also need to develop a lot of conversion code, which tosses the data between the domain and DTO objects (in both directions). You finally end up with tons of code and two object model hierarchies that look almost exactly the same (domain and DTO), which might simply not exist if another session management model had been chosen.
  3. Another problem is that if you write a standard web application, I mean an app which has an HTML user interface etc. (not a pure service application, like a web-services-only app), the service layer quickly becomes very artificial and cluttered. For example, let's say that I have a BlogEntry entity with a collection of Comments. What is the problem with this? If you have e.g. a service method:
    BlogEntryDTO getBlogEntry(long id) {..}
    Which version of the DTO object should you return from such a service - with the lazy comments collection filled or not? Some client code needs these comments, and some doesn't. So, should you always load the comments with another SQL query, even when the BlogEntry entity is used by client code which doesn't need them? Hmm... no, and you finally end up with:
    BlogEntryDTO getBlogEntry(long id) {..} 
    BlogEntryDTO getBlogEntryWithComments(long id) {..}
    And with another dozen getBlogEntry() methods in the service, plus fifty other services. Finally, to do a simple thing you need to dig through hundreds of lines of code to find exactly what you need (or write yet another getBlogEntry() method if you can't).
  4. Another example is getting collections from services. Let's consider a simple one:
    List<BlogEntryDTO> getUserBlogEntries(long userid) {..}
    Looks nice and clean... but wait a moment. We also need the count(*) calculation, limits for pagination, and sorting capability as well. How many more methods do we need to write to achieve this? It's crazy.
And how does it look in the DDD design?
  1. You don't have to write the DAO layer. You may use Hibernate as your DAO. This doesn't prevent you from having a nice and well-designed DAO layer. What you really need is the Session API and @NamedQueries for fetching items by queries.
  2. Your service layer doesn't close the Session after each method invocation; it's held open until the end of the request. You don't have to write the unnecessary DTO and conversion code. This doesn't prevent you from having a nice and well-designed service layer.
  3. You don't have to bother about this. If your client code decides it needs comments, it invokes getComments() and has them on demand, because the session is still open.
  4. You also don't have to bother about this. You can use Query or Criteria objects to transfer collections from the service layer, and parametrize them later, depending on what your controller layer wants to achieve (see the sketch right after this list).
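
As an illustration of point 4, the service layer can hand out a Criteria and let the controller decide about paging and sorting (a sketch using the Hibernate Criteria API; BlogEntry, the "author" property and the variable names are hypothetical):

// service layer: builds the query, but doesn't decide how it will be consumed
public Criteria getUserBlogEntries(long userId) {
 return sessionFactory.getCurrentSession()
  .createCriteria(BlogEntry.class)
  .add(Restrictions.eq("author.id", userId));
}
 
// controller layer: parametrizes the same query for its own needs
Criteria criteria = blogService.getUserBlogEntries(userId);
List<BlogEntry> page = criteria
 .addOrder(Order.desc("created"))
 .setFirstResult(pageNumber * pageSize)
 .setMaxResults(pageSize)
 .list();
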
For the people who still advise using SOA, I can say that I know DDD is not a purely layered solution. But a greater benefit for me is to have an application with 30% of the original (SOA) lines of code (and all the benefits that follow - time and cost of development, time and cost of modification, testing etc.) than to have a big bloat but a "purely layered" medal on my chest.

Of course, every pot has two handles. Now a few words about when SOA might be better than DDD in my opinion. It's when a big part of the application doesn't handle web requests (standard user requests, or webservice requests), but the work is done in internal threads (e.g. schedulers, ESB listeners etc.). DDD is tightly bound to the open-session-in-view pattern, usually implemented by OpenSessionInViewFilter (e.g. from Spring), and holding the session open just doesn't work for internal threads. So in DDD you can imagine ending up with an inconsistent design, because you need to deal with the service layer differently from request-bound threads and from internal threads. In SOA, if you've already developed all the stuff required to implement the pattern, everything is handled in the same way from both kinds of threads. Well, you're right, but...

I have at least two solutions for this problem. Internal mechanisms usually use just a subset of your total business logic (like receiving tasks from some server and putting them into the db periodically). You can easily create a small, thin layer of services dedicated to working with internal threads, and expose the subset of logic they need in these services. These internal services can be designed in the standard SOA pattern, i.e. with session-per-transaction and AOP-defined transaction boundaries on methods. Your scheduler or whatever else can always start the work from some service method invocation that does its job.
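
For example, such a dedicated internal service could look like this (a sketch with hypothetical names, assuming Spring-managed transactions and an injected Hibernate SessionFactory):

/**
 * Thin SOA-style service used only by internal threads (schedulers etc.).
 * Session and transaction boundaries are defined per method by AOP.
 */
@Service
public class TaskImportService {
 
 @Autowired
 private SessionFactory sessionFactory;
 
 @Transactional
 public void importPendingTasks(List<Task> tasks) {
  Session session = sessionFactory.getCurrentSession();
  for (Task task : tasks) {
   session.saveOrUpdate(task);
  }
 }
}
 
// the scheduler entry point - all the work starts from a single service call
taskImportService.importPendingTasks(remoteServer.fetchPendingTasks());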

Another solution is my OpenSessionInThreadExecution pattern, which is just a remake of Spring's OpenSessionInViewFilter. It assumes that you always have some entry point in your background tasks where you start doing the background work (like Runnable.run()) - and this is perfectly true, because somewhere you enter the internal thread with your logic. Using this bean you can simply wrap your execution in a Runnable and pass it to the execute() method, to be executed with an open session throughout the whole execution. Here is the code:


package com.blogspot.lifeinide;
 
import org.apache.log4j.Logger;
import org.hibernate.FlushMode;
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.springframework.beans.factory.InitializingBean;
import org.springframework.beans.factory.annotation.Required;
import org.springframework.dao.DataAccessResourceFailureException;
import org.springframework.orm.hibernate3.SessionFactoryUtils;
import org.springframework.orm.hibernate3.SessionHolder;
import org.springframework.transaction.support.TransactionSynchronizationManager;
 
/**
 * This is a util class keeping the session open during a whole single-thread
 * execution (like a scheduler or Mule service thread); it works in the same
 * manner as {@link OpenSessionInViewFilter}.
 */
public class OpenSessionInThreadExecution implements InitializingBean {
 
  protected static final Logger logger = 
    Logger.getLogger(OpenSessionInThreadExecution.class);
  protected static OpenSessionInThreadExecution instance = null;
 
  protected boolean singleSession = true;
  protected FlushMode flushMode = FlushMode.MANUAL;
  protected SessionFactory sessionFactory;
 
  @Override
  public void afterPropertiesSet() throws Exception {
    if (instance == null)
      instance = this;
    else
      throw new IllegalStateException(
        "OpenSessionInThreadExecution should be defined as " +
        "the only one singleton bean.");
  }
 
  public SessionFactory getSessionFactory() {
    return sessionFactory;
  }
 
  @Required
  public void setSessionFactory(SessionFactory sessionFactory) {
    this.sessionFactory = sessionFactory;
  }
 
  /**
   * Set whether to use a single session for each request. Default is "true".
   * <p>If set to "false", each data access operation or transaction will use
   * its own session (like without Open Session in View). Each of those
   * sessions will be registered for deferred close, though, actually
   * processed at request completion.
   *
   * @see SessionFactoryUtils#initDeferredClose
   * @see SessionFactoryUtils#processDeferredClose
   */
  public void setSingleSession(boolean singleSession) {
    this.singleSession = singleSession;
  }
 
  /**
   * Return whether to use a single session for each request.
   */
  protected boolean isSingleSession() {
    return this.singleSession;
  }
 
  /**
   * Specify the Hibernate FlushMode to apply to this filter's
   * {@link org.hibernate.Session}. Only applied in single session mode.
   * <p>Can be populated with the corresponding constant name in XML bean
   * definitions: e.g. "AUTO".
   * <p>The default is "MANUAL". Specify "AUTO" if you intend to use
   * this filter without service layer transactions.
   *
   * @see org.hibernate.Session#setFlushMode
   * @see org.hibernate.FlushMode#MANUAL
   * @see org.hibernate.FlushMode#AUTO
   */
  public void setFlushMode(FlushMode flushMode) {
    this.flushMode = flushMode;
  }
 
  /**
   * Return the Hibernate FlushMode that this filter applies to its
   * {@link org.hibernate.Session} (in single session mode).
   */
  protected FlushMode getFlushMode() {
    return this.flushMode;
  }
 
  public void execute(Runnable command) {
    boolean participate = false;
 
    if (isSingleSession()) {
      // single session mode
      if (TransactionSynchronizationManager.hasResource(sessionFactory)) {
        // Do not modify the Session: just set the participate flag.
        participate = true;
      } else {
        logger.debug("Opening single Hibernate Session in " +
          "OpenSessionInViewFilter");
        Session session = getSession(sessionFactory);
        TransactionSynchronizationManager.bindResource(
          sessionFactory, new SessionHolder(session));
      }
    } else {
      // deferred close mode
      if (SessionFactoryUtils.isDeferredCloseActive(sessionFactory)) {
        // Do not modify deferred close: just set the participate flag.
        participate = true;
      } else {
        SessionFactoryUtils.initDeferredClose(sessionFactory);
      }
    }
 
    try {
      command.run();
    } finally {
      if (!participate) {
        if (isSingleSession()) {
          // single session mode
          SessionHolder sessionHolder =
            (SessionHolder) TransactionSynchronizationManager.
              unbindResource(sessionFactory);
          logger.debug("Closing single Hibernate Session in " +
            "OpenSessionInViewFilter");
          closeSession(sessionHolder.getSession(), sessionFactory);
        } else {
          // deferred close mode
          SessionFactoryUtils.processDeferredClose(sessionFactory);
        }
      }
    }
  }
 
  /**
   * Get a Session for the SessionFactory that this filter uses.
   * Note that this just applies in single session mode!
   * <p>The default implementation delegates to the
   * <code>SessionFactoryUtils.getSession</code> method and
   * sets the <code>Session</code>'s flush mode to "MANUAL".
   * <p>Can be overridden in subclasses for creating a Session with a
   * custom entity interceptor or JDBC exception translator.
   *
   * @param sessionFactory the SessionFactory that this filter uses
   * @return the Session to use
   * @throws DataAccessResourceFailureException
   *          if the Session could not be created
   * @see org.springframework.orm.hibernate3.SessionFactoryUtils#
   *   getSession(SessionFactory, boolean)
   * @see org.hibernate.FlushMode#MANUAL
   */
  protected Session getSession(SessionFactory sessionFactory) 
  throws DataAccessResourceFailureException {
    Session session = SessionFactoryUtils.getSession(sessionFactory, true);
    FlushMode flushMode = getFlushMode();
    if (flushMode != null) {
      session.setFlushMode(flushMode);
    }
    return session;
  }
 
  /**
   * Close the given Session.
   * Note that this just applies in single session mode!
   * <p>Can be overridden in subclasses, e.g. for flushing the Session before
   * closing it. See class-level javadoc for a discussion of flush handling.
   * Note that you should also override getSession accordingly, to set
   * the flush mode to something else than NEVER.
   *
   * @param session        the Session used for filtering
   * @param sessionFactory the SessionFactory that this filter uses
   */
  protected void closeSession(Session session, SessionFactory sessionFactory) {
    SessionFactoryUtils.closeSession(session);
  }
 
  public static OpenSessionInThreadExecution getInstance() {
    return instance;
  }
 
}
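
And finally a minimal usage sketch (assuming the bean is defined as a Spring singleton with the sessionFactory injected, and that the hypothetical doBackgroundWork() contains your regular service calls):

// somewhere inside a scheduler or ESB listener thread
OpenSessionInThreadExecution.getInstance().execute(new Runnable() {
  @Override
  public void run() {
    // the whole execution shares one open Hibernate session,
    // so lazy associations can be traversed just like in a web request
    doBackgroundWork();
  }
});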