At work, the other software developers and I work primarily in Java. Part of our organization's work involves algorithms and scientific analysis of data sets, which another team researches. Traditionally that team did much of its data analysis and scientific work in Matlab, but it has since switched to Python. Python appears to have a strong scientific community and tools (such as pythonxy) for rapid development in scientific computing, data analysis, and data visualization. Since then, I have looked at ways we can collaborate by running their algorithms in the JVM, without the need for costly and error-prone porting to Java. Ideally I'd like a way for the Python code to leverage a codebase developed over 8 years for our problem domain, and for Java code to leverage new work being done in Python.
Read on for my current progress...
I have been watching Jython (formerly JPython) for quite some time, but most of the popular scientific/visualization libraries (like numpy and others) are really wrappers around C code and use the extension API specific to CPython, which Jython doesn't support. There is a desire for Jython to support the C extensions API, but it hasn't happened yet. If it did, then I think Jython would be the ideal solution, even if it does update more slowly than CPython. I saw JPype, but it looks more like calling Java from Python, which is not quite what we want. Then I found JEPP, which is more like calling Python from Java.
However, my experience with JEPP was horrible: I could not get it working at all, at least on Windows x86-32. The first warning signs were the lack of updates for a while and the files being named for each permutation of Java version and Python version; it reminded me of the PHP/Apache madness of matching 32/64 bit, Apache/PHP version, and C runtime version. It's like a native precompiled binary wheel of fortune, where you spin it and just hope you get a set of compatible libraries. Here is the list of problems I ran into before I gave up:
After that experience, I didn't try to go any further. Even if I got it to work, I would be stuck forever on an old Python, and there would be many more C extensions to worry about.
In the end, I think it is not currently possible to run typical scientific Python code in the same process as JVM languages. So the fallback would be either porting code, or running the JVM and Python in two separate processes and using whatever IPC mechanism is easiest (pipes, TCP, XML, etc.) with a very loosely bound interface. However, this makes it much more challenging (and adds much higher overhead) to build a single, integrated product, and it eliminates any ability to rapidly leverage library code in the JVM or Python for prototyping. I just heard about execnet, though, which might make the IPC easier.
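To make the two-process fallback a bit more concrete, here is a minimal sketch of what the Python side of a pipe-based interface might look like. It assumes the JVM side starts this script as a child process (for example with Java's ProcessBuilder), writes one JSON request per line to the child's stdin, and reads one JSON reply per line from its stdout. The message shape and the names mean, OPERATIONS, handle, and serve are all made up for illustration; this is not an existing protocol or anything JEPP or execnet provides.

```python
import json
import sys


def mean(values):
    """Hypothetical stand-in for a real analysis routine."""
    return sum(values) / len(values)


# Hypothetical registry of the operations the JVM side is allowed to call.
OPERATIONS = {"mean": mean}


def handle(line):
    """Turn one JSON request line into one JSON response line."""
    try:
        request = json.loads(line)
        result = OPERATIONS[request["op"]](request["args"])
        return json.dumps({"ok": True, "result": result})
    except Exception as exc:
        # Report the failure to the caller instead of killing the worker.
        return json.dumps({"ok": False, "error": repr(exc)})


def serve(infile=sys.stdin, outfile=sys.stdout):
    """Answer requests line by line until the caller closes the pipe."""
    for line in infile:
        if line.strip():  # ignore blank lines
            outfile.write(handle(line) + "\n")
            outfile.flush()  # the caller is blocked waiting for this reply


# A real worker script would end with a call to serve().
```

The interface stays very loosely bound: the Java code only needs to know the operation names and the JSON shape. Every call still pays for serialization and a round trip through the pipe, which is exactly the overhead I was hoping to avoid.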