The Bayesian Conspiracy

Java Statistical Libraries

March 17, 2013

This is a simple post listing a few of the Java statistical libraries I have used at one point or another. Google often seems to fail me when searching for libraries like these, so I am hoping this will help a few people connect with them. I am sure there are others, and I welcome suggestions in the comments.

jMEF: Library for working with mixtures of exponential families, including estimating parameters from a sample.
JSC (Java Statistical Classes): A variety of statistical functions covering combinatorics, correlation, curve fitting, descriptive statistics, distributions, regression, etc. A solid collection of basic statistical functions.
Stochastic Simulation in Java: Includes support for generating variates from various distributions, computing various statistical measures and tests, and support for Monte Carlo methods.
Apache Commons Math: A wide range of math and some statistical functionality.
Colt: CERN's library for high performance scientific computing; includes some statistical functions.

Here is another list of mathematical and some statistical libraries in Java: NIST Java Math List

Execute (real) shell commands from Groovy.

September 26, 2012

This post is about running shell commands from within Groovy, specifically bash but it is easy to adapt to other shells. You can already run commands with syntax like:


"ls -l".execute()

That is about as simple as it gets and works great for many situations. However, execute() runs the given command directly, passing it the list of options; the options are NOT passed through a shell (e.g. bash) for expansion and so on. As a result, you can NOT do something like:


"ls *.groovy".execute()

In this case, no shell sees the * to expand it, and so it just gets passed to ls exactly as it is. To address this, we can create a shell process with ProcessBuilder and pass the command to the shell for execution. A common use case for me is to want to just pipe the shell command's output to stdout. With some Groovy meta-object programming we can make this a method of GString and String so that you can execute any kind of string simply by calling, for example, a .bash() method on the string. Below is a class that does that. This class (including improvements) is included in durbinlib.jar. With this class, one can not only properly execute the ls *.groovy example above, but can even execute shell scripts like:


"""
 for file in \$(ls);
 do
   echo \$file
 done
""".bash()

To turn on this functionality it is necessary to call RunBash.enable() first. So a full example using the durbinlib implementation is:

#!/usr/bin/env groovy

import durbin.util.*

RunBash.enable()

"""
for file in \$(ls);
do
   echo \$file
done
""".bash()

A skeleton of the class itself follows:

import java.io.InputStream;

class RunBash{
  
  static boolean bEchoCommand = false;
  
  // Add a bash() method to GString and String 
  static def enable(){
    GString.metaClass.bash = {->
      RunBash.bash(delegate)
    }    
    String.metaClass.bash = {->
        RunBash.bash(delegate)
    }    
  }
  
  static def bash(cmd){

    cmd = cmd as String

    // create a process for the shell
    ProcessBuilder pb = new ProcessBuilder("bash", "-c", cmd);
    pb.redirectErrorStream(true); // use this to capture messages sent to stderr
    Process shell = pb.start();
    shell.getOutputStream().close();
    InputStream shellIn = shell.getInputStream(); // this captures the output from the command

    // at this point you can process the output issued by the command
    // for instance, this reads the output and writes it to System.out:
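    int c;
    while ((c = shellIn.read()) != -1){
      System.out.write(c);
    }
    // make sure output that does not end in a newline is shown
    System.out.flush();

    // wait for the shell to finish and get the return code
    int shellExitStatus = shell.waitFor();

    // close the stream
    try {
      shellIn.close();
    } catch (IOException ignoreMe) {}

    // return the exit status so callers can check for failure
    return shellExitStatus;
  }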
}
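
If you would rather capture the output than stream it to stdout, a variation can be sketched with the GDK's List.execute() and Process.waitForProcessOutput(). This is only a minimal sketch: the runBash name and the keys of the returned map are hypothetical choices for illustration and are not part of durbinlib.

```groovy
// Hypothetical helper (not part of durbinlib): run a command through bash
// and return its exit code and captured output instead of echoing it.
def runBash(String cmd) {
    // Calling execute() on a List passes each element as a separate argument,
    // so the whole command string reaches "bash -c" intact.
    def proc = ["bash", "-c", cmd].execute()
    def out = new StringBuilder()
    def err = new StringBuilder()
    // Drains stdout and stderr into the buffers and waits for the process to exit.
    proc.waitForProcessOutput(out, err)
    return [exitCode: proc.exitValue(), stdout: out.toString(), stderr: err.toString()]
}

// Single quotes keep Groovy from interpolating $i, so no backslash escapes are needed.
def result = runBash('for i in 1 2 3; do echo $i; done')
print result.stdout
println "exit code: ${result.exitCode}"
```

Returning the exit code makes it possible to stop a script when a command fails, which the streaming version above does not report to the caller unless you use its return value.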

Labels: bash, groovy, ProcessBuilder

OnlineTable: Accessing csv files row at a time by column name.

August 3, 2012

Here is a handy class I use to simplify accessing a CSV or TAB file one line at a time. I feel like I've seen this somewhere else also. I hope it's not just in GINA! Anyway, sometimes the simplest things are the most handy so it bears repeating even if it is. Suppose you have a file, beer.csv, that has a header describing the columns followed by rows of data like:

    brand,price,calories,alcohol,type,domestic
    LeinenkugelsRed,4.79,160,5.0,1,1
    SamuelAdamsBoston,5.96,160,4.9,1,1
    GeorgeKilliansIrishRed,4.70,162,4.9,1,1
    RedWolf,4.11,157,5.5,1,1
    Becks,5.83,148,4.3,3,0
    PilsnerUrquell,7.80,160,4.1,3,0

Then you can use OnlineTable to parse it like:


new OnlineTable("beer.csv").eachRow{row->
  println "brand: "+row.brand
  println "calories:"+row.calories
}

Or, if you prefer the map notation, you can use it like:


new OnlineTable("beer.csv").eachRow{row->
  println "brand: "+row['brand']
}

This version automatically inspects the header to see if it's a CSV or TAB delimited file. The class is shown below but is also included as part of durbinlib.jar. The latter will be updated with additional features.


/***********************************
* Support for accessing a table one row at a time.
*
*/
class OnlineTable{      
  
      String fileName
    
      OnlineTable(String f){
        fileName = f
      }     
      
      def eachRow(Closure c){
        new File(fileName).withReader{r->
          def headingStr = r.readLine()
          def sep = determineSeparator(headingStr)
          def headings = headingStr.split(sep)
          r.eachLine{rowStr->
            def rfields = rowStr.split(sep)
            def row = [:]
            rfields.eachWithIndex{f,i->row[headings[i]]=f}
            c(row)
          }
        }
      } 
            
      def determineSeparator(line){
        if (line.contains(",")) return ","
        if (line.contains("\t")) return "\t"
        // put the explanation in the exception so callers can see what went wrong
        throw new RuntimeException("File does not appear to be a csv or tab file: "+fileName)
      }   
}
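
One caveat with the row parsing above: String.split() with a single argument drops trailing empty strings, so a row whose last columns are blank yields fewer fields than there are headings. A minimal sketch of one way to handle that follows; parseRow is a hypothetical helper written for illustration, not the durbinlib implementation.

```groovy
// Hypothetical helper: build a row map from a header line and a data line.
def parseRow(String headingStr, String rowStr, String sep) {
    // A negative limit tells split() to keep trailing empty strings.
    def headings = headingStr.split(sep, -1)
    def fields = rowStr.split(sep, -1)
    def row = [:]
    headings.eachWithIndex { h, i ->
        // Short rows get empty strings; fields beyond the header are ignored.
        row[h] = i < fields.length ? fields[i] : ""
    }
    return row
}

println "Becks,,".split(",").length       // default split drops the trailing empties
println "Becks,,".split(",", -1).length   // a limit of -1 keeps them
println parseRow("brand,price,type", "Becks,,", ",")
```

Iterating over the headings rather than the fields means every row map has the same keys, which keeps code like row.calories from silently returning null on short rows.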

Labels: csv, groovy, table

A simple Cytoscape Groovy example

June 28, 2012

Cytoscape is a network analysis package used quite often in systems biology. Cytoscape has scripting support for Python, Ruby, and Groovy. There wasn't a meaningful Groovy example so I made a couple of minor changes to the Ruby example, along with logging added for debugging:

import cytoscape.Cytoscape;
import cytoscape.layout.CyLayouts;
import cytoscape.layout.LayoutAlgorithm;
import cytoscape.CytoscapeInit

errFile = new File("test.out").withWriter{w->
  w.println "CYTOSCAPE SCRIPT LOG"

  try{
    props = CytoscapeInit.getProperties()
    props.setProperty("layout.default", "force-directed")

    new File("sampleData").eachFileMatch(~/.*\.sif/){f->
      w.println "Loading $f"
      Cytoscape.createNetworkFromFile(f.canonicalPath)
      Cytoscape.getCurrentNetworkView().redrawGraph(false,true)
    }
  }catch(Exception e){
    w.println e
  }
}

Labels: bioinformatics, Cytoscape, groovy

Quick CSV/TAB file viewer.

June 5, 2012

My life is full of large csv and tab files. In a previous post I showed one tool I wrote to make my csv and tab file life easier, csvsql. That groovy script let you treat a csv or tab file as a database and do arbitrary queries on it. Today's post addresses another problem. I often get data files that I know nothing about from various sources and I am faced with figuring out what is in the file, or figuring out which of several files might contain information that I need. Basically, I need to look at the file as a first step to getting an idea whether it is useful to me and if so how I might go about processing it. One can "cat" the file, of course, and get some vague idea what is in the file, but if the file has many columns a simple cat can come out illegible. There are fancier things one can do with cat, such as:


cat test.csv  | column -s, -t | less -#2 -N -S

This at least gives you formatted columns that you can scroll through, but it's clunky, the columns can't be easily sorted, and so on. Importing the file into a spreadsheet is another option. For some unfathomable reason, every spreadsheet out there takes an eon to start up. Some of them won't take csv or tab files from the command line either, so you have to launch the spreadsheet then navigate to the file in question which is actually one of the biggest drawbacks of spreadsheets for my use case. Even worse, most spreadsheets simply choke on what are for me modest sized data files (say, a few tens of thousands of rows with a dozen to few hundreds of columns). After facing this irritation dozens of times, I decided to see if I could hack together a csv/tab viewer that would offer the advantages of a spreadsheet with the speed and command-line convenience of cat. The resulting script is run like this:


viewtab test.csv

It takes about 7 seconds to load a 9803 x 296 csv file and display it. Compare that with Apple's Numbers which takes 3 seconds to launch, a couple of seconds for me to find the file, tries to load it for about 7 seconds and finally tells you the file is too big and simply gives up! The results of viewtab are shown in a sortable, scrollable table window.

Once it is displayed, you can sort by column simply by selecting that column and clicking again to toggle between ascending and descending sort. viewtab relies on a Table class I wrote in Java. The easiest way to get that is just to install durbinlib.jar. Assuming you have groovy already installed, you can download and install durbinlib.jar from github like this:


git clone git://github.com/jdurbin/durbinlib.git
cd durbinlib
ant install

This will compile and install durbinlib.jar and dependencies in ~/.groovy/lib/. The actual script is below. Just copy it to a file called viewtab, make the script executable, and then put it in your path somewhere. I highly recommend that you set an environment variable to give Groovy/Java the option to use a lot of RAM:


export JAVA_OPTS='-Xmx3000m'

I'll probably expand this script in a lot of ways, but even in its current simple form it has really taken some of the dread out of exploring csv/tab files.

Update: viewtab is now bundled with durbinlib, so after the install of durbinlib simply add durbinlib/scripts to your path and you get viewtab, csvsql, and some other goodies also.

#!/usr/bin/env groovy 

import groovy.swing.SwingBuilder
import javax.swing.JTable
import javax.swing.*
import javax.swing.table.AbstractTableModel;
import durbin.util.*

err = System.err

class TableModel extends AbstractTableModel{  
  Table dt;
  
  TableModel(Table table){dt = table;}
  
  public String getColumnName(int col) {return(dt.colNames[col]);}
  public String getRowName(int row){return(dt.rowNames[row]);}  
  public int getRowCount() { return(dt.rows())}
  public int getColumnCount() { return(dt.cols()) }
  public String getValueAt(int row, int col) {return(dt.get(row,col))}  
  public Class getColumnClass(int c) {return(dt.get(0,c).getClass())}   
  public boolean isCellEditable(int row, int col){return false;}  
  public void setValueAt(Object value, int row, int col) {}
}

fileName = args[0]

// Crudely determine if it's a tab or csv file
new File(fileName).withReader{r->
  line = r.readLine()
  if (line.contains(",")) sep = ","
  else if (line.contains("\t")) sep = "\t"
  else {err.println "File does not appear to be a csv or tab file.";System.exit(1)}
}

dt = new Table(fileName,sep,bFirstRowInTable=true)

err.print "Creating gui table..."
dtm = new TableModel(dt)

swing = new SwingBuilder()
frame = swing.frame(title:fileName,defaultCloseOperation:JFrame.EXIT_ON_CLOSE){
  scrollpane = scrollPane {           
    thetab = table(autoResizeMode:JTable.AUTO_RESIZE_OFF, autoCreateRowSorter:true){
      tableModel(dtm) 
    }             
  }
}
err.println "done."
frame.pack()
frame.show()

About the code: I had already written a Table class in Java to quickly read in and manipulate csv/tab files, so I just needed to make this the model for a JTable. I used Groovy's SwingBuilder to tie it all together. The code is short and simple and to look at it should have been a 30 minute job, but I think I spent several hours of trial and error trying to figure out exactly how to make SwingBuilder work with my custom tableModel. SwingBuilder is nice, but the documentation is totally inadequate. Hopefully someone looking for SwingBuilder documentation will stumble here and see the example that I couldn't find.
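
For readers without durbinlib, the same wiring can be tried with a self-contained sketch. It swaps in a hypothetical ListTableModel backed by plain lists in place of the Table class, and hands the model to table() through the model: property instead of a nested tableModel() node. Both choices are assumptions made for illustration, not the viewtab implementation.

```groovy
import groovy.swing.SwingBuilder
import java.awt.GraphicsEnvironment
import javax.swing.JFrame
import javax.swing.JTable
import javax.swing.table.AbstractTableModel

// Hypothetical stand-in for the durbinlib Table: a read-only model over lists.
class ListTableModel extends AbstractTableModel {
    List<String> colNames
    List<List<String>> rows

    ListTableModel(List<String> colNames, List<List<String>> rows) {
        this.colNames = colNames
        this.rows = rows
    }

    int getRowCount() { rows.size() }
    int getColumnCount() { colNames.size() }
    String getColumnName(int col) { colNames[col] }
    Object getValueAt(int row, int col) { rows[row][col] }
    boolean isCellEditable(int row, int col) { false }
}

def tableData = new ListTableModel(
    ["brand", "price"],
    [["Becks", "5.83"], ["RedWolf", "4.11"]])

// Only build the window when a display is available.
if (!GraphicsEnvironment.isHeadless()) {
    def swing = new SwingBuilder()
    def frame = swing.frame(title: "demo", defaultCloseOperation: JFrame.EXIT_ON_CLOSE) {
        scrollPane {
            // model: sets JTable's model property to the custom TableModel
            table(model: tableData,
                  autoResizeMode: JTable.AUTO_RESIZE_OFF,
                  autoCreateRowSorter: true)
        }
    }
    frame.pack()
    frame.visible = true
}
```

Because every value here is a String, the row sorter compares cells as text; a model that returns numeric types from getColumnClass() would be needed for numeric ordering.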

Labels: csv, durbinlib, groovy, JTable, spreadsheet, SwingBuilder

A tooltip class for HTML5 canvas written with Processing.js

September 8, 2011

The visualization code I'm writing using Processing.js needs a tooltip to display some information depending on where your mouse is in the canvas. Although there are many tooltip options for webpage elements, I didn't find any that I could easily use with Processing.js to generate tooltips dependent on the mouse position in the canvas. After a few frustrating attempts with various libraries I finally just took this rounded corners demo by F1LT3R and hacked together a ToolTip class of my own that works with Processing.js. I think it's pretty nice for a quick hack.
The tooltip has variable transparency, detects in and out of canvas state, and adjusts itself when the tooltip approaches a canvas edge. It has a clipping mode for static canvas images and a non-clipping mode for use with canvas animations. Although I haven't tested it, this library should work just as well with the java-based Processing. A demo of the tooltip is below, along with a nice HTML5 background animation that was also written in Processing. Note: The Chrome browser shows artifacts with this for some reason. It works fine in Safari and Firefox.

Mouse over the canvas to make the tooltip appear. Click on the canvas to give it focus. Press 'r' to randomize background animation. Press 'a' to cycle through alpha levels. Press 'c' to randomize tip color.

You can download ToolTip.pde and test code from github:gist. ToolTip.pde has instructions in its header. The use of this tooltip isn't limited to Processing sketches. Since Processing.js pde files are converted to pure javascript before execution, you can also use this tooltip, as well as any other Processing.js library, in javascript code that writes to the canvas. The Processing QuickStart for JavaScript Developers gives examples of how to use Processing.js libraries with pure javascript and libraries like jQuery (which, incidentally, was written by the same person who created Processing.js).

Processing.js first impression

September 2, 2011

I've been playing around with Processing.js to produce visualizations of some genomics data. The Java based Processing language/environment was originally developed by the noted data visualization guru Ben Fry and graphic artist Casey Reas. Processing.js is a port of the Processing language and libraries to javascript and the HTML 5 canvas by none other than John Resig, the creator of the jQuery javascript library. I was skeptical at first that Processing.js would be anything but a pale shadow of its Processing predecessor, but after using it for a bit I have to say that I am impressed.

Labels: blogger, canvas, HTML5, javascript, Processing, processing.js

Gibbs sampler in Groovy

July 25, 2011

I recently read a couple of nice articles by Darren Wilkinson about implementing MCMC in various languages. The posts are here and here. Wilkinson apparently uses, or is considering using, Python for a lot of prototyping and C for a lot of his actual MCMC runs. However, since he feels that Java is in some ways nicer than C, and almost as fast, he has been using Java some also for the final MCMC runs. I thought I'd see how Groovy performed on this task.

Labels: Gibbs sampling, groovy, MCMC

Using Groovlets in jQuery tutorial.

July 5, 2010

I've been learning a little about jQuery by going through the excellent jQuery tutorial over at the jQuery Docs page. The examples use php for server side code, but since I'm more familiar with Servlets/Groovlets, I decided to do the server side code as a Groovlet.

Labels: AJAX, groovlet, jQuery

gcsvsql

March 27, 2010

I have encapsulated the ideas from the last post into a single script that will allow you to perform SQL queries on CSV files from the command line as though those CSV files were existing database tables in MySQL or something.

With gcsvsql, you can do things like:


gcsvsql "select * from people.csv where age > 40"
gcsvsql "select name,score from people.csv where age >40"
gcsvsql "select name,score from people.csv where age <50 and score > 100"

Full path names should work fine:

 
gcsvsql "select * from /users/data/people.csv where age > 40"
gcsvsql "select people.name from /users/data/people.csv where age > 40"

You can even do queries with sum and average and so on like:


gcsvsql "select sum(score) from people.csv where age < 40"

If children.csv is a file with the same key name as people.csv, then you can do a join query like:

  
gcsvsql "select people.name,children.child from people.csv,children.csv where people.name=children.name"

You can also enter the query on multiple lines like:

 
gcsvsql "
> select people.name,children.child
> from people.csv,children.csv
> where people.name=children.name and people.age < 40"

If this sounds interesting or useful to you, get more details and download the script over at the Google code project for gcsvsql

Labels: command line, csv, database, groovy, h2, sql

Executing arbitrary SQL on CSV files.

February 5, 2010

My life is full of comma separated value files. A common occurrence is to be given a csv file of data and want to explore it a little bit to see what is there, or to extract several columns of data out of the csv file and create a new csv file, or grab only certain columns of data and certain rows of data and create a new csv file. Quite often I find that I even need to join one csv file with another csv file. Each time I'm faced with this task I find that what I want to do is execute sql statements on the csv file itself. After ages of writing one-off scripts to process files like these, I finally looked into actually treating csv files as databases. It turns out that the h2 database engine has good support for both in-memory databases and csv file importing. With Groovy, it's very nice. Simply download the h2 database jar and drop it in your ~/.groovy/lib/ directory. Then you can write code like this:

#!/usr/bin/env groovy

import groovy.sql.Sql
import org.h2.Driver

// Create an h2 jdbc in-memory database,
// calling it what you like, here db1
def db = Sql.newInstance("jdbc:h2:mem:db1","org.h2.Driver")

// Create a table from your csv file... 
db.execute("create table people as select * from csvread('people.csv')")

// Execute a normal sql query on this table...
db.eachRow("select * from people where age < 40"){row->
 println row
}

If you want to create a new csv file from your query, you will have to do a tiny bit more work. First you will need to extract the column headings of your query from the metadata, and then you will need to collect each row into a comma separated list. It's easy boilerplate once you know how. Here is an example:

#!/usr/bin/env groovy 

import groovy.sql.Sql
import org.h2.Driver

// Create an h2 jdbc in-memory database, 
// calling it what you like, here db1 
def db = Sql.newInstance("jdbc:h2:mem:db1","org.h2.Driver")

// Create a table from your csv file...  
db.execute("create table people as select * from csvread('people.csv')")

// Get the column headings...
db.eachRow("select * from people where age < 40 limit 1"){row->
  meta = row.getMetaData()
  numCols = meta.getColumnCount()
  headings = (1..numCols).collect{meta.getColumnLabel(it)}
  println headings.join(",")
}

// Execute a normal sql query on this table...
db.eachRow("select * from people where age < 40"){row->
  meta = row.getMetaData()
  numCols = meta.getColumnCount()
  vals = (0..<numCols).collect{row[it]}
  println vals.join(",")
}
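
One thing to watch with a plain join(",") is that it produces a broken file if a value itself contains a comma, a double quote, or a line break. A minimal sketch of quoting such fields follows; csvField and csvLine are hypothetical helper names introduced here for illustration, and the quoting follows the common convention of doubling embedded quotes.

```groovy
// Hypothetical helpers for writing one CSV line safely.
String csvField(value) {
    def s = value == null ? "" : value.toString()
    // Quote only when the field contains a comma, a quote, or a line break.
    if (s =~ /[",\r\n]/) {
        return '"' + s.replace('"', '""') + '"'
    }
    return s
}

String csvLine(List values) {
    values.collect { csvField(it) }.join(",")
}

println csvLine(["Smith, John", 42, 'say "hi"', null])
```

In the loops above, println vals.join(",") would then become println csvLine(vals), and likewise for the headings.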

The h2 database engine is pretty quick too. I can query a 100k file (both read it in as a table and do select on it) in about 5 seconds using code like that above.

Finally, to do a join on two csv files, you only need to create the two tables and execute the sql, like:

#!/usr/bin/env groovy 

import groovy.sql.Sql
import org.h2.Driver

// mem says in-memory db, and call it what you like, here db1 
def sql = Sql.newInstance("jdbc:h2:mem:db1","org.h2.Driver")

// Create the first table...  
sql.execute("create table people as select * from csvread('people.csv')")

// Create the second table...
sql.execute("create table children as select * from csvread('children.csv')")

sql.eachRow("""
select people.name,children.child,people.age
from people,children 
where people.name=children.name and people.age > 40
"""){result->
  
  meta = result.getMetaData()
  cols = meta.getColumnCount()
  vals = (0..<cols).collect{result[it]}
  println vals.join(",")  
}

Labels: csv, database, groovy, h2, java, sql

Groovy++

January 21, 2010

I just ran across this article on Groovy++ performance. Groovy++, it turns out, is a project to implement a static compilation of Groovy. This article reports a 6x performance boost for the low price of adding a few keywords. Very few, it turns out. We're not talking manually defining all of the static types, as in Java, but in telling the Groovy++ compiler to fix the types during compile/runtime. I'm hedging with compile/runtime since, honestly, I only discovered this project a few minutes ago and haven't had a chance to really dig into the details of how it works. The project is hosted on Google Code and, although it isn't open source yet, it's slated to become open source soon. A decent introduction to the developer and the rationale can be found in this article.

I'll have more to say about this once I get a chance to really check it out, but in the meantime it's something worth watching.

Create/open OmniOutliner notebooks from command line.

October 24, 2009

I like to use OmniOutliner as a lab notebook, and I like to be able to create/open notebooks from the command line. The OS X 'open' command will open an existing file, but I didn't see any obvious way to run OmniOutliner from the command line and tell it to create a new file. So I created an empty OmniOutliner file and dropped it in ~/Documents/templates, then wrote a little script that copies that template and opens the copy if the requested file doesn't exist, or opens the pre-existing file if it does. Simple and effective.

#!/usr/bin/env groovy 
/********************************************
*  A simple little script to open up a file in OmniOutliner
*  creating a default file if the file does not already exist. 
* 
*  Run it like:
* 
*  omni newNotebook
*/

def omniFile
if (args[0].endsWith(".oo3")){
  omniFile = args[0]
}else{
  omniFile = "${args[0]}.oo3"
}

// Get our home directory..
env = System.getenv()
home = env['HOME']

// If there is no omni outline file there, copy a template over...
if (!(new File(omniFile).exists())){
  // wait for the copy to finish before asking OS X to open the file
  "cp -r $home/Documents/templates/OOTemplate.oo3/ ${omniFile}".execute().waitFor()
}

// Open the desired file..
"open ${omniFile}".execute()

Weka: getting predictions from cross validation

October 10, 2009

A common question in weka forums is how to keep track of instances with names. Weka does not have a name field for instances, so to keep track of instances one has to create a string ID attribute that has the name of each instance. The catch, though, is that most classifiers don't work with string attributes, and you wouldn't want to classify on the ID anyway. The official solution then is to delete the ID attribute before calling the classifier. Of course, if you delete the ID, you lose the names for your instances! Oof! The solution is to use the meta.FilteredClassifier classifier with the RemoveType filter as the filter. When you hand a FilteredClassifier off to Evaluation, it will apply the filter before sending the data to the classifier, but will keep track of the relationship between the source Instances (with the ID) and the filtered set sent to the classifier. Great. Now what if you want to know how instances were classified during your cross-validation? The API for extracting those classifications is not obvious, but it's easy enough once you know where to look. In Evaluation.crossValidateModel() you pass in a StringBuffer to hold the predictions. This can then be parsed to obtain the predictions and the instance names they go with. Source code to do this is below:

#!/usr/bin/env groovy 

import weka.core.*
import weka.core.converters.ConverterUtils.DataSource
import weka.filters.unsupervised.attribute.RemoveType
import weka.classifiers.*
import weka.classifiers.meta.FilteredClassifier
import weka.classifiers.evaluation.*;

import java.util.Random

arffName = args[0]

// Read arff file...
data = DataSource.read(arffName)

// Pick out the class attribute..
data.setClassIndex(data.numAttributes() -1)
  
// Create a classifier from the name...
// By using filtered classifier to remove ID, the cross-validation
// wrapper will keep the original dataset and keep track of the mapping 
// between the original and the folds (minus ID). 
classifier = 
"""weka.classifiers.meta.FilteredClassifier 
      -F weka.filters.unsupervised.attribute.RemoveType 
      -W weka.classifiers.misc.HyperPipes"""

options = Utils.splitOptions(classifier)
classname = options[0]
options[0] = ""
classifier = Classifier.forName(classname,options) 

// Perform cross-validation of the model..
eval = new Evaluation(data)
predictions = new StringBuffer()
eval.crossValidateModel(classifier,data,cvFolds = 5,
  new Random(1),predictions,
  new Range("first,last"),false)

lines = predictions.toString().split("\n")
  
// Output of predictions looks like:  
// inst#     actual  predicted error prediction (ID)
//     1      1:low      1:low       1 (P1)
//     2     2:high      1:low   +   0.5 (P6)
//     3     2:high     2:high       1 (P0)
lines[1..-1].each{line->
  // Parse out fields we're interested in..      
  m = line =~ /\d:(\w+).*\d:(\w+).*\((\w+)\)/
  actual = m[0][1]
  predicted = m[0][2]
  sample = m[0][3]
  println actual+"\t"+predicted+"\t"+sample+"\t"+!line.contains("+")
}
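
The parsing step can be tried on its own, without Weka installed. The sketch below runs the same regular expression over the sample prediction lines quoted in the comments above; the map keys (actual, predicted, id, correct) are names chosen here for illustration.

```groovy
// Sample prediction output copied from the comment in the script above.
def sample = '''inst#     actual  predicted error prediction (ID)
    1      1:low      1:low       1 (P1)
    2     2:high      1:low   +   0.5 (P6)
    3     2:high     2:high       1 (P0)'''

// Skip the header line, then pull out the actual class, the predicted
// class, and the instance ID from each row.
def parsed = sample.readLines().drop(1).collect { line ->
    def m = line =~ /\d:(\w+).*\d:(\w+).*\((\w+)\)/
    [actual: m[0][1], predicted: m[0][2], id: m[0][3],
     correct: !line.contains("+")]   // a "+" in the error column marks a miss
}

parsed.each { println it }
```

Collecting the rows into maps instead of printing them directly makes it easy to tally results afterwards, for example parsed.count { it.correct }.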

Groovy in Google App Engine

April 8, 2009

I'm quite excited that Google has added Java as a supported language in Google App Engine. Google App Engine is a free (up to a certain quota) service from Google where Google hosts your web app and handles all the issues involved in making it scalable and so on. Check out the Google App Engine site for more info. Google App Engine has been out for awhile but until a couple of days ago it only supported Python as a development language. Python is fine, so far as it goes, but outside of Google itself Python has nowhere near the base of web application support that Java does. I expect Java support to really blow the lid off of Google App Engine. In supporting Java, Google also supports a number of java based frameworks and JVM based languages. Most notable for me is support for writing a Google app engine app in Groovy.

Labels: google app engine, groovy, java

Invalid duplicate class definition

March 14, 2009

"Invalid duplicate class definition...One of the classes is a explicit generated class using the class statement, the other is a class generated from the script body based on the file name. Solutions are to change the file name or to change the class name. "

When I was first learning Groovy I would get this error from time to time. It was puzzling to me because sometimes I'd get this error, and sometimes it would seem like the same situation and I wouldn't. It seemed quite random to me when the error would crop up. When it did, I'd usually just rename the class and go on, resolving to figure it out later. To save you the trouble, here's what is happening.

Groovy has two ways to treat a .groovy file: either as a script, or as a class definition file. If it is a script you can not have a class by the same name as the file. If it is a class definition file you can. It is very easy to tell whether a .groovy file is going to be treated as a script or as a class definition file. If there is any code outside a class statement in the file (other than imports), it is a script. What is happening is that if there is any code to be executed in the file then Groovy needs a containing class for that code. Groovy will implicitly create a containing class with the name of the file. So if you have a file called Grapher.groovy that has some code in it that isn't inside a class definition, Groovy will create an implicit containing class called Grapher. This means that the script file Grapher.groovy can not itself contain a class called Grapher because that would be a duplicate class definition, thus the error. If, on the other hand, all you do in the file Grapher.groovy is define the class Grapher (and any number of other classes), then Groovy will treat that file as simply a collection of class definitions, there will be no implicit containing class, and there is no problem having a class called Grapher inside the class definition file Grapher.groovy.

It's worth mentioning that the script version of Grapher.groovy will be compiled into a class called Grapher that extends groovy.lang.Script. In the other case, when Grapher.groovy merely defines classes, one of which is Grapher, that Grapher class will be compiled into a class that implements groovy.lang.GroovyObject.
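
The three cases can be reproduced from a single script by compiling small source strings with GroovyClassLoader.parseClass(text, fileName), which names the implicit script class after the file name it is given. This is a sketch for experimenting; Plotter.groovy is a hypothetical file name added for the script-only case, and the exact wording of the compiler message may differ between Groovy versions.

```groovy
import org.codehaus.groovy.control.MultipleCompilationErrorsException

// Case 1: only class definitions, so this is a class definition file.
def classOnly = new GroovyClassLoader().parseClass(
    'class Grapher { }', 'Grapher.groovy')
println "${classOnly.name} extends Script: ${Script.isAssignableFrom(classOnly)}"

// Case 2: loose code makes the file a script; the implicit containing
// class takes its name from the file.
def scriptClass = new GroovyClassLoader().parseClass(
    'println "plotting"', 'Plotter.groovy')
println "${scriptClass.name} extends Script: ${Script.isAssignableFrom(scriptClass)}"

// Case 3: loose code plus a class named after the file clashes with the
// implicit containing class, so compilation fails.
def clash = null
try {
    new GroovyClassLoader().parseClass(
        'class Grapher { }\nprintln "plotting"', 'Grapher.groovy')
} catch (MultipleCompilationErrorsException e) {
    clash = e.message
}
println clash
```

Renaming either the class or the file name in the third case makes the error go away, which matches the two solutions the error message suggests.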

I'm sure this is all explained somewhere in the Groovy documentation, but it didn't soak in to me until I read this Nabble post from which I extracted this explanation.

UPDATE: The text of this error message has changed (at least in some cases) to be a bit more informative. Now it reads:

One of the classes is an explicit generated class using the class statement, the other is a class generated from the script body based on the file name. Solutions are to change the file name or to change the class name.

The underlying mechanics behind this error are the same.

Labels: groovy

Groovy compared to Perl

March 9, 2009

If you are a bioinformatics person considering taking up Groovy as an alternative to Perl, you will naturally wonder how the two compare on a range of simple tasks. Luckily, there is a great set of examples over at PLEAC (Programming Language Examples Alike Cookbook). Most bioinformatics Perl programmers have probably seen the Perl Cookbook by Tom Christiansen & Nathan Torkington. The aim of PLEAC is to implement all of the solutions found in the Perl Cookbook in other languages. The Groovy examples are 100% complete, so you can see how every Perl cookbook solution could be performed in Groovy. Go have a look!

Labels: bioinformatics, groovy, perl

Groovy for bioinformatics.


I'll have a lot to say about Groovy for bioinformatics in this blog. Until I have time to write some posts on the topic here is Mark Fortner's brief comparison of how Groovy stacks up to Perl for bioinformatics. Mark has also set up a BioGroovy project wiki.

Labels: biogroovy, bioinformatics, groovy

Groovy is groovy. Groovy is Java.

June 5, 2007

Groovy is groovy. At least for me. Groovy is also Java, like quick and dirty Java all dripping with syntactic sugar. If you know Java you essentially know 80% of what there is to know about Groovy because underneath it's all Java objects and the syntax is roughly a superset of Java. Most Java programs will run, unaltered, as Groovy programs. Although Groovy's dynamic features, powerful language constructs, and other goodness will grow on you, a Java programmer can approach Groovy initially as if it were simply a kind of Java that can be executed without compiling and with the option to soft focus some of the details.

For example, here is a dumb little Groovy script that I wrote to convert a comma separated file into a tab separated file.

#!/usr/bin/env groovy

for(line in System.in.readLines()){
  bits = line.split(",");
  for(bit in bits){
    print bit+"\t";
  }
  println "";
}

This is something I wrote on the first day I was learning Groovy so it is not idiomatic Groovy. Groovy can be much more succinct than this. A Perl person might prefer a command line one-liner like:

cat data.txt | groovy -pe '(line =~ /,/).replaceAll("\t")'
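As a rough sketch of what a more idiomatic script version might look like, the conversion can be reduced to a split and a join. The helper name commaToTab is made up for illustration, and the -1 limit passed to split is an added assumption so that trailing empty fields are kept:

```groovy
#!/usr/bin/env groovy

// Hypothetical helper (not from the original script): convert one
// comma-separated line into a tab-separated line.
// The -1 limit keeps trailing empty fields, which split(",") would drop.
String commaToTab(String line) {
    line.split(",", -1).join("\t")
}

// Read stdin line by line and print the converted line.
System.in.eachLine { line -> println commaToTab(line) }
```

Unlike the loop version, this one does not leave a trailing tab at the end of each line.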

Nonetheless, this little script shows how one can come to Groovy gently from Java. First, that's a complete Groovy program. If it were Java, there'd have to be more: the declaration of a containing class, for example. Groovy provides an implicit containing class with the name of the file. You'd also have to be more fastidious about declaring the types of variables. Groovy works out types at runtime, and the types it ends up with are Java types. line and bit, for example, will resolve to java.lang.String (and bits to an array of them), and all the functionality of java.lang.String is there for you just like you remember it. Also, unlike a Java program, Groovy scripts don't need compiling (if you need to, though, you can compile Groovy to a .class file to use in a Java program). You can just run the file like a Perl script, for example:

cat data.txt | comma2tab  > data.tab
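The point about runtime types can be checked directly. The following is a small sketch with a made-up sample line; it only uses standard java.lang.String methods and Groovy's assert:

```groovy
def line = "a,b,c"            // dynamically typed, but still a java.lang.String
def bits = line.split(",")    // String.split returns a java.lang.String[]

assert line instanceof String
assert bits instanceof String[]
assert bits[0].toUpperCase() == "A"   // ordinary java.lang.String methods work

// The script body is compiled into an implicit class; print its name.
println "script class: ${this.getClass().name}"
```

When saved to a file and run, the last line prints the name of the implicit class that Groovy generated for the script.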

So you can see that you get that sort of Perl-like hack-together-something-quick quality: no compiling, direct execution of the source file (via shebang or the command line interface), rapid edit/test cycle, and stripped down verbosity. Suppose I also wanted to add a date tag to each line. When I was doing my scripting with Perl I'd have to grind to a halt while I looked up various Perl date APIs. Obviously, if 99% of my code were Perl scripting I'd probably already know them, but my code is more like 10% C++, 65% Java, and 25% scripting. As a result, I'm much more likely to already know the Java API for something than the Perl API. With Groovy I can add date functionality without having to learn any new APIs; I can just use the Java one I already know:

#!/usr/bin/env groovy

date = new Date();
for(line in System.in.readLines()){
  // print (not println) so the date tag stays on the same line as the data
  print date.toString()+"\t";
  bits = line.split(",");
  for(bit in bits){
    print bit+"\t";
  }
  println "";
}

That println is, by the way, System.out.println; Groovy just lets you use the shorthand if you like. You don't have to, though. With a simple import, you can use any of your own Java classes just as easily. Drop any external library into your CLASSPATH, or a jar into your ~/.groovy/lib directory, and you have ready access to that functionality too. You can even hack together simple GUIs in just a few lines of code.
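As a sketch of the "simple import" idea, here is the date-tagging script again using java.text.SimpleDateFormat from the standard Java library. The helper name tagLine and the yyyy-MM-dd pattern are choices made for this illustration, not part of the original script:

```groovy
import java.text.SimpleDateFormat

// Hypothetical helper: prefix a comma-separated line with a date tag
// and convert the commas to tabs.
String tagLine(String line, Date date) {
    def fmt = new SimpleDateFormat("yyyy-MM-dd")   // plain java.text API
    fmt.format(date) + "\t" + line.split(",", -1).join("\t")
}

// Tag a sample line with today's date.
println tagLine("a,b,c", new Date())
```

Any class on the CLASSPATH can be imported the same way.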

Given that I have worked on several pure-Java projects in the past few years, Groovy really fills a niche for me. Working in bioinformatics I have a lot of need for scripting, to create quick and dirty one-off programs to crunch some data or do some bulk operation, something someone wants today and will never use again. I also have a fair bit of need for production quality webpages, for computation intensive algorithms, and for GUI visualization tools. The combination of Groovy+Java nicely fills all of these needs for me. I can write my high quality webpages in Java (or Grails), my computation intensive algorithms in Java (which has become performance competitive in the past few years, within 2x of C [3]), my GUI visualization tools in Groovy/Java/Swing, and my one-off scripts and pipeline glue in Groovy. One set of libraries and APIs to master to do all my work. The ability to share code across all the kinds of work I do. All my code is cross platform. It's like having my cake and eating it too!

If you know Java, you owe it to yourself to try Groovy. If you find yourself writing one kind of software in one language, another in a different language, you owe it to yourself to try Groovy. The rest of you, you should probably try it too.

[1] Six Things Groovy Can Do For You
[2] IBM Groovy articles
[3] 1.8x median, behind only C++, C, and ATS

Labels: bioinformatics, groovy, java
