<body><script type="text/javascript"> function setAttributeOnload(object, attribute, val) { if(window.addEventListener) { window.addEventListener("load", function(){ object[attribute] = val; }, false); } else { window.attachEvent('onload', function(){ object[attribute] = val; }); } } </script> <iframe src="http://www.blogger.com/navbar.g?targetBlogID=6813476980165976394&amp;blogName=The+Bayesian+Conspiracy&amp;publishMode=PUBLISH_MODE_BLOGSPOT&amp;navbarType=BLUE&amp;layoutType=CLASSIC&amp;searchRoot=http://bayesianconspiracy.blogspot.com/search&amp;blogLocale=en_US&amp;homepageUrl=http://bayesianconspiracy.blogspot.com/&amp;vt=642017532500428956" marginwidth="0" marginheight="0" scrolling="no" frameborder="0" height="30px" width="100%" id="navbar-iframe" allowtransparency="true" title="Blogger Navigation and Search"></iframe> <div></div>

The Bayesian Conspiracy

The Bayesian Conspiracy is a multinational, interdisciplinary, and shadowy group of scientists. It is rumored that at the upper levels of the Bayesian Conspiracy exist nine silent figures known only as the Bayes Council.

This blog is produced by James Durbin, a Bayesian Initiate.

A tooltip class for HTML5 canvas written with Processing.js

September 8, 2011 | comments |

The visualization code I'm writing using Processing.js needs a tooltip to display some information depending on where your mouse is in the canvas. Although there are many tooltip options for webpage elements, I didn't find any that I could easily use with Processing.js to generate tooltips dependent on the mouse position in the canvas. After a few frustrating attempts with various libraries I finally just took this rounded corners demo by F1LT3R. and hacked together a ToolTip class of my own that works with Processing.js. I think it's pretty nice for a quick hack.
The tooltip has variable transparency, detects in and out of canvas state, and adjusts itself when the tooltip approaches a canvas edge. It has a clipping mode for static canvas images and a non-clipping mode for use with canvas animations. Although I haven't tested it, this library should work just as well with the java-based Processing. A demo of the tooltip is below, along with a nice HTML5 background animation that was also written in Processing.

Mouse over the canvas to make the tooltip appear. Click on the canvas to give it focus. Press 'r' to randomize background animation. Press 'a' to cycle through alpha levels. Press 'c' to randomize tip color.


You can download ToolTip.pde and test code from github:gist. ToolTip.pde has instructions in it's header. The use of this tooltip isn't limited to Processing sketches. Since processingjs pde files are converted to pure javascript before execution, you can also use this tooltip, as well as any other Processing.js library, in javascript code that writes to the canvas. The Processing QuickStart for JavaScript Developers gives examples of how to use Processing.js libraries with pure javascript and libraries like jQuery (which, incidentally, was written by the same person who created Processing.js).

Processing.js first impression

September 2, 2011 | comments |

I've been playing around with Processing.js to produce visualizations of some genomics data. The Java based Processing language/environment was originally developed by the noted data visualization guru Ben Fry and graphic artist Casey Reas. Processing.js is a port of the Processing language and libraries to javascript and the HTML 5 canvas by none other than John Resig, the creator of the jQuery javascript library. I was skeptical at first that Processing.js would be anything but a pale shadow of it's Processing predecessor, but after using it for a bit I have to say that I am impressed.
Read more »

Labels: , , , , ,

Gibbs sampler in Groovy

July 25, 2011 | comments |

I recently read a couple of nice articles by Darren Wilkinson about implementing MCMC in various languages. The posts are here and here. Wilkinson apparently uses, or is considering using, Python for a lot of prototyping and C for a lot of his actual MCMC runs. However, since he feels that Java is in some ways nicer than C, and almost as fast, he has been using Java some also for the final MCMC runs. I thought I'd see how Groovy performed on this task.
Read more »

Labels: , ,

Using Groovlets in jQuery tutorial.

July 5, 2010 | comments |

I've been learning a little about jQuery by going through the excellent jQuery tutorial over at the jQuery Docs page. The examples use php for server side code, but since I'm more familiar with Servlets/Groovlets, I decided do the server side code as a Groovlet.
Read more »

Labels: , ,

gcsvsql

March 27, 2010 | comments |

I have encapsulated the ideas from the last post into a single script that will allow you to perform SQL queries on CSV files from the command line as though those CSV files were existing database tables in MYSQL or something.

With gcsvsql, you can do things like:


gcsvsql "select * from people.csv where age > 40"
gcsvsql "select name,score from people.csv where age >40"
gcsvsql "select name,score from people.csv where age <50 and score > 100"

Full path names should work fine:
 
gcsvsql "select * from /users/data/people.csv where age > 40"
gcsvsql "select people.name from /users/data/people.csv where age > 40"

You can even do queries with sum and average and so on like:

gcsvsql "select sum(score) from people.csv where age < 40"

If children.csv is a file with same key name as people, then you can join query like:
  
gcsvsql "select people.name,children.child from people.csv,children.csv where people.name=children.name"

You can also enter the query on multiple lines like:
 
gcsvsql "
> select people.name,children.child
> from people.csv,children.csv
> where people.name=children.name and people.age < 40"

If this sounds interesting or useful to you, get more details and download the script over at the Google code project for gcsvsql

Labels: , , , , ,

Executing arbitrary SQL on CSV files.

February 5, 2010 | comments |

My life is full of comma separated value files. A common occurrence is to be given a csv file of data and want to explore it a little bit to see what is there, or to extract several columns of data out of the csv file and create a new csv file, or grab only certain columns of data and certain rows of data and create a new csv file. Quite often I find that I even need to join one csv file with another csv file. Each time I'm faced with this task I find that what I want is to do execute sql statements on the csv file itself. After ages of writing one-off scripts to process files like these, I finally looked into actually treating csv files as databases. It turns out that the h2 database engine has good support for both in-memory databases and csv file importing. With Groovy, it's very nice. Simply download the h2 database jar and drop it in your ~/.groovy/lib/ directory. Then you can write code like this:

#!/usr/bin/env groovy

import groovy.sql.Sql
import org.h2.Driver

// Create an h2 jdbc in-memory database,
// calling it what you like, here db1
def db = Sql.newInstance("jdbc:h2:mem:db1","org.h2.Driver")

// Create a table from your csv file...
db.execute("create table people as select * from csvread('people.csv')")

// Execute a normal sql query on this table...
db.eachRow("select * from people where age < 40"){row->
println row
}

If you want to create a new csv file from your query, you will have to do a tiny bit more work. First you will need to extract the column headings of your query from the metadata, and then you will need to collect the rows into comma separated list. It's easy boiler-plate once you know how. Here is an example:

#!/usr/bin/env groovy 

import groovy.sql.Sql
import org.h2.Driver

// Create an h2 jdbc in-memory database,
// calling it what you like, here db1
def db = Sql.newInstance("jdbc:h2:mem:db1","org.h2.Driver")

// Create a table from your csv file...
db.execute("create table people as select * from csvread('people.csv')")

// Get the column headings...
db.eachRow("select * from people where age < 40 limit 1"){row->
meta = row.getMetaData()
numCols = meta.getColumnCount()
headings = (1..numCols).collect{meta.getColumnLabel(it)}
println headings.join(",")
}

// Execute a normal sql query on this table...
db.eachRow("select * from people where age < 40"){row->
meta = row.getMetaData()
numCols = meta.getColumnCount()
vals = (0..<numCols).collect{row[it]}
println vals.join(",")
}

The h2 database engine pretty quick too. I can query a 100k file (both read it in as a table and do select on it) in about 5 seconds using code like that above.

Finally, to do a join on two csv files, you only need to create the two files and execute the sql, like:

#!/usr/bin/env groovy 

import groovy.sql.Sql
import org.h2.Driver

// mem says in-memory db, and call it what you like, here db1
def sql = Sql.newInstance("jdbc:h2:mem:db1","org.h2.Driver")

// Create the first table...
sql.execute("create table people as select * from csvread('people.csv')")

// Create the second table...
sql.execute("create table children as select * from csvread('children.csv')")

sql.eachRow("""
select people.name,children.child,people.age
from people,children
where people.name=children.name and people.age > 40
"""
){result->

meta = result.getMetaData()
cols = meta.getColumnCount()
vals = (0..<cols).collect{result[it]}
println vals.join(",")
}

Labels: , , , , ,

Groovy++

January 21, 2010 | comments |

I just ran across this article on Groovy++ performance. Groovy++, it turns out, is a project to implement a static compilation of Groovy. This article reports a 6x performance boost for the low price of adding a few keywords. Very few, it turns out. We're not talking manually defining all of the static types, as in Java, but in telling the Groovy++ compiler to fix the types during compile/runtime. I'm hedging with compile/runtime since, honestly, I only discovered this project a few minutes ago and haven't had a chance to really dig into the details of how it works. The project is hosted on Google Code and, although it isn't open source yet, it's slated to become open source soon. A decent introduction to the developer and the rational can be found in this article.

I'll have more to say about this once I get a chance to really check it out, but in the mean time it's something worth watching.

Create/open OmniOutliner notebooks from command line.

October 24, 2009 | comments |

I like to use OmniOutliner as a lab notebook, and I like to be able to create/open notebooks from the command line. The OS X 'open' command will open an existing file, but I didn't see any obvious way to run OmniOutliner from the command line and tell it to create a new file. So I created an empty OmniOutliner file and dropped in ~/Documents/template, then wrote a little script to copy that template and open it if it doesn't exist, or open the pre-existing file if it does. Simple and effective.

#!/usr/bin/env groovy 
/********************************************
* A simple little script to open up a file in OmniOutliner
* creating a default file if the file does not already exist.
*
* Run it like:
*
* omni newNotebook
*/

def omniFile
if (args[0].contains(".oo3")){
omniFile = args[0]
}else{
omniFile = "${args[0]}.oo3"
}

// Get our home directory..
env = System.getenv()
home = env['HOME']

// If there is no omni outline file there, copy a template over...
if (!(new File(omniFile).exists())){
"cp -r $home/Documents/templates/OOTemplate.oo3/ ${omniFile}".execute()
}

// Open the desired file..
"open ${omniFile}".execute()

The Bayesian Conspiracy



I'm a Ph.D. student in bioinformatics, studying genomics, machine learning, and statistics. I spent the previous ten years developing genomics analysis tools for a major sequencing center.

You might see some bits here on programming, machine learning, bioinformatics,
computational biology, and maybe some other stuff too.