The Bayesian Conspiracy: gcsvsql

gcsvsql

Stamped: 4:49 PM

I have wrapped the ideas from the last post into a single script that lets you run SQL queries on CSV files from the command line, as though those CSV files were existing tables in a database such as MySQL.

With gcsvsql, you can do things like:


gcsvsql "select * from people.csv where age > 40"
gcsvsql "select name,score from people.csv where age > 40"
gcsvsql "select name,score from people.csv where age < 50 and score > 100"
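To make the idea concrete, here is a minimal sketch of treating a CSV file as a table. It is not how gcsvsql itself works (gcsvsql is a Groovy script backed by H2); it swaps in Python's built-in sqlite3 module, and the sample data, the load_csv helper and the coerce helper are all made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for people.csv.
PEOPLE_CSV = """name,age,score
alice,45,120
bob,38,90
carol,52,80
"""

def coerce(value):
    """Turn numeric-looking strings into numbers so comparisons like age > 40 work."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value

def load_csv(conn, table, text):
    """Create `table` from CSV text, using the header row as column names."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    cols = ", ".join(f'"{c}"' for c in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({marks})',
        [[coerce(v) for v in row] for row in data],
    )

conn = sqlite3.connect(":memory:")
load_csv(conn, "people", PEOPLE_CSV)
result = conn.execute(
    "select name, score from people where age > 40 order by name"
).fetchall()
print(result)
```

The table lives only in memory, so the CSV file itself is never modified; the query runs against a throwaway copy.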

Full path names should work fine:

 
gcsvsql "select * from /users/data/people.csv where age > 40"
gcsvsql "select people.name from /users/data/people.csv where age > 40"
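Note that the second query refers to the table as people even though the file is given by its full path, which suggests the table name comes from the file's base name. One way such a rewrite could be done is sketched below in Python. The CSV_REF pattern and the rewrite_query function are assumptions for illustration only, not taken from the gcsvsql source.

```python
import os
import re

# Matches an optional directory path followed by a file name ending in .csv.
CSV_REF = re.compile(r"[\w./\\-]*\.csv\b")

def rewrite_query(query):
    """Replace each CSV file reference with a table name.

    Returns the rewritten SQL and a {table: path} map of the files to load.
    """
    tables = {}

    def to_table(match):
        path = match.group(0)
        table = os.path.splitext(os.path.basename(path))[0]
        tables[table] = path
        return table

    return CSV_REF.sub(to_table, query), tables

sql, tables = rewrite_query(
    "select people.name from /users/data/people.csv where age > 40"
)
print(sql)
print(tables)
```

A real tool would also have to deal with two files that share a base name in different directories, which this sketch ignores.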

You can also run aggregate queries, such as sum and average:


gcsvsql "select sum(score) from people.csv where age < 40"

If children.csv is a file with the same key column as people.csv (name, in this example), then you can join the two files in one query:

  
gcsvsql "select people.name,children.child from people.csv,children.csv where people.name=children.name"
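The comma-separated from clause with a where condition is an implicit inner join, so people with no matching row in children.csv are left out of the result. The small Python sketch below checks this against an explicit join, using sqlite3 in place of gcsvsql's own engine and made-up rows in place of the two CSV files.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical rows standing in for people.csv and children.csv.
conn.execute("create table people (name text, age integer)")
conn.execute("create table children (name text, child text)")
conn.executemany(
    "insert into people values (?, ?)",
    [("alice", 45), ("bob", 38), ("carol", 52)],
)
conn.executemany(
    "insert into children values (?, ?)",
    [("alice", "dan"), ("alice", "eve"), ("bob", "fay")],
)

# Implicit join, written the same way as the gcsvsql query above.
implicit = conn.execute(
    "select people.name, children.child from people, children "
    "where people.name = children.name order by 1, 2"
).fetchall()

# The equivalent explicit inner join.
explicit = conn.execute(
    "select people.name, children.child from people "
    "join children on people.name = children.name order by 1, 2"
).fetchall()

print(implicit)
print(implicit == explicit)
```

Here carol has no rows in the children table, so she does not appear in either result.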

You can also enter the query on multiple lines (the > at the start of each line is the shell's continuation prompt, not part of the query):

 
gcsvsql "
> select people.name,children.child
> from people.csv,children.csv
> where people.name=children.name and people.age < 40"

If this sounds interesting or useful to you, you can get more details and download the script at the Google Code project for gcsvsql.

Labels: command line, csv, database, groovy, h2, sql
