haruki zaemon

Rails, PostgreSQL and Case-Sensitivity

Possibly the only thing I like about MySQL is that when performing a search, the values 'SIMON' and 'sImOn' are considered equal – case-insensitive searching. PostgreSQL, on the other hand, considers them to be different – case-sensitive searching. Now I don’t know about you, but for ~99.999% of the applications I’ve ever written, I’d rather 'Australia' and 'AuStRaLiA' weren’t considered different countries.

The “standard” approach to solving this problem is to change a query from this:

SELECT * FROM countries WHERE name = ?;

To something like this:

SELECT * FROM countries WHERE LOWER(name) = LOWER(?);

Thereby forcing the database to perform a pseudo case-insensitive search. The first problem is that all those nice indexes you’ve created to ensure fast, efficient searching are totally ignored. (Who can spell full-table scan?) Performance issues aside (I mean, after all, we know that premature optimisation is the root of all evil, right?) what’s just as annoying is that I can’t actually guarantee uniqueness, which is pretty much the whole point! Even with a unique index on countries.name, the database will still quite happily allow me to:

INSERT INTO countries (name) VALUES ('Australia');
INSERT INTO countries (name) VALUES ('AUSTRALIA');
INSERT INTO countries (name) VALUES ('aUsTrAlIa');

So when I perform a case-insensitive search as previously discussed, I’ll end up with three (count ’em 3) records. Thankfully, there is a solution (of sorts): expression indexes.

PostgreSQL allows you to create indexes based on expressions, say for example LOWER(name), allowing us to create a unique, case-insensitive index as simply as:

CREATE UNIQUE INDEX index_countries_on_name ON countries (LOWER(name));

Ok, so perhaps you knew this already and you’re wondering what all this has to do with Rails? Well I’m glad you asked.

Rails (as of 1.2) has a new option for validates_uniqueness_of named, oddly enough, case_sensitive. This is assumed to be true by default (meaning all searches are case-sensitive). Set it to false however and you’ll magically get validation queries that look like:

SELECT * FROM countries WHERE (LOWER(countries.name) = 'australia') LIMIT 1;
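
In model terms that boils down to a one-liner. As a minimal sketch (the Country model and its name attribute are just the example from above, not code from Rails or the plugins):

class Country < ActiveRecord::Base
  # Rails 1.2: check uniqueness case-insensitively, producing the LOWER(...) query shown above
  validates_uniqueness_of :name, :case_sensitive => false
end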

To complement this feature, I’ve recently enhanced the RedHill on Rails Plugins in two interesting (and hopefully useful) ways.

The first is in the core and supports the creation of case-insensitive indexes during schema migration:

add_index :countries, [:name], :unique => true, :case_sensitive => false
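
In a complete migration that might look something like this (a sketch only; the migration class name is illustrative):

class AddUniqueNameIndexToCountries < ActiveRecord::Migration
  def self.up
    # On PostgreSQL, with the plugin installed, this should produce something like
    # CREATE UNIQUE INDEX index_countries_on_name ON countries (LOWER(name));
    add_index :countries, [:name], :unique => true, :case_sensitive => false
  end

  def self.down
    remove_index :countries, :name
  end
end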

The second is in schema validations and causes :case_sensitive => false to be passed as an option to validates_uniqueness_of whenever a case-insensitive index is detected.

(I also looked at the possibility of automagically surrounding query parameters, etc. with LOWER() inside find methods but given the myriad forms queries can take, it seems altogether too difficult for my feeble mind at this point.)

The upshot of all this is that at the very least, it should now be possible to add case-insensitivity to your queries and be assured that (bugs notwithstanding) the performance of your application won’t suddenly plummet as a consequence.

WYSIWYL

With the rising popularity of video podcasts as training material, I thought we needed a suitable acronym:

What You See Is What You Learn.

If only Google weren’t around to prove that I wasn’t the first to think of it. D’oh!

On the positive side, it’s apparently a method with "…guaranteed results."

Semantic APIs

I was chatting with my brother today (somewhat of a professional student with degrees in neuroscience, physics and maths) about software development. I was explaining how yesterday had been a rather unpleasant day working out how to integrate with Crystal Reports. You see, what I wanted was an interface that looked something like this:

CrystalReport report = new CrystalReport("report.rpt");
report.setParameter("Posting Year", "2007");
report.setParameter("Account Number", "5678");
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
report.writeTo(outputStream);
outputStream.close();

Only instead what I got was this:

ReportClientDocument document = new ReportClientDocument();
document.open(_reportName, OpenReportOptions._openAsReadOnly);
DataDefController dataDefController = document.getDataDefController();
ParameterFieldController parameterFieldController = dataDefController.getParameterFieldController();
parameterFieldController.setCurrentValue("", "Posting Year", "2007");
parameterFieldController.setCurrentValue("", "Account Number", "5678");
InputStream inputStream = document.getPrintOutputController().export(ReportExportFormat.PDF);
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
int b;
while ((b = inputStream.read()) != -1) {
    outputStream.write(b);
}
inputStream.close();
document.close();

The first is nice and semantic; it’s pretty obvious what the code is doing. The second requires you to read, very carefully, each line in order to work out what is going on. Talk about leaky abstractions. Apparently my Document’s connected to my, DataDefController; my DataDefController’s connected to my, ParameterFieldController; …

My brother drew an analogy with explaining to someone how a telephone works. In the first case, we’ve gone through a very simple explanation with just enough information to allow someone to have a go themselves; in the second example, we’re now explaining how the spin of each electron determines the probability of it going down the wire and thus contributing to the current that ultimately makes the phone call possible.

At first I was concerned: Having just recently ranted (for the umpty umpth time) that developers don’t seem to understand even the most fundamental principles of software development, here I was lamenting the fact that most APIs weren’t simple enough. My brother then asserted that perhaps the reason software is (in general) so poorly written is precisely because the APIs we are forced to use are so primitive: because we are forced to follow so many steps to achieve something that is conceptually so simple (such as producing a report), the likelihood of failure is much greater.

The scary thing is, this is certainly not an isolated example. Have you ever tried using the javax.mail packages? JNDI anyone?

In the end, I wrote a class named, unsurprisingly, CrystalReport with an interface exactly as in the first example and implemented almost exactly as in the second. But I seem to need to do this quite a lot when dealing in the “Enterprise” Java world.

Design Improvement Workshop

My mate Marty Andrews is running a Design Improvement Workshop on March 24th here in Melbourne as part of Cogent’s Easy Access Training (EAT). The content is, in part, a follow-on to the TDD Workshop, with emphasis on identifying and removing code smells using refactoring tools and techniques.

One interesting point about these courses is that rather than being a fixed price, places are allocated by auction with the final price determined by the lowest bid! So, if you think some first-class developer training is worth $100, get in quick before you’re outbid.

Hear Me Sort

James has a home-grown GTD system which he’ll hopefully write about if not release to the world one day. It’s totally command-line driven using a combination of bash, perl and ruby to manipulate plain text files and, from what I can tell, it’s not only blindingly fast, it seems to work seriously well.

One of the daily tasks that James performs is to prioritise the things he wants to (attempt to) get done, and what better time to do that than on the 45-minute train trip to work. As you can well imagine, prioritising 40 items takes a fair amount of time, not only due to the sheer number but also because, being vision-impaired, reading (and re-reading) takes a bit of concentration. So, we came up with a nifty solution in two parts.

The first part was quite simple: use binary insertion. That way, instead of O(n^2) comparisons, we’d get O(n log n). So far so good, but that was pretty obvious. The next bit was the real doozy!

Rather than have each pair printed out so that James could read them and make a decision, we instead chose to use some Text-To-Speech (TTS) to literally speak each pair; something along the lines of:

Is calling your boss to discuss staff pay reviews more important than send wife a bunch of flowers for her birthday?

In addition, we actually run the TTS in another thread so that we can easily interrupt it, primarily because James has already chosen a suitable answer, needs to repeat the question, or wants to cancel the whole thing.
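
For the curious, here’s a rough Ruby sketch of the overall idea – definitely not James’s actual code (his runs the speech in a separate thread; this sketch uses a child process so the speech can be cut short), and the say invocation, prompt wording and sample tasks are all assumptions:

# Prioritise tasks by binary insertion, speaking each comparison aloud.
# Assumes a Mac OS X say-style text-to-speech command on the PATH.
def speak(text)
  fork { exec('say', text) }              # speak in a child process so it can be interrupted
end

def more_important?(a, b)
  pid = speak("Is #{a} more important than #{b}?")
  print '[y/n] > '
  answer = $stdin.gets.to_s.strip.downcase
  Process.kill('TERM', pid) rescue nil    # stop talking as soon as an answer arrives
  Process.wait(pid) rescue nil
  %w(y yes).include?(answer)
end

# Insert each task into an already-sorted list using binary search,
# asking roughly n log n questions instead of n squared.
def prioritise(tasks)
  sorted = []
  tasks.each do |task|
    low, high = 0, sorted.size
    while low < high
      mid = (low + high) / 2
      if more_important?(task, sorted[mid])
        high = mid
      else
        low = mid + 1
      end
    end
    sorted.insert(low, task)
  end
  sorted
end

puts prioritise([
  'calling your boss to discuss staff pay reviews',
  'sending your wife a bunch of flowers for her birthday',
  'booking the car in for a service'
])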

About the only problem we encountered was that, on his Powerbook, the say command (built-in to Mac OS X) ran a little too slowly for our liking so we simply aliased it to use the much faster swift command that comes as part of the commercial Cepstral William voice he uses for all his Text-To-Speech.

Now James can listen to the minimal set of questions, making the appropriate decision for each by way of a simple keystroke, and have it all done within 15 minutes, all the while free to sit and enjoy the view (such as it is on a suburban train during peak hour) rather than staring at the screen.

Yes, the rumours are true.

Oh, you haven’t heard? Well then, let me be the first to tell you: I’m getting married on April 14th to the loverly Jesseca ;-). We’re then heading to Japan (possibly via Borneo) in early June to spend 12 months in a little country town where we will hopefully spend our time telecommuting for work, travelling and partying. Oh, and of course all-I-can-eat Aikido training. Woohoo!

So if anyone knows of any conferences, geeky get-togethers or the like I’d love to get a little taste of the software development community in Japan.

Just Tell 'Em Joe Sent You

A customer walks into a bank and, after waiting in the queue, reaches the teller:

Teller: How may I help you?
Customer: I’d like to transfer a million, billion, squintillion dollars from Joe Bloggs’ account into mine please.
Teller: Certainly, but first I’ll need some proof of identification.
Customer: (enters account number and PIN and clicks enter)
Teller: Ok, now I’ll need to authorise the transfer, please wait a moment.
Customer: Oh, that’s ok, Joe gave me permission.
Teller: I’ll get to it immediately then.

As ridiculous (and somewhat contrived) as this scenario seems, it’s a pretty apt reflection of how many large companies implement security in their web applications. Indeed many VERY large companies. Companies that, if you knew who they were, would certainly have a lot of explaining to do.

The problem as I see it is that developers in general don’t understand security. Actually, no, the problem is a little more fundamental than that. Developers in general don’t understand the technology stack with which they work day in, day out.

I can appreciate how difficult it might be to accept that you’re forking out in excess of $80k a year for a developer that doesn’t seem to understand that just because the user can’t see a hidden field on a form doesn’t mean they can’t look at the page source; that just because a user typically interacts with the server using a browser, doesn’t preclude the use of something as simple as a telnet client. In fact, I’m willing to bet that if you asked something along the lines of “have you tried seeing if you can connect to the server using telnet” you’ll get blank stares all around.

The sad truth is that using hidden form fields, magic request parameters, default credentials, and even sending permissions to the client (even if that means the web server as a client of EJB) on login and then trusting the client to send those same permissions back with each request in order to, I don’t know, perhaps “save database lookups” or something, is disturbingly common in practice.

Heck, I don’t even claim to be a security expert but I’m pretty sure that not using SQL bind parameters is generally considered a bad idea. Then again, perhaps it’s not so much of a problem when all you’re doing is reading clear-text passwords out of a database?!

Integrating with MacPorts

When I used to run FreeBSD, I loved the ports system that comes as the standard mechanism for installing most freely available (and some commercial) software. When I switched to the Mac, I immediately started using DarwinPorts (now MacPorts) as my main method for obtaining and installing all my “geekware”: PostgreSQL; Subversion; Ruby; and even Ant to name but a few.

Installing a new bit of software using MacPorts is usually pretty trivial. Something like:

sudo port install subversion

is all that’s needed to install Subversion and all its dependencies.

Sure, there are Mac-specific GUI-based installers for some software that I use but I generally find it easier to use MacPorts for managing dependencies, versioning, updates, etc.

There is, however, other command-line-based software, such as Pulse, that although relatively simple to install, still takes some time and thinking to install in such a way as not to turn my Mac into a glorified Windoze machine with all those lovely security holes. If only there was a way to create a MacPorts installation for these too.

Well, there is: Create one yourself. So this morning I decided to see how difficult it would be to do just that.

Reading the quick-start guide, I soon learned that writing a Portfile – literally a Tcl script named Portfile – is often all you need to do. Unfortunately, the “quick” in “quick-start” is pretty much just that, leaving quite a lot to the imagination.

So, after reading a few existing Portfiles and a bit of experimentation, I managed to get a relatively clean (and, importantly, working) Portfile for installing Pulse, which I figured I’d not only share with you but also use as a guide to creating your own.

Now, before we continue, if you haven’t already done so, I highly recommend reading the quick-start guide. It’s ok, I’ll wait…

The start of a Portfile looks something like this:

PortSystem        1.0
name              pulse
version           1.2.14
categories        java
maintainers       haruki_zaemon@mac.com
description       Pulse automated build server
long_description  Pulse is an automated build or continuous integration server. \
                  Pulse regularly checks out your project's source code from your \
                  SCM, builds the project and reports on the results. A project \
                  build typically involves compiling the source code and running \
                  tests to ensure the quality of the code. By automating this \
                  process, pulse allows you to constantly monitor the health of \
                  your project.
homepage          http://www.zutubi.com/products/pulse/
master_sites      http://www.zutubi.com/download/

All pretty self explanatory really: The name and version of the software; what exactly it is and why I’d want to use it; who to blame when something goes wrong; and where to download the installation (in this case .tar.gz) file from.

Following on, we have:

checksums         md5 dcebf03b7a7099a476371b8142ba7624
depends_lib       bin:java:kaffe

These specify the MD5 checksum for the installation file, to ensure we download the correct one, and any dependencies; in this case we need a Java runtime, which comes pre-installed on most Macs anyway, but I figured it doesn’t hurt to make sure.

Next, we set up some variables for use later on in the script, again all pretty self explanatory:

set pulseuser     pulse
set pulsegroup    pulse
set home          ${prefix}/share/java/${name}
set bin           ${home}/bin
set executable    ${bin}/pulse
set dbdir         ${prefix}/var/db/${name}

(${prefix} is pre-defined by the port system as the path to the base of the installation – usually /opt/local.)

Now for something kinda cool: launchctl integration. If you don’t know, launchctl is the preferred method for starting, stopping and otherwise controlling server processes under OS X. It can be a little tricky to get right, with various XML configuration files, etc.; however, MacPorts comes to the rescue. With a few simple Tcl commands, the installer will magically do all the heavy lifting for us:

startupitem.create  yes
startupitem.init    "PULSE_HOME=${home}"
startupitem.start   "su ${pulseuser} -c \"${executable} start\""
startupitem.stop    "su ${pulseuser} -c \"${executable} shutdown\""

Neat huh? Now when we install the port, we’ll automagically have scripts generated in all the right places so the pulse server can be started and stopped by the system.

We’re almost up to the installation part but before we get to that, we need to stub out a few things.

You see, the whole installation process caters not only for unpacking and deploying files but also, in the case of C/C++ and the like, for configuring, patching and compiling. In our case, though, not only are these latter steps unnecessary, they’re not even possible – we’re dealing with a binary distribution – so we need to override them to do nothing:

configure {}
build     {}

At last we get to the actual deployment of the files, creation of users, setting of permissions, etc.:

destroot {
    # Create the Pulse user
    addgroup ${pulsegroup}
    set gid [existsgroup ${pulsegroup}]
    adduser ${pulseuser} shell=/bin/sh gid=${gid} home=${dbdir} realname=Pulse\ Server

    # Ensure we have the needed directories
    xinstall -m 755 -d ${destroot}${home}

    # Copy the files
    system "cp -R ${worksrcpath}/ ${destroot}${home}"

    # Keep empty directories
    destroot.keepdirs-append ${destroot}${home}/logs ${destroot}${home}/versions

    # Fix ownership of some directories pulse really needs to write to
    system "chown -R ${pulseuser}:${pulsegroup} ${destroot}${home}/logs"
    system "chown -R ${pulseuser}:${pulsegroup} ${destroot}${home}/versions"

    # Add a symlink from the bin directory to the pulse script
    system "ln -fs ${executable} ${destroot}${prefix}/bin/pulse"
}

Once again, pretty straightforward except for that odd-looking ${destroot}. Why, you may ask, is it being used in some cases and not in others?

This actually got me for a while until I realised that what the destroot step actually does is create an “image” of the installed software in a working directory – ${destroot} – after which the predefined install step copies the staged files into their final destination. Not a bad idea really when you think about it. Certainly saves screwing up the production environment when something goes horribly wrong – like when you’re playing around with things to see what happens ;-).

Last but not least, a little bit of post-installation goodness to instruct the user with any manual (or perhaps optional) processes, or even just a nice message to say “why thanks for using our software”:

post-install {
    ui_msg "#"
    ui_msg "# The script ${executable} has been installed to facilitate starting and"
    ui_msg "# stopping ${name} as a true daemon process. It must be run as ${pulseuser}."
    ui_msg "# For example:"
    ui_msg "#"
    ui_msg "#   sudo su pulse -c \"${executable} start\""
    ui_msg "#"
    ui_msg "# This script assumes it is run from ${home}. To run from outside this"
    ui_msg "# directory, you must set the value of PULSE_HOME to the absolute path"
    ui_msg "# of this directory. For example:"
    ui_msg "#"
    ui_msg "#   PULSE_HOME=${home} sudo su pulse -c \"${executable} start\""
    ui_msg "#"
    ui_msg "# You will also need to create the directory ${dbdir} if it does not"
    ui_msg "# already exist:"
    ui_msg "#"
    ui_msg "#   sudo mkdir -p ${dbdir}"
    ui_msg "#   sudo chown ${pulseuser}:${pulsegroup} ${dbdir}"
    ui_msg "#"
}

And there you have it. Not too bad really and although we didn’t cover quite a number of the other features the port system provides, I think you will have seen enough to get you started and on your way to integrating your favourite software into the ports system.

It sure beats remembering (or, more likely, forgetting) a lot of manual steps each time you need to install/re-install software.

Pulse on PostgreSQL

By default, my current toy (Pulse) stores its meta-data in an HSQLDB database. Which is fine (it’s a darn good database) but I already have plenty of infrastructure around periodically dumping all the PostgreSQL databases on my system for backups. Wouldn’t it be nice if Pulse ran on PostgreSQL?

Thankfully, the fellas at Zutubi (purveyors of the afore-mentioned toy) have written up a quick HOWTO on migrating to PostgreSQL and it worked like a charm.

About the only downside is that each time I upgrade, I need to remember to add the postgres.jar to the lib directory. Hmmm. Perhaps I need to add it to the classpath before starting the server…?

Reading PostgreSQL Meta Data

If, like me, you need to read in various bits and pieces of meta-data in PostgreSQL, you’ve probably found it rather tiresome navigating your way around the various pg_catalog tables in order to determine where the one piece of information you need lives.

After a bit of googling, I discovered a nifty option you can set in psql: \set ECHO_HIDDEN 'yes'

Then whenever you describe a table, index, etc. using \d and friends, you’ll also get the SQL that was used!

RedHill on Rails Plugins 1.2

With the pending release of Rails 1.2, I’ve taken the opportunity to begin updating the plugins to take advantage of various 1.2 features such as alias_method_chain and to remove, where possible, work-arounds for bugs that have been fixed. So, as of today, the trunk will only run against Rails 1.2.

For those not lucky or perhaps not foolish enough to start using the latest and greatest that Rails has to offer, you can find all the Rails 1.1.6 compatible versions of the plugins at: svn://rubyforge.org/var/svn/redhillonrails/tags/release-1.1.6/vendor/plugins

Enjoy!

Conventions Were Made to be Broken

The Rails convention for naming foreign key fields is to use the singular name of the table with a suffix of _id.

In the past few days a number of people using the Foreign Key Migrations plugin together with ActiveRecord session store have discovered that the exception to the rule is session_id in the sessions table. In this case, session_id is not a recursive relationship but an unfortunately named field.

The solution is to update the migration script and add :references => nil to the line that adds the session_id. The plugin will then ignore the column and not attempt to generate a database foreign key constraint.
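
As a rough sketch (assuming the standard ActiveRecord sessions table; your own migration will differ), the change looks something like this:

class CreateSessions < ActiveRecord::Migration
  def self.up
    create_table :sessions do |t|
      # Not a foreign key, just an unfortunately named column, so tell the
      # Foreign Key Migrations plugin not to generate a constraint for it:
      t.column :session_id, :string, :references => nil
      t.column :data,       :text
      t.column :updated_at, :datetime
    end

    add_index :sessions, :session_id
  end

  def self.down
    drop_table :sessions
  end
end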

Aikido: Saito Hitohiro Soke Seminar, Melbourne, April 2007

I suspect many of you aren’t interested in Aikido nor do you live in Australia. However, for those few that are and do, I will be hosting my teacher from Japan for a Seminar over the Easter Holiday weekend (6 ~ 9 April) 2007 here in Melbourne. All the details can be found in the flyer.

Feeling Poor?

Visit the Global Rich List; enter your annual income – I used Canadian Dollars as a close approximation for Australian Dollars; and click “show me the money!”

Changed your mind yet?

Email Signatures

I’ve always liked the subtitle to Charles Miller’s blog:

> tail -f /dev/mind > blog

Today, my brother was reading through some KDE newsgroups and came across this as the signature to one of the messages:

> # cd /usa/whitehouse
> # rm -rf *

Ahhh geek humour!

How to Entice People to Vote

Turn it into a lottery:

Voters weren’t keen about another, more quirky Arizona measure: They defeated a proposal that would have awarded $1 million to a randomly selected voter in each general election.

Rails Housekeeping

Since moving from lighttpd+FastCGI to Apache 2.2+mongrel, our production Rails application has been rock solid – one unexplained Ruby core-dump notwithstanding.

To keep everything humming along, we run a few cron jobs which I thought I’d share.

The first is to ensure the application starts on boot. There is an rc-script to do this but I never bothered to get it running on FreeBSD. Instead, we use the @reboot keyword built into vixie-cron:

cd ~/www/production/current; mongrel_rails cluster::stop; mongrel_rails cluster::start

Next, session expiration. Even though plenty have argued against them in favour of memcached, we’ve found file-system sessions to be just fine for our relatively low-traffic application. To time out sessions after one hour – with a margin of error of an extra hour – we run an hourly cron job to delete session files that haven’t been updated since it last ran:

cd ~/www/production/current; find tmp/sessions -name 'ruby_sess.*' -amin +60 -exec rm -rf {} \;

Next, to keep log file sizes manageable, we run a cron job once a day to rotate the log files using logrotate, followed by a re-cycle of the mongrel cluster:

cd ~/www/production/current; logrotate -s log/logrotate.status config/logrotate.conf; mongrel_rails cluster::restart

And here’s config/logrotate.conf:

"log/*.log" {compressdailydelaycompressmissingoknotifemptyrotate 7}

And finally, just because, we run another daily cron job to vacuum the PostgreSQL database:

cd ~/www/production/current; psql cjp_production -c 'vacuum full'

Plugging a Team City Security Hole with a Little Obfuscation

If you’re not sure what I’m talking about, have a quick read of my earlier post.

The trick – as far as I can tell – to plugging the hole is to: disable guest logins to the server; and ensure each build configuration requires each agent to have a secret environment variable set to a secret value – something like a generated WEP key, for example.

This seems to be a reasonable solution until JetBrains figures out a better mechanism. That said, in many respects, it’s not that different to the way WEP authentication works anyway.

Oh, and why the need to disable guest login? TeamCity shows all logged-in users – yes, even guests – which agents aren’t compatible with a given build configuration and, in particular, why.

Of course there may well be a way to interrogate programmatically what the requirements are, in which case you’re hosed anyway :(

A City Full of Code Thieves

After nearly falling over at the ease with which I could use TeamCity to crush helpless machines, it suddenly occurred to me that I may have found yet another security hole.

As you probably already know, the TeamCity server doesn’t do the builds itself; rather, it farms the work off to build-agents. You can connect as many build-agents as you like to a server: simply download (or otherwise obtain) a copy of the build-agent code; configure it to point at the server; and start it up. No server-side credentials are necessary.

When a build is required, the server checks out the source code and sends a delta to the agent. This has a number of benefits, one being that you can securely configure the source-code-repository credentials in one place – the server – and the agents will be sent the source code as needed. This also poses a potential security risk.

Let’s imagine that disgruntled developer X from our previous exploit wants to obtain a copy of source code from a repository for which he has no access but for which he knows there is a TeamCity build. He simply configures his build-agent to connect to the server and waits. When the server decides it’s his machine’s turn to do a build, the server dutifully sends him a copy of the source-code…!

A City of Trojan Ants

As much as I’ve bitched about IntelliJ performance, and as much as I wish I didn’t have to do Java development on a regular basis, the fact of the matter is I do, and IntelliJ just rocks my world in terms of features – I guess I’ll just have to wait for the new Core 2 Duo MacBook Pro to address the performance issue – so today I purchased a copy of the latest version.

IntelliJ 6.0 comes with TeamCity, a continuous-integration/build-server tool. It would appear that TeamCity isn’t just restricted to building applications; it could really do anything you can express in an Ant script.

TeamCity has the concept of build agents: all the real work is farmed off to other machines. So I set up my machine as a build-agent, which required opening up a port – 9090 is the default – through the firewall on my machine. This immediately rang warning bells in my head: I had just run the agent using sudo, and any application run using sudo that needs ports opened up scares me. In the end I realised I could easily run the agent as a non-root user and all was happy. Excellent!

That did get me thinking, however, as to what would happen on, say, M$ Windoze machines where developers typically run with administrator privileges, or with poor saps like me who had stupidly run it using sudo or even just as my own login.

Imagine a team of 20 developers, all of whom have obligingly run a build-agent so that their spare CPU cycles don’t go to waste. One day developer X becomes disgruntled with his employer and decides to run wild. Before he goes home at night, he checks in a change to build.xml. Not a large change, nothing special, just a one line change that recursively deletes everything starting at the root directory or even just the user’s home directory.

This change quickly makes it to the TeamCity server and then out to a build machine, which dutifully executes the script, destroying everything and taking the machine down with it!

Thankfully, the TeamCity server detects that the agent has gone down and, doing its best to utilise the resources at its disposal, re-schedules the build on the next available agent…

To be fair, TeamCity isn’t really to blame; if a developer runs build.xml without first ensuring that it does nothing malicious, it’s really no different to running an unverified shell script. TeamCity just makes it really easy to set up agents and ensure that the script is executed.

Methinks it’s time to make a separate, locked-down account for running the build-agent!