Wednesday, July 11, 2012

Moosing to Norway

So it seems that the powers that be have invited me to attend the Moving to Moose hackathon this August in Norway!

Though I haven't really been coding much during the two years I've been a student, I've had a lot of time to think about the things that took up my time before my "early retirement". If anything, I think my newfound laziness and detachment have made it a bit easier to be objectively critical of some of the ideas I had back in the day ;-)

Fortunately, both of the hackathon's themes are exactly what I still found myself thinking about, even when not writing code daily.

After this round of final exams is over, I hope to catch up on all I have missed thus far, so that I can make the most of the hackathon, and hopefully give back to the community that is so generously keeping me a part of it.

See you there!

Wednesday, October 13, 2010

Ritalin for your $PS1

In my last post I shared my colorful but otherwise inert bash prompt. Pedro Melo extended it to integrate __git_ps1 with extra coloring using git status.

Unfortunately this can take a long while on large source trees (git status needs to scan the directory structure), and while smart bash prompts are handy, it can be frustrating if every prompt incurs a delay.

Of course, this is annoying for any overzealous $PROMPT_COMMAND, not just git status based ones.

My version of his version has a small trick to add simple but effective throttling:

_update_prompt () {
    if [ -z "$_dumb_prompt" ]; then

        # if $_dumb_prompt isn't set, do something potentially expensive, e.g.:
        git status --porcelain | perl -ne 'exit(1) if /^ /; exit(2) if /^[?]/'

        case "$?" in
             # handle all the normal cases
             ...
             # but also add a case for exit due to SIGINT
             "130" ) _dumb_prompt=1 ;;
        esac
    else
        # in this case the user asked the prompt to be dumbed down
        ...
    fi
}

# helper commands to explicitly change the setting:

dumb_prompt () {
    _dumb_prompt=1
}

smart_prompt () {
    unset _dumb_prompt
}

If the prompt is taking too long to show up I simply hit ^C (which makes the pipeline exit with status 130, i.e. death by SIGINT), and my $PROMPT_COMMAND becomes a quicker, dumbed-down version for the current session.
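To opt in, the function just becomes the prompt command (a one-line sketch; _update_prompt is the function from the snippet above):

# run the (possibly expensive) prompt update before every prompt
PROMPT_COMMAND=_update_prompt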

Tuesday, October 12, 2010

Headless VirtualBox

This being the second time I've set this stuff up, I thought it was worth documenting my VirtualBox development workflow.

A Decent Hacking Environment

The OSX side of my laptop is working pretty smoothly. I've got my stack of tools configured to my liking, from my shell, to my editor, to my documentation browser. I've spent years cargo culting all my dotfiles.

But pick my brain any day and I'll give you a mouthful about Apple and OSX. I also know that there are superior alternatives to most of my software stack.

That said, even if I'm not entirely happy with the setup, I'm definitely content with it, and I have no plans to learn anything new just to gain a 3% efficiency in the way I type text or customize the way I waste time online.

Part of the reason I use OSX is that there is no hope (and therefore no temptation) of fixing little annoyances, something that led me to sacrifice countless hours during the brief period when I had a fully open source desktop environment.

However, when it comes to installing and configuring various project dependencies (daemons, libraries, etc), OSX can be a real pain compared to a decent Linux distribution.

A Decent Runtime Environment

Disk space is cheap, and virtualization has come a long way in recent years, so it really makes a lot more sense to run my code on a superior platform. One image per project also gives me brainless sandboxing, and snapshots mean I can quickly start over when I break everything.

Sweetening the deal even more, I always seem to be surrounded by people who know how to properly maintain a Debian environment much better than I could ever hope to, so I don't even have to think about how to get things right.

Bridging the Gap

In order to make it easy to use both platforms simultaneously, with cheap context switching (on my wetware, that is), I've written a script that acts as my sole entry point to the entire setup.

I don't use the VirtualBox management GUI, and I run the Linux environment completely headless (not just sans X11, but without a virtual terminal either).

I hard link the following script in ~/bin, once per VM. To get a shell on a VM called blah, I just type blah into my shell prompt and hit enter:

#!/bin/bash

VM="$( basename "$0" )"

if [ -n "$1" ]; then
    # explicit control of the VM, e.g. `blah stop`
    # useful commands are 'pause', 'resume', 'stop', etc
    case "$1" in
        status) VBoxManage showvminfo "$VM" | grep -i state ;;
        *)      VBoxManage controlvm "$VM" "${1/stop/acpipowerbutton}" ;; # 'stop' is much easier to type
    esac
else
    # otherwise just make sure it's up and provide a shell

    # boot the virtual machine in headless mode unless it's already running
    # note that there is a race condition if the machine is in the process of
    # powering down
    VBoxManage showvminfo --machinereadable "$VM" | grep -q 'VMState="running"' || \
    VBoxManage startvm "$VM" -type vrdp;

    # each VM has an SSH config like this:

    # Host $VM
    #     Hostname localhost
    #     Port 2222 # VBoxManage modifyvm "$VM" --natpf1 ...

    # changing ssh port forwarding doesn't require restarting the VM (whereas
    # fiddling with VirtualBox port forwarding does). The following section
    # should probably just be a per VM include, but for my needs it does the
    # job as is.

    # ControlMaster works nicely with a global 'ControlPath /tmp/%r@%h:%p' in
    # my ~/.ssh/config this means the port forwarding stays up no matter how
    # many shells I open and close (unlike ControlMaster auto in the config)

    # this loop quietly waits till sshd is up
    until nc -z localhost 3000 >/dev/null; do
        echo -n "."
        ssh -N -f -q \
            -L 3000:localhost:3000 \
            -o ConnectTimeout=1 \
            -o ControlMaster=yes \
            "$VM" && echo;
    done

    # finally, start a shell
    exec ssh "$VM"
fi
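Adding a new VM is then just a matter of creating another hard link named after it (the script's own path here is hypothetical):

ln ~/bin/vm-shell ~/bin/blah    # now `blah` gives me a shell, `blah stop` powers it off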

Once I'm in, I also have my code in a mount point under my home directory. I set up a shared folder using VirtualBox's management GUI (after installing the VirtualBox guest additions). To mount it automatically I've got this in /etc/fstab on the guest OS:

# <file system>  <mount point>               <type>  <options>                       <dump> <pass>
some_dir         /home/nothingmuch/some_dir  vboxsf  uid=nothingmuch,gid=nothingmuch   0      0
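For reference, the host side of the shared folder can also be set up without the GUI (a sketch; the folder name and host path are assumptions):

VBoxManage sharedfolder add blah --name some_dir --hostpath "$HOME/some_dir"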

I use the same path on the OSX side and the Linux side to minimize confusion. I decided not to mount my entire home directory because I suspect most of my dotfiles aren't that portable, and I'm not really running anything but Perl and services on the Debian side.

I use all of my familiar tools on OSX, and instantly run the code on Debian without needing to synchronize anything.

When I'm done, blah stop will shut down the VM cleanly.

Finally, as a bonus, my bash prompt helps keep my confusion to a minimum when sshing all over the place.

Wednesday, October 6, 2010

Hire Me

I'm starting my B.A. at Ben Gurion University (Linguistics & Philosophy), and I'm looking for part time work (1-2 days a week) either telecommuting or in or around Beer Sheva or Tel Aviv.

If you're looking for a developer with strong and diverse technical skills, who is able to work either independently or in a team, feel free to contact me.

My CV is available on my website.

Note that I'm not really looking for contract work unless it may lead to part time employment as a salaried employee, as the overheads of being a freelancer are quite high (both financially and temporally).

Sunday, September 19, 2010

Moose has won

Stevan has always characterized Moose as a disruptive technology.

Pre-Moose metaprogramming has a long history, but you were pretty much stuck rolling your own metamodel back then.

Moose changed this by providing extensible class generation. It tries to create a metamodel in which several specialized metamodels can coexist and work together, even on the same class.

Case in point: a little over a week ago, Franck Cuny announced his new SPORE project.

SPORE aims to make using REST services much easier by generating a lot of the code that deals with the transport layer, presenting the data from the REST service through simple OO methods.

In this context, what's interesting about SPORE is the way it leverages Moose to do that.

SPORE extends Moose's metamodel objects, specifically the object that represents methods in a class, to create the bridge between the appealing sugar layer (a simple 1:1 mapping between HTTP requests and method calls) and the underlying HTTP client.

Take a look at the Net::HTTP::Spore::Meta::Method class. This is the essence of the sugar layer, bridging the REST client with the sleek OO interface.
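The general technique looks roughly like this (a minimal sketch of the idea, not SPORE's actual code; the class names, the path attribute and the _request helper are all made up):

package My::REST::Meta::Method;
use Moose;
extends 'Moose::Meta::Method';

# hypothetical: the HTTP details this method encapsulates
has path => ( is => 'ro', isa => 'Str' );

package My::REST::Client;
use Moose;

sub _request { } # stub; a real client would go through its HTTP layer here

# install a generated method whose metaobject also describes the HTTP call
__PACKAGE__->meta->add_method(
    get_user => My::REST::Meta::Method->wrap(
        sub { my $self = shift; $self->_request( GET => '/user', @_ ) },
        name         => 'get_user',
        package_name => __PACKAGE__,
        path         => '/user',
    ),
);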

Compared with SOAP::Lite (not that the comparison is very fair), SPORE is a far simpler implementation that offers more (e.g. middlewares), even if you ignore the parts of SOAP::Lite that don't apply to SPORE.

Moose made it viable to design such projects "properly", without inflating their scope. In fact, using Moose like this usually reduces the amount of code dramatically.

Before Moose, writing a REST toolkit with a similar metaclass-based design would have been overengineering a simple idea to death. The project would probably never be truly finished, due to the competing areas of focus (the metamodel vs. the HTTP client vs. high level REST features).

The alternative design approach is a hand-rolled stack that does the bare minimum required for each step. This might do the job, and probably gets finished on time, but the code is inherently brittle. It's hard to reuse the different parts because they don't stand alone. Most pre-Moose metaprogramming on the CPAN falls into this category.

KiokuDB is another example. Without Moose it's actually quite useless: it can't deal with more than a handful of classes out of the box. Sure, you could specify the appropriate serialization for every class you want to store, but at that point the design just doesn't make sense anymore; the limitations would make it unusable in practice.

Being able to assume that Moose introspection would be available for most objects stored in the database allowed me to remove all guesswork from the serialization, while still providing an acceptable user experience (it's very rare to need a custom typemap entry in practice).

This shortcut automatically reduced the scope of the project immensely, and allowed me to focus on the internals. The only thing that really separates KiokuDB from its predecessors is that I could build on Moose.
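The introspection-driven shortcut boils down to something like this (a minimal sketch, not KiokuDB's actual serializer; serialize_attributes is a made-up name):

use Class::MOP ();

# walk any Moose object's attributes via its metaclass; no per-class code needed
sub serialize_attributes {
    my $object = shift;
    my $meta   = Class::MOP::class_of($object);

    return {
        map  { $_->name => $_->get_value($object) }
        grep { $_->has_value($object) }
        $meta->get_all_attributes
    };
}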

I'm really glad to see how Moose has changed the way we approach this set of problems. The MIT approach is now a sensible and pragmatic choice more often than before; in other words, we get a cleaner and more reusable CPAN for the same amount of effort.

Wednesday, July 7, 2010

Are we ready to ditch string errors?

I can't really figure out why I'm not in the habit of using exception objects. I seem to only reach for them when things are getting very complicated, instead of by default.

I can rationalize that they are better, but it just doesn't feel right to do this all the time.

I've been thinking about what possible reasons (perhaps based on misconceptions) are preventing me from using them more, but I'm also curious about others' opinions.

These are the trouble areas I've managed to think of:

  • Perl's built-in exceptions are strings, and everybody is already used to them.[1]
  • There is no convention for inspecting error objects. Even ->isa() is messy when the error could be a string or an object (see the sketch just after this list).[2]
  • Defining error classes is a significant barrier: you need to stop, create a new file, etc. Conversely, universal error objects don't provide significant advantages over strings because they can't easily capture additional data apart from the message.[3]
  • Context capture/reporting is finicky
    • There's no convention like croak for exception objects.
    • Where exception objects become useful (for discriminating between different errors), there are usually multiple contexts involved: the construction of the error, the initial die, and every point where the error is rethrown are all potentially relevant. Perl's builtin mechanism for string mangling is shitty, but at least it's well understood.
    • Exception objects sort of imply the formatting is partly the responsibility of the error catching code (i.e. full stack or not), whereas Carp and die $str leave it to the thrower to decide.
    • Using Carp::shortmess(), Devel::StackTrace->new and other caller futzery to capture full context information is perceived as slow.[4]
  • Error instantiation is slower than string concatenation, especially if a string has to be concatenated for reporting anyway.[5]
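For example, the defensive dance from point [2] ends up looking something like this (a sketch; the error class and the function names are made up):

use Try::Tiny;
use Scalar::Util qw(blessed);

try {
    do_something_risky();
} catch {
    # $_ may be a plain string or an object, so ->isa needs guarding
    if ( blessed($_) && $_->isa("MyApp::Error::Timeout") ) {
        retry();
    } else {
        die $_; # rethrow everything else unchanged
    }
};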

[1] I think the real problem is that most core errors worth discriminating are usually not thrown at all, but actually written to $! which can be compared as an error code (see also %! which makes this even easier, and autodie which adds an error hierarchy).

The errors that Perl itself throws, on the other hand, are usually not worth catching (typically they are programmer errors, except for a few well known ones like Can't locate Foo.pm in @INC).

Application level errors are a whole different matter though, they might be recoverable, some might need to be silenced while others pass through, etc.

[2] Exception::Class has some precedent here, its caught method is designed to deal with unknown error values gracefully.

[3] Again, Exception::Class has an elegant solution: ad-hoc class declarations in the use statement go a long way.
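Those ad-hoc declarations look roughly like this (adapted from the Exception::Class synopsis; the class names and the extra field are made up):

use Exception::Class (
    'MyApp::Error',

    'MyApp::Error::Timeout' => {
        isa    => 'MyApp::Error',
        fields => ['seconds'],
    },
);

# later, deep inside the code:
MyApp::Error::Timeout->throw( error => "request timed out", seconds => 30 );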

[4] XS-based stack capture could easily make this a non-issue (just walk the cxstack and save pointers to the COPs of appropriate frames). Trace formatting is another matter.

[5] I wrote a small benchmark to try and put the various runtime costs in perspective.

Solutions

Here are a few ideas to address my concerns.

A die replacement

First, I see merit in an XS-based error-throwing module that captures a stack trace and the value of $@ using a die replacement. The error info would be recorded in SV magic and would be available via an API.

This could easily be used on any exception object (but not strings, since SV magic is not transitive), without weird globals or something like that.

It could be mixed into any exception system by exporting die, overriding a throw method or even by setting CORE::GLOBAL::die.

A simple API to get caller information from the captured COP could provide all the important information that caller would, allowing existing error formatters to be reused easily.

This would solve any performance concerns by decoupling stack trace capturing from trace formatting, which is much more complicated.

The idea is that die would not merely throw the error, but also tag it with context info, that you could then extract.

Here's a bare bones example of how this might look:

use MyAwesomeDie qw(die last_trace all_traces previous_error); # tentative
use Try::Tiny;

try {
    die [ @some_values ]; # this is not CORE::die
} catch {
    # gets data out of SV magic in $_
    my $trace = last_trace($_);

    # value of $@ just before dying
    my $prev_error = previous_error($_);

    # prints line 5 not line 15
    # $trace probably quacks like Devel::StackTrace
    die "Offending values: @$_" . $trace->as_string;
};

And of course error classes could use it on $self inside higher level methods.

Throwable::Error sugar

Exception::Class got many things right, but a Moose-based solution is just much more appropriate for this, since roles are very helpful for creating error taxonomies.

The only significant addition I would make is some sort of sugar layer to lazily build a message attribute using a simple string formatting DSL.

I previously thought MooseX::Declare would be necessary for something truly powerful, but I think that can be put on hold for a version 2.0.
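Expanded by hand, the kind of class that sugar would generate might look like this (a sketch built on Throwable::Error; the class and its path field are made up):

package MyApp::Error::FileNotFound;
use Moose;
extends 'Throwable::Error';

has path => ( is => 'ro', isa => 'Str', required => 1 );

# lazily build the message from the structured fields
has '+message' => (
    lazy    => 1,
    default => sub { "Can't find file: " . $_[0]->path },
);

# elsewhere: MyApp::Error::FileNotFound->throw( path => $file );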

A library for exception formatting

This hasn't got anything to do with the error message; that's the responsibility of each error class.

This would have to support all of the different styles of error printing we can have with error strings (i.e. die, croak with and without $Carp::Level futzing, confess...), but also allow recursively doing this for the whole error stack (previous values of $@).

Exposed as a role, the base API should complement Throwable::Error quite well.

Obviously the usefulness should extend beyond plain text, because dealing with all that data is a task better suited for an IDE or a web app debug screen.

Therefore, things like code snippet extraction or other goodness might be nice to have in a plugin layer of some sort, but it should be easy to do this for errors of any kind, including strings (which means parsing as much info from Carp traces as possible).

Better facilities for inspecting objects

Check::ISA tried to make it easy to figure out what object you are dealing with.

The problem is that it's ugly: it exports an inv routine instead of a more intuitive isa. It's now possible to go with isa, as long as namespace::clean is used to remove it so it's not accidentally called as a method.

Its second problem is that it's slow, but it would be easy to make its performance comparable to the totally wrong UNIVERSAL::isa($obj, "foo") by implementing XS acceleration.

Conclusion

It seems to me that if I had those things, I would have no more excuses for not using exception objects by default.

Did I miss anything?

Tuesday, July 6, 2010

KiokuDB's Leak Tracking

Perl uses reference counting to manage memory. This means that circular structures leak.

Cycles are often avoidable in practice, but backreferences can be a huge simplification when modeling relationships between objects.

For this reason Scalar::Util exports the weaken function, which can demote a reference so that it no longer adds to the reference count of its referent.
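For example (a minimal sketch), a backreference can be weakened so that a parent/child cycle doesn't leak:

use Scalar::Util qw(weaken);

my $parent = { name => "parent" };
my $child  = { name => "child", parent => $parent };
$parent->{child} = $child; # cycle: parent <-> child

# the backreference no longer counts towards $parent's refcount,
# so both are freed normally when they go out of scope
weaken( $child->{parent} );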

Since cycles are very common in persisted data (because there are many potential entry points in the data), KiokuDB works hard to support them, but it can't weaken cycles for you and prevent them from leaking.

Apart from the waste of memory, there is another major problem.

When objects are leaked, they remain tracked by KiokuDB, so you might see stale data in a multi-worker environment (e.g. preforked web servers).

The new leak_tracker attribute takes a code reference which is invoked with the list of leaked objects when the last live object scope dies.

This can be used to report leaks, to break cycles, or whatever.

The other addition, the clear_leaks attribute, allows you to work around the second problem by forcibly unregistering leaked objects.

This completely negates the effect of live object caching and doesn't solve the memory leak, but guarantees you'll see fresh data (without needing to call refresh).

my $dir = KiokuDB->connect(
    $dsn,

    # this coerces into a new object
    live_objects => {
        clear_leaks  => 1,
        leak_tracker => sub {
            my @leaked = @_;

            warn "leaked " . scalar(@leaked) . " objects";

            # try to mop up.
            use Data::Structure::Util qw(circular_off);
            circular_off($_) for @leaked;
        }
    }
);

These options were both refactored out of Catalyst::Model::KiokuDB.