DropBox as a No Paste server

In this blog post I’m going to talk about my own custom “no paste” solution that I’ve developed over the years. How I started out using a web page as a service, moved to scripting this from the command line, and how I finally ended up subverting DropBox to my ends. Skip to the end to get to the code.

So, what’s “no paste” I hear you ask? “No paste” servers allow you to go to a webpage, submit a bit of text, and get a unique URL back where that text can be viewed. This means that you don’t have to paste a whole bunch of text into IRC or an IM conversation; you can just upload your text and copy and paste only the link the “no paste” server sent back into your IM. If you haven’t seen one before check out TextMate’s nopaste server.

This has several advantages. First and foremost, it doesn’t “SPAM” the conversation you’re having. Pasting a whole bunch of code into an IRC channel where people are having a conversation, causing that conversation to scroll off the screen before they can read it, isn’t polite. Secondly it makes the text easier to read and easier to copy and paste into an editor (for example, most IRC and IM clients will prepend each line someone says with a datestamp when you copy and paste from them.)

Excellent. A useful idea. Now how can we make it better?

As a Perl programmer I tend to automate a heck of a lot of what I do with my computer. Filling in a form on a webpage is easy to do, but it’s more hassle than hitting a key combination that pastes whatever text is highlighted in your editor. If we do it a lot we should make the computer do all the work!

For a long time I used the App::Nopaste module on CPAN, which installs a command line utility called nopaste that integrates with a vast range of “no paste” servers. This utility can take input on the command line and automatically fill in the forms on those websites for you. This means that it’s trivial to execute from your editor – in TextMate it’s just a simple “Bundle” command.

In the end I stopped using nopaste not because I had a problem with the script, but because I had a problem with the nopaste servers, in particular the lack of privacy. Now, I’m a great believer in simply not putting anything on the internet that is truly private (face it, it’s going to get out!) but there exists a bunch of “semi-private” stuff (closed source code, contact information, private correspondence) that shouldn’t be put on a totally public paste service. Often it’s just a case of editing the URL that the “no paste” server returns by increasing or decreasing the number at the end to see the thing the next or previous person pasted!

So in the end I decided it might be a good idea to run my own custom “No Paste” solution with semi-secure (non-guessable) URLs. One problem with that: I couldn’t justify the infrastructure – I’m currently trying to reduce the amount of stuff I have to maintain, not increase it. So I started looking at what infrastructure I’m already using and seeing how I can better use that.

Enter DropBox. DropBox is a service that syncs a directory on your various computers with each other and the DropBox server. And one of the things it does is publish files in a certain directory as accessible from the web. This simplifies my problem a lot: All I need to do to have my own “No Paste” solution is simply have an easy way of getting text into a file on my hard drive and let the DropBox program automatically handle the “uploading” to a hosted service.

So, below is the script I wrote to do that. Features include:

  • Using a web-safe version of the Base64-encoded MD5 one way hash of the file contents as the filename. This means that it’s both unguessable unless you know what the text contains and reasonably guaranteed to be unique
  • Taking input from STDIN or the system clipboard
  • Printing out the URL that the text will be available at, and/or copying it to the clipboard, and/or displaying it in a Growl message

#!/usr/bin/perl

use strict;
use warnings;

use 5.010;
use autodie;
use Path::Class qw(file dir);
use Digest::MD5 qw(md5_base64);
use Net::Growl qw(register notify);
use Getopt::Std qw(getopt);

########################################################################
# config

my $DROPBOX_ID = "301667";
my $GROWL_PASSWORD = "shoutout";

########################################################################

# get the config options
my %opt;
getopt("",\%opt);

# read all of STDIN / the files passed on the command line
# (or the clipboard if -c was given)
my $data = $opt{c}
  ? read_clipboard()
  : do { local $/; scalar <> };

# work out the digest for it.  Convert the non url safe characters
# to url safe characters
my $uuid = md5_base64($data);
$uuid =~ s{/}{-}g;
$uuid =~ s{\+}{_}g;

# copy the data to the new file
open my $fh, ">:bytes",
  file($ENV{HOME},"Dropbox","Public","nopaste","$uuid.txt");
print {$fh} $data;
close $fh;

# output the url that dropbox will make that file available at
my $url = "http://dl.getdropbox.com/u/$DROPBOX_ID/nopaste/$uuid.txt";
say $url unless $opt{q};
write_clipboard($url) if $opt{p};
if ($opt{g}) {
  my $message = "shortly at $url";
  $message .= " (copied to clipboard)" if $opt{p};
  growl("Text Dropped", $message);
}

########################################################################

# this is Mac OS X dependent.  I'd use the Clipboard module from CPAN
# to make this system independent, but it fails tests.

sub read_clipboard {
  open my $pfh, "-|", "pbpaste";
  local $/;
  return scalar <$pfh>;
}

sub write_clipboard {
  my $data = shift;

  open my $pfh, "|-", "pbcopy";
  print {$pfh} $data;
  close $pfh;
}

sub growl {
  my $title = shift;
  my $description = shift;

  register(
    application => "droptxt",
    password => $GROWL_PASSWORD,
  );

  notify(
    application => "droptxt",
    password => $GROWL_PASSWORD,
    title => $title,
    description => $description,
  );

}

########################################################################

__END__

=head1 NAME

droptxt - easily write text to a file in your public dropbox

=head1 SYNOPSIS

  # read from stdin
  bash$ droptxt
  this is some text
  ^D

  http://dl.getdropbox.com/u/301667/nopaste/4ZwSg8klsyBmhf9SKs-j5g.txt

  # read from a file
  bash$ droptxt some_text.txt

  http://dl.getdropbox.com/u/301667/nopaste/asdSDsq_asdQsasdw12s3d.txt

  # read from the clipboard
  bash$ droptxt -c

  http://dl.getdropbox.com/u/301667/nopaste/cssj12-22WWdsqQfxjpDDe.txt

  # also paste the url to the clipboard
  bash$ droptxt -p some_text.txt

  http://dl.getdropbox.com/u/301667/nopaste/asdSDsq_asdQsasdw12s3d.txt

=head1 DESCRIPTION

This is a command line utility that is designed to be used as an
alternative to "no paste" utilities.  Instead of sending the input to a
webserver it simply writes it to a location on your hard drive where the
DropBox utility will synchronize it with the DropBox webservers.

=head2 Options

=over

=item -c

Copy the input from the system clipboard rather than from the usual
location.

=item -p

Paste the url to the system clipboard.

=item -g

Announce the url via Growl

=item -q

Do not print the url to STDOUT

=back

=head1 AUTHOR

Copyright Mark Fowler E<lt>mark@twoshortplanks.comE<gt> 2009. All rights reserved.

This program is free software; you can redistribute it
and/or modify it under the same terms as Perl itself.

=head1 BUGS

Doesn't wait for DropBox to sync the file.  The URL this creates may not be
usable straight away!

=head1 SEE ALSO

L<http://www.getdropbox.com> for details on the service.

L<App::Nopaste> for a utility that uses public "no paste" servers instead.

=cut

Let’s have our cake and eat it too

One of the most difficult tradeoffs in language design is brevity versus explicitness. Having long names for methods, functions, variables and verbose patterns makes your code much clearer and less ambiguous. To my mind there’s nothing ambiguous about:

System.out.println("Hello World");

It’s obviously printing a line to standard out and, at least as long as you associate “System” with “standard” in your head, there are no surprises lurking here. This kind of explicitness is great. It means anyone can read the code and bit by bit pick it apart to work out exactly what’s happening (Oh…so there’s a ‘System’ class….and look an ‘out’ property….and a ‘println’ method…)

Of course, the problem with this level of explicitness is that the verbosity it requires takes too long to write all the time unless you’ve got a super-charged macro based IDE. And even then it faces a worse drawback: As the patterns get larger and the verbosity multiplies it gets harder to comprehend the overall picture. Your eyes tend to glaze over after a day’s protracted coding and the details start to become obscured. For example in microcosm, the “System.out” is the least interesting bit of the above code, but it’s the part of the statement my eyes are drawn to first. Worse still, programmers tend to write the same number of lines of code no matter how verbose their programming language is. Languages with more brevity therefore tend, on the whole, to get more done in those lines!

Compare and contrast the above Java statement to the following Perl statement:

say "Hello World"

Much shorter, and much easier to read; the eyes are drawn to the “Hello World” which is indeed the interesting part of the statement. With much shorter statements and less wrapper code, the Perl users should be getting far more done in a day and beating the pants off the Java programmers.

Well, this is sometimes true. And, to be fair, often not.

What was the tradeoff with brevity again? Oh, yes…more ambiguity. When does this strike? In maintenance and what I like to call “pre-maintenance”, the time when you’re still developing the code yourself and it reaches the point where it’s too big to fit in your head.

Consider the two examples above. While the Java version is clearly printing to standard out, “say” is printing to the ‘current file handle’ which almost always is standard out. Of course Perl people might consider this potential action at a distance to be a worthwhile abstraction layer. Which really emphasises an important difference between the two languages.
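To make that action at a distance concrete, here’s a minimal sketch of how select changes where an unadorned say ends up. It assumes perl 5.10 or later; the in-memory file handle and the variable names are mine, purely for illustration:

```perl
use strict;
use warnings;
use feature 'say';

# An in-memory file handle standing in for, say, a log file (illustrative).
my $captured = '';
open my $buffer, '>', \$captured or die "Cannot open in-memory handle: $!";

say "Hello World";               # goes to STDOUT, the default current handle

my $previous = select $buffer;   # make $buffer the current file handle
say "Hello World";               # identical statement, now lands in $captured
select $previous;                # put the old current handle back
close $buffer;

print "captured: $captured";
```

The two say statements are character for character the same; only the hidden context differs.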

Perl and Java are essentially operating at about the same level of abstraction. They’re both a level above C, running on a virtual machine layer that has nice safety nets built in, meaning things like memory allocation, array boundary checking, sorting algorithms, etc. are all taken care of by the core language and API. There’s really little to separate the two languages, and they share more in common with each other than say, C and Prolog do, so it’s more interesting to look at the small details that make the two languages different to one another. The Perl and Java programmer have a lot to learn from one another.

This difference in syntax philosophy is really interesting to me. Perl’s basic syntax typically allows you to express more in a shorter space by exploiting what is known as context. There are a lot of implicit things. There’s list or scalar context for example, or the current file handle, or there’s the topic variables ($_, @_ et al) that are often used as default arguments for calls. This either allows you to hold more in your head (because you worry about less because you can ignore the need to restate the context all the time) or hold a lot less (because the code isn’t clear and you have to worry about what’s in the hidden context all the time.)
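A short sketch of two of those implicit things, list versus scalar context and the $_ topic variable. It assumes perl 5.10 or later, and the array and variable names are made up for the example:

```perl
use strict;
use warnings;
use feature 'say';

my @langs = ('Perl', 'Java', 'C');

my $count   = @langs;   # scalar context: the number of elements
my ($first) = @langs;   # list context: the first element

# for sets the topic $_ implicitly; length and say both default to it
my @lengths;
for (@langs) {
    push @lengths, length;   # same as length($_)
    say;                     # same as say $_
}

say "count=$count first=$first lengths=@lengths";
```

Nothing on the two assignment lines mentions context, and nothing inside the loop mentions $_, yet both decide what the code does.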

So in theory Perl allows you to express more in a line, but you can also get yourself in a mess a heck of a lot quicker. It can be really easy for beginners to pick up Perl compared to Java because they aren’t forced to deal with all the implicit stuff directly, but at the same time they’re not aware of all the implicit things going on, so it can be harder to deal with too; a double edged sword.

No wonder sometimes Perl is better than Java, and sometimes Java is better than Perl.

The obvious thing that Perl and Java can do, in the grand tradition of language design, is learn from each other’s mistakes and steal the good bits from each other without (hopefully) picking up the bad bits. Java stealing regular expressions and Perl stealing layered IO are good examples of worthwhile theft.

So what would I change about Perl to take advantage of what Java teaches us? Probably more than some people would like, and a lot less than others.

By way of example, I wrote this particularly confusing chunk of code earlier in the week:

sub log {
  no warnings;
  warn @_;
}

no warnings immediately followed by warn? Gah! Of course, what I’m actually doing is suppressing all warnings that perl will generate (undefined values, printing of wide characters, etc) while it prints out my warning message. Very brief. Different semantic domains entirely.
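Here’s a small sketch of those two domains side by side. The sub names, and the trick of collecting messages through a __WARN__ handler so they can be counted, are my own additions for illustration:

```perl
use strict;
use warnings;

# Collect everything that reaches the warning stream so we can count it.
my @seen;
$SIG{__WARN__} = sub { push @seen, $_[0] };

sub noisy_log {
    my ($label, $value) = @_;
    warn "$label: $value\n";   # interpolating undef makes perl warn as well
}

sub quiet_log {
    my ($label, $value) = @_;
    no warnings;               # silences perl's own warnings in this block...
    warn "$label: $value\n";   # ...but my explicit message still gets out
}

noisy_log('noisy', undef);
my $noisy_count = @seen;       # perl's complaint plus my message

@seen = ();
quiet_log('quiet', undef);
my $quiet_count = @seen;       # just my message

print "noisy: $noisy_count message(s), quiet: $quiet_count message(s)\n";
```

The pragma only switches off the warnings perl generates itself; the warn function carries on regardless.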

So what do I think we should do to fix this? Nothing. This is just the pain of having a brief ambiguous language: Sometimes you’re just going to end up with what I dub “talking at semantic cross purposes” in your code. I could suggest that we force people to be more explicit so it’s clear what warnings I’m talking about in each case, but then I’d be changing the feel of the language. I’m in no rush to recreate Java; Java’s a fine language and I know where it is when I want to use it.

So what would I change? Ambiguity where there’s genuine confusion due to overloading the meaning of things.

If you’re paying a lot of attention to my example you’ll notice that I’m not declaring something that’s going to be used as a subroutine there, despite the sub keyword (because, obviously, without syntactic gymnastics you can’t call a subroutine called log without calling the log built in function instead.) In fact, the occasions where you want a method to also be callable as a subroutine (or vice versa) are very rare. So this is where I’d be more explicit and lose the unnecessary ambiguity by being able to express the difference between a function and a method. Like so:

method log {
  no warnings;
  warn @_
}

Of course, that’s exactly what some of the more radical extensions like MooseX::Declare allow you to do, and I’ll talk more about that in a future blog entry.
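To see why a sub called log only really makes sense as a method in today’s Perl, here’s a minimal sketch. The Logger package and its constructor are hypothetical, invented purely for this example:

```perl
use strict;
use warnings;

package Logger;

sub new { return bless {}, shift }

# Shares its name with the built in log(), which is harmless for a method.
sub log {
    my ($self, @message) = @_;
    return "LOG: @message";
}

package main;

my $logger = Logger->new;

my $as_method  = $logger->log("hello");   # method lookup finds Logger::log
my $as_builtin = log(1);                  # plain call: the natural logarithm
my $with_sigil = &Logger::log($logger, "hello");   # the gymnastics version

print "$as_method\n$as_builtin\n$with_sigil\n";
```

The method call never collides with the built in, whereas the plain subroutine call needs the package name and sigil spelled out before perl will pick my code over its own.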

CPAN Unplugged

CPAN is often described as Perl’s Killer App; Modern Perl relies on it, with the perl distribution being almost considered in parts to be nothing more than a bootstrap for the rest of the language that’s out there in the cloud. Which makes it all the more annoying when you’re stuck somewhere without an internet connection missing the vital bit of the language you need. I just had first hand experience of being offline for a two week holiday, but I didn’t have this problem when hacking on personal projects: I took CPAN with me.

So, want CPAN at your fingertips even when you’re offline? Yep, you’ve guessed it: There’s a CPAN module for that!

It’s called CPAN::Mini, and it lets you create a mini-mirror of CPAN. A mini-mirror? What’s that? It’s a mirror of just the latest non-development versions of the modules from the CPAN – or in other words, it’s a mirror of anything you can install by just typing “install” and just the module name into the cpan shell. As I type this now this mirror weighs in at about 1.1GB, which is a fair bit smaller than the full archive.

So how do we create a mini-mirror? Well, first (when you’re actually online) you need to install the module.

bash$ sudo cpan CPAN::Mini

Once you’ve done that the minicpan command will be installed on your computer.

While you can pass arguments on the command line to tell it how to run, it’s easier to create a .minicpanrc file in your home directory so you don’t have to remember what commands to type each time you want to sync your mirror. This is what mine looks like:

local: /cpan/
remote: http://www.mirrorservice.org/sites/ftp.funet.fi/pub/languages/perl/CPAN/

So I’ve got minicpan set up to download from mirrorservice.org (my nearest CPAN mirror on the internet when I’m in the UK) and create files in /cpan on my hard drive.

So all that’s left is to run the minicpan command and watch it download.

bash$ minicpan

This prints out each file as it downloads. The first time you run this it might take a while (depending on the speed of your internet connection) so you might want to trigger it while your laptop is going to be in the same place for a while with a fast internet connection (i.e. just before you go to bed or just after you get into the office for the day.)

The second time you run this command it’ll update the existing mirror. This means that it won’t have to download the whole 1.1GB again, just the index files and the new modules that have been released.

bash$ minicpan
authors/01mailrc.txt.gz ... updated
authors/id/A/AD/ADAMK/Test-POE-Stopping-1.05.tar.gz ... updated
authors/id/A/AD/ADAMK/CHECKSUMS ... updated
authors/id/A/AN/ANDK/CPAN-Testers-ParseReport-0.1.4.tar.bz2 ... updated
authors/id/A/AN/ANDK/CHECKSUMS ... updated
authors/id/A/AT/ATHOMASON/Ganglia-Gmetric-PP-1.01.tar.gz ... updated
authors/id/A/AT/ATHOMASON/CHECKSUMS ... updated
authors/id/A/AT/ATHOMASON/Gearman-WorkerSpawner-1.03.tar.gz ... updated
...
cleaning /cpan/authors/id/A/AA/AAYARS/Fractal-Noisemaker-0.011.tar.gz ...done
cleaning /cpan/authors/id/A/AD/ADAMK/Test-POE-Stopping-1.04.tar.gz ...done
cleaning /cpan/authors/id/A/AL/ALEXLOMAS/CHECKSUMS ...done
cleaning /cpan/authors/id/A/AL/ALEXLOMAS/WWW-Nike-NikePlus-0.02.tar.gz ...done
...

The module will also delete any old versions of modules that are no longer in the index; In the above example you can see Adam released a new version of Test::POE::Stopping, so CPAN::Mini downloaded the new distribution and deleted the old distribution (as no modules contained in the index still relied on that distribution). This keeps the size of the local mirror as small as possible on disk.

There are several ways you can configure the CPAN module to use this new local mirror, including typing commands in the CPAN shell. However, my preferred way is to edit the CPAN::Config module on the system directly.

First work out where the module containing your config is installed:

bash$ perl -E 'use CPAN::Config; say $INC{"CPAN/Config.pm"}'
/System/Library/Perl/5.10.0/CPAN/Config.pm

Then edit it changing the urllist parameter to contain your CPAN mirror in addition to your normal remote mirror:

'urllist' => [
  q[file:///cpan/],
  q[http://www.mirrorservice.org/sites/ftp.funet.fi/pub/languages/perl/CPAN/]
],

This means your CPAN shell will try and install files from disk first, and if for any reason that fails (for example, you tell it to install a development release) it'll go to the second mirror.

Which way round you order the mirrors really depends on how often you update your cpan mirror and on personal preference. If you, as I do, put your local mirror first this has the disadvantage that CPAN will seem "frozen" at the last time you ran minicpan, with any new changes being hidden from you until you next update. It does however mean that installs are very quick compared to normal internet installs (be you offline or not) and it avoids having to wait for the internet connection timeout every time CPAN tries to fetch a file and fall back to the local mirror when you're offline.

With all this done, I can now install modules in the usual way with the CPAN shell no matter if I have an internet connection or not. Of course, I haven't yet explained how I work out what modules I should be using when I'm offline and haven't got access to search.cpan.org. I'll get to that in a future blog post...

say What?

Now that I’ve got Snow Leopard (finally) installed on my Mac, the default perl binary is now 5.10.0. This means many things: The given keyword and smart matching, the defined-or operator, the wonderful additions to the regex engine, and other things I’m bound to blog about later when I get round to enthusing about them.

What I wanted to talk about today is the simplest change that’ll be making the most difference to me on a day to day basis: The “say” keyword. More or less say is exactly the same as print but two characters shorter and automatically adds a newline at the end. This is most useful when you’re writing one liners. This quick calculation:

bash$ perl -e 'print 235*1.15, "\n"'

Becomes just:

bash$ perl -E 'say 235*1.15'

(Note the use of -E instead of -e to automatically turn on the 5.10 keywords like say without having to add use 5.010 or use feature 'say'.)

This saves us a grand total of nine keypresses (including having to hit shift one less time.) More importantly it saves us having to use double quotes at all. This is really useful when you’re already using the quotes for something else. For example, running a Perl one-liner remotely with ssh:

bash$ ssh me@remote "perl -MSomeModule -e 'print SomeModule->VERSION, qq{\n}'"

With 5.10 on the remote machine this becomes just:

bash$ ssh me@remote "perl -MSomeModule -E 'say SomeModule->VERSION'"

This not only has the advantage of saving me a bunch of keystrokes, but also doesn’t make me think as much. And the less I have to think, the less chance I’m going to do something stupid and make a mistake.
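The same equivalence holds in a script as well as in a one-liner. A small sketch, assuming perl 5.10 or later; the in-memory file handles are only there so the two outputs can be compared:

```perl
use strict;
use warnings;
use feature 'say';   # "use 5.010;" would turn say on as well

# Write the same value with print and with say into two in-memory buffers.
my ($printed, $said) = ('', '');
open my $print_fh, '>', \$printed or die "Cannot open buffer: $!";
open my $say_fh,   '>', \$said    or die "Cannot open buffer: $!";

print {$print_fh} 235 * 1.15, "\n";   # newline added by hand
say   {$say_fh}   235 * 1.15;         # newline added by say

close $print_fh;
close $say_fh;

say $printed eq $said ? "identical output" : "different output";
```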
