Perl Crash Course: Subroutines

Introduction

Subroutines are user-defined functions: named blocks of code that you can call from anywhere in your program. It is good practice to group them together at either the beginning or the end of the main program.

Subroutine declarations begin with the keyword “sub”. By convention, subroutine names are all lowercase.

sub NAME (PROTOTYPE) BLOCK

print_hello(); # a subroutine can be called before its definition, as long as the call uses parentheses

sub print_hello {
      print "Hello world\n";
}

When we called print_hello() we told Perl that we wanted the piece of code named print_hello to be executed. The result is “Hello world” showing up on our screen. The only benefit of that snippet in its current form is that we won’t have to copy and paste the print statement all over our script if we want to repeat it. All we need to do is call print_hello();

Another, older, but still common form of calling it would be

&print_hello;
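The ampersand form is not always equivalent to a plain call. The sketch below uses two hypothetical subs, count_args and wrapper (names invented for illustration), to show that &name; without parentheses hands the caller's current argument array @_ to the called sub, while the forms with parentheses pass exactly what is written between them.

```perl
use strict;
use warnings;

# Hypothetical helper: reports how many arguments it received.
sub count_args {
    return scalar @_;
}

# Hypothetical wrapper: calls the helper in three different ways.
sub wrapper {
    my $plain     = count_args();    # explicit empty argument list
    my $amp_paren = &count_args();   # ampersand with parentheses: also empty
    my $amp_bare  = &count_args;     # ampersand alone: reuses wrapper's own @_
    return ($plain, $amp_paren, $amp_bare);
}

my @counts = wrapper('a', 'b', 'c');
print "@counts\n";   # prints "0 0 3"
```

For that reason the plain print_hello() form is usually the safer default.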

Parameters

As mentioned before, our previous example doesn’t do much by itself. The good news is that subroutines can take a list of parameters and do something based on what they receive. Let’s change our example to something a little better:

$me = "Vinny";

print_hello($me);

sub print_hello {
	$name = shift;
	print "Hello $name\n";
}

# prints out "Hello Vinny"

Let’s take a closer look at that. During my subroutine call, I’m passing it a list of one element, “Vinny”. That element is placed into the special array @_, which is what the subroutine sees as its argument list. Like $_, @_ doesn’t have to be named explicitly with certain array functions: inside a subroutine, shift and pop work on @_ by default. In our example, we shifted the first element of @_ into our variable $name and printed it out.
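When a subroutine takes more than one parameter, the same idea extends naturally. The following is a minimal sketch with two hypothetical subs (greeting and greeting_shift are invented names) that unpack @_ either with a list assignment or with repeated calls to shift.

```perl
use strict;
use warnings;

# Hypothetical sub: copies both arguments out of @_ in one list assignment.
sub greeting {
    my ($salutation, $name) = @_;
    return "$salutation, $name!";
}

# Same result using shift, which defaults to @_ inside a subroutine.
sub greeting_shift {
    my $salutation = shift;   # same as shift(@_)
    my $name       = shift;
    return "$salutation, $name!";
}

print greeting('Hello', 'Vinny'), "\n";         # prints "Hello, Vinny!"
print greeting_shift('Hello', 'Vinny'), "\n";   # prints "Hello, Vinny!"
```

Which style to use is mostly a matter of taste; the list assignment leaves @_ intact, while shift removes each element as it is read.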

It would have been possible to work directly against $_[0], yes, but doing so might be dangerous depending on what you’re doing. Here’s why: there are 2 ways of passing parameters into a subroutine.

The first and more common is pass-by-value. Pass-by-value is when we copy the value of $_[0] into a variable of our own inside the subroutine. Because we only ever change the copy, the original variable ($me) is untouched. (In our example $name is still a global variable, because it was not declared with my; we’ll see more about that later.)

The second and possibly dangerous way of doing it is by using the pass-by-reference method. In this case, we work against $_[0] directly, which is an alias for the variable passed. Any changes to $_[0] will affect the contents of $me. This is typically NOT what you want to do.

$me = "Vinny";

print_hello($me);
print "$me\n";

sub print_hello {
    print "Hello $_[0]\n";
    $_[0] .= " Alves";
}

# prints "Hello Vinny" and then "Vinny Alves"
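On the rare occasions when changing the caller's variable is the whole point, the aliasing can be used deliberately. The sketch below assumes a hypothetical helper named trim_in_place (not part of the original example) that strips surrounding whitespace from whatever variables it is given.

```perl
use strict;
use warnings;

# Hypothetical helper: strips leading and trailing whitespace in place.
# Each element of @_ is an alias for the caller's variable, and the loop
# variable in turn aliases each element of @_.
sub trim_in_place {
    for my $value (@_) {
        $value =~ s/^\s+|\s+$//g;
    }
    return;
}

my $input = "  Vinny  ";
trim_in_place($input);
print "[$input]\n";   # prints "[Vinny]"
```

A sub written this way should say so in its name or documentation, since callers will not expect their variables to change.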

When passing more complex structures such as hashes and arrays, bear in mind that Perl flattens them out into a single array by default.

@myArray = qw(red green blue);

%myHash = ('apple'=>'fruit',  'dog' =>'pet');

print_hello(@myArray,%myHash);

sub print_hello {
	for $i (@_) {
		print "Got $i\n";
	}
}
# Prints (the order of the hash pairs may vary):
# Got red
# Got green
# Got blue
# Got apple
# Got fruit
# Got dog
# Got pet

Again, that’s probably not what you wanted. To maintain the individuality and structure of your arrays and hashes, you will need to pass a reference to each of them as parameters, and then assign those references to variables in your subroutine.

print_hello(\@myArray,\%myHash); # the backslash creates a reference

sub print_hello {

	$array_ref = shift;
	$hash_ref = shift;

	for $color (@$array_ref) {
		print "Got $color\n";
	}

	print "A dog is a " . $hash_ref->{dog} . "\n";
}
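The same technique under strict can be sketched as follows. The sub name summarize and the variable names are assumptions made for this illustration; the point is only that each reference arrives as a single scalar and is dereferenced inside the sub.

```perl
use strict;
use warnings;

# Hypothetical sub: takes an array reference and a hash reference.
sub summarize {
    my ($colors_ref, $kinds_ref) = @_;
    my $color_count = scalar @{$colors_ref};     # number of array elements
    my @names       = sort keys %{$kinds_ref};   # sorted for a stable order
    return "$color_count colors; keys: @names";
}

my @colors = qw(red green blue);
my %kinds  = (apple => 'fruit', dog => 'pet');

print summarize(\@colors, \%kinds), "\n";   # prints "3 colors; keys: apple dog"
```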

Returning Data

Subroutines are handy for returning some sort of data. If the return keyword isn’t used, a subroutine returns the value of the last expression it evaluated. More often, you will have it explicitly return a specific piece of data, such as a scalar, a list, or a reference to an array, hash, scalar, etc. The same rules we find in passing parameters apply to returning data: arrays and hashes are flattened into a single list unless you return references to them.

#!/usr/bin/perl

$num1 = 10;
$num2 = 5;
$result = compare($num1,$num2);

print $result; # "is greater"

sub compare {
    return "is greater" if $_[0] > $_[1];
    return "is smaller" if $_[0] < $_[1];
    return "is equal"  if $_[0] == $_[1];
}

Here is another way to see the same results:

#!/usr/bin/perl

use strict;

my $num1 = 10;   # "my" is required now that strict is on
my $num2 = 5;
my $result = compare($num1,$num2);

print $result; # "is greater"

sub compare(@) {	
	if ($_[0] > $_[1]) {
		"is greater"; # "return" operator not required
	}
	elsif($_[0] == $_[1]) {
		"is equal";
	}
	else {
		"is smaller";
	}
}

As you can see, we didn’t use the return operator in the example above. It is not required here because Perl automatically returns the value of the last expression evaluated. Relying on that is not a best practice, though. We strongly recommend that you keep your returns explicit, so the intent is obvious to whoever reads the code.
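Returning a list or a reference, as mentioned at the start of this section, can be sketched like this. The subs min_max and stats are hypothetical names chosen for the illustration.

```perl
use strict;
use warnings;

# Hypothetical sub: returns a two-element list (smallest, largest).
sub min_max {
    my @sorted = sort { $a <=> $b } @_;
    return ($sorted[0], $sorted[-1]);
}

# Hypothetical sub: returns a hash reference, so the structure survives intact.
sub stats {
    my ($min, $max) = min_max(@_);
    return { min => $min, max => $max };
}

my ($low, $high) = min_max(7, 2, 9);
print "$low $high\n";                              # prints "2 9"

my $summary = stats(7, 2, 9);
print "$summary->{min} $summary->{max}\n";         # prints "2 9"
```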

Prototypes

In order to bypass the flattening out of all the array and hash parameters into a single array, Perl allows us to specify prototypes for our subroutines. A prototype tells Perl whether we’re expecting an array, a hash, or a scalar in a given position, and it can make Perl pass arrays and hashes as individual references instead of flattening them out. It also lets us define which parameters are mandatory and which are optional.

There are ways of overriding (turning off) prototypes. If you call the subroutine in the “old” fashion (&my_proto; or &my_proto()), for instance, the checking will not be done. Calling subroutines as methods in object-oriented Perl also bypasses prototypes.
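The effect of the ampersand on prototype checking can be seen in a small sketch. The sub count_one is a hypothetical name; it simply reports how many arguments reached it.

```perl
use strict;
use warnings;

# Hypothetical sub with a ($) prototype, defined before it is called
# so that the prototype is known at the call sites below.
sub count_one($) {
    return scalar @_;
}

my @letters = qw(a b c);

my $with_proto    = count_one(@letters);    # ($) evaluates @letters in scalar context: one argument
my $without_proto = &count_one(@letters);   # & skips the prototype: three arguments

print "$with_proto $without_proto\n";   # prints "1 3"
```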

The catch with prototypes is that you have to declare the subroutine BEFORE actually calling it. Look at the following example:

#!/usr/bin/perl -w

@array = qw(a b c );

my_proto(@array,'test');

sub my_proto(\@) {
	print "@_\n";
}
# Warns the following message:
# main::my_proto() called too early to check prototype at ./test.pl line 5.
# and prints "a b c test"

In this example, the subroutine and its prototype are declared after the call. Perl warns that the call was too early to check prototype and treats the subroutine as if it didn’t have any. The correct way of doing it would be this:

#!/usr/bin/perl -w

sub my_proto(\@); # pre-declare the sub

@array = qw(a b c );
my_proto(@array,'test');

sub my_proto(\@) {
	print "@_\n";
}
# Compilation error:
# Too many arguments for main::my_proto at ./test.pl line 8, near "'test')"
# Execution of ./test.pl aborted due to compilation errors.

This time the script didn’t even compile (remember that Perl is an interpreted AND compiled language). Prototype checking was in place and raised the error which aborted compilation.

You probably realized that the error message we got is the same kind we get when we fail to give built-in functions the right parameters. That’s exactly what prototyping does: it allows you to make your subroutines behave like built-in functions.

The perlsub section in perldoc shows the kinds of prototypes that we can use to mimic the built-in functions:

		Declared as		Called as

		sub mylink ($$)		mylink $old, $new
		sub myvec ($$$)		myvec $var, $offset, 1
		sub myindex ($$;$)	myindex &getstring, "substr"
		sub mysyswrite ($$$;$)	mysyswrite $buf, 0, length($buf) - $off, $off
		sub myreverse (@)	myreverse $a, $b, $c
		sub myjoin ($@)		myjoin ":", $a, $b, $c
		sub mypop (\@)		mypop @array
		sub mysplice (\@$$@)	mysplice @array, @array, 0, @pushme
		sub mykeys (\%)		mykeys %{$hashref}
		sub myopen (*;$)	myopen HANDLE, $name
		sub mypipe (**)		mypipe READHANDLE, WRITEHANDLE
		sub mygrep (&@)		mygrep { /foo/ } $a, $b, $c
		sub myrand (;$)		myrand 42
		sub mytime ()		mytime

Notes:

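  • ‘;’ separates mandatory parameters (to its left) from optional ones (to its right).
  • $ evaluates the argument in that position in scalar context; @ and % accept all remaining arguments as a list, so they only make sense at the end of a prototype.
  • \@, \% or \$ requires an actual array, hash or scalar in that position and passes it to the sub as a reference.
  • sub mysub(\[$@%]) allows you to call mysub with a $var, or @array, or %hash, etc.; the sub receives a reference to it.
  • sub mysub($) when called with mysub(@array) will force the array to be evaluated in SCALAR context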
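Two of these notes can be illustrated with a short sketch. The subs lengths and repeat are hypothetical names; lengths uses reference prototypes to keep two arrays separate, and repeat uses ‘;’ to make its second parameter optional (the default of 2 is an arbitrary choice for the example).

```perl
use strict;
use warnings;

# Hypothetical sub: (\@\@) makes Perl pass references to two real arrays.
sub lengths(\@\@) {
    my ($first, $second) = @_;
    return (scalar @{$first}, scalar @{$second});
}

# Hypothetical sub: one mandatory scalar, one optional scalar.
sub repeat($;$) {
    my ($text, $times) = @_;
    $times = 2 unless defined $times;   # arbitrary default for this sketch
    return $text x $times;
}

my @short = qw(a b);
my @long  = qw(c d e);

my @sizes = lengths(@short, @long);   # the arrays are not flattened together
print "@sizes\n";                     # prints "2 3"
print repeat('ab'), "\n";             # prints "abab"
print repeat('ab', 3), "\n";          # prints "ababab"
```

Both subs are defined before they are called, so the prototypes are in effect at every call site.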

Another catch of declaring subs with prototypes is that you will get a warning if you declare your sub with a given prototype first, and then again with the rest of the code and a different prototype. Let’s look at a couple of examples to get a better idea of what I mean.

Example 1

#!/usr/bin/perl -w

sub mysub(@); # pre-declared with a list prototype

@array = qw(a b c);

mysub(@array);

sub mysub($) {  # defined with a scalar prototype
	print "@_\n";
}
# Prototype mismatch: sub main::mysub (@) vs ($) at ./test.pl line 15.
# prints out "a b c"

Example 2

#!/usr/bin/perl -w

sub mysub($) {  # defined in scalar context
	print "@_\n";
}

@array = qw(a b c);

mysub(@array);

# works just fine
# prints out "3" because the ($) prototype puts @array in scalar context.

Lexical Scopes

Perl by default is a very permissive language. It lets you get away with things that other languages would never dream of, like using global variables indiscriminately. That’s all fine and good if you want to write a quick and dirty script that isn’t more than a few lines long, but if you’re trying to build a complex application such as a bulletin board system (Matt Wright’s wwwboard was written in (horrible) Perl), you will probably find yourself missing a few toes after all the shots your foot took.

A solution for this problem is to ALWAYS use strict. strict is a pragmatic module (or just pragma) that enforces several safety measures, such as requiring you to declare your variables, forbidding symbolic references, and rejecting most barewords. It should go at the very top of your program, right after the shebang line.

One of the things that use strict; pushes you toward the most is making good use of lexical scopes. A lexical scope is the block (or file) in which a variable is declared: a variable declared with my is only visible from its declaration to the end of the enclosing block. Under strict, you can’t use a variable that was not declared in, or somehow imported into, the current scope.
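A minimal sketch of lexical scoping, using a hypothetical sub named scope_demo: the inner block declares its own $name, which hides the outer one only until the block ends.

```perl
use strict;
use warnings;

# Hypothetical sub: records which $name is visible at each point.
sub scope_demo {
    my $name = 'outer';
    my @seen;
    {
        my $name = 'inner';   # a new variable, visible only inside this block
        push @seen, $name;
    }
    push @seen, $name;        # the outer $name is untouched
    return @seen;
}

print join(', ', scope_demo()), "\n";   # prints "inner, outer"
```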

#!/usr/bin/perl -w

$name = 'vinny';

myprint();

sub myprint() {
	print "$name\n";
}
# main::myprint() called too early to check prototype at ./test.pl line 7.
# vinny

The example above accesses the value of the global variable $name from within the subroutine. That is not a very safe thing to do. If we turn on strict, we get an error when trying to do that. The error, however, is simply because we didn’t declare $name with either my or our.

#!/usr/bin/perl -w

use strict;

$name = 'vinny';

myprint();

sub myprint() {
	print "$name\n";
}

# main::myprint() called too early to check prototype at ./test.pl line 7.
# Global symbol "$name" requires explicit package name at ./test.pl line 5.
# Global symbol "$name" requires explicit package name at ./test.pl line 11.
# Execution of ./test.pl aborted due to compilation errors.

Changing the script above to comply with strict, we are once again able to access $name from inside the subroutine, but we still don’t want to do that, especially if we want to modify the value of $name only inside the subroutine.

#!/usr/bin/perl -w

use strict;

my $name = 'vinny'; # added my to comply with strict

myprint();

print "$name\n";

sub myprint() {
	$name .= " Alves"; # changes the original variable instead of a private one
	print "$name\n";
}

# main::myprint() called too early to check prototype at ./test.pl line 7.
# vinny Alves
# vinny Alves

So it’s best if we play it safe and not work on global variables at all. This is especially the case if you one day plan on writing scripts to use with Apache/mod_perl.

#!/usr/bin/perl -w

use strict;

my $name = 'vinny'; # added my to comply with strict

myprint($name); # calls the sub

print "$name\n"; # $name here still has the original value

sub myprint { # no empty prototype here, since the sub now takes an argument
	my $name = shift; # private copy, visible only inside the sub
	$name .= " Alves";
	print "$name\n";
}
# vinny Alves
# vinny

Context and wantarray

Something that often catches Perl beginners off guard is the concept of context. Evaluating an @array in scalar context returns the number of elements, not the elements themselves. We can mimic this behavior in our own subroutines with wantarray. This is how it works:

#!/usr/bin/perl

use strict;

my @array1 = qw(a b c d e);
my @array2 = reverse_or_count(@array1);
my $count = reverse_or_count(@array1);

print "@array2\n"; # e d c b a
print "$count\n";  # 5

sub reverse_or_count(@){
	my @arr = @_;
	if (wantarray) {
		return reverse @arr;
	}
	else {
		return $#arr + 1; # last index of @arr plus 1
	}
}

The wantarray operator returns true if the subroutine is being called in list context, and false if not. With that kind of control, the sky is the limit ;).
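Strictly speaking, wantarray has three possible results: true in list context, a defined false value in scalar context, and undef in void context (when the return value is thrown away). A minimal sketch, with a hypothetical sub named context_name:

```perl
use strict;
use warnings;

# Hypothetical sub: names the context it was called in.
sub context_name {
    return 'list'   if wantarray;           # true: list context
    return 'scalar' if defined wantarray;   # defined but false: scalar context
    return 'void';                          # undef: result is being discarded
}

my @in_list   = context_name();
my $in_scalar = context_name();

print "$in_list[0] $in_scalar\n";   # prints "list scalar"
```

The void case cannot be observed through the return value, but a sub can use it to skip expensive work when nobody wants the result.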

Closures

Closures are usually thought of as very advanced topics, but they’re not that bad at all. They’re basically a way to work with lexical variables inside referenced subroutines. OK, I admit that that sounded terrible, but with a simple example we can clarify it.

#!/usr/bin/perl

use strict;

sub make_counter { # no empty prototype: the sub takes one argument
	my $start = shift;
	return sub { $start++ };
}

my $from_ten = make_counter(10);
my $from_three = make_counter(3);

print $from_ten->();    # 10
print $from_ten->();    # 11
print $from_three->();  # 3
print $from_ten->();    # 12
print $from_three->();  # 4

This is how it works: our sub make_counter copies the initial parameter into its own private $start variable and then returns an anonymous subroutine that post-increments $start (returning the value it had before the increment). So when we call make_counter(10) we are actually creating a reference to a subroutine whose $start begins at 10. The beauty of it is that $start isn’t lost when make_counter returns and the variable goes out of scope. It’s kept in memory for the next time the anonymous subroutine is called. That’s why each call to $from_ten->() picks up from the latest result.

What is even nicer is that we can create as many instances of that subroutine reference as we want. Calling $from_three->() will not impact the results of $from_ten->().
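Two closures created in the same call can also share one private variable. The sketch below assumes a hypothetical sub named make_account that returns a pair of anonymous subs, both closing over the same $balance.

```perl
use strict;
use warnings;

# Hypothetical sub: returns two closures that share one private $balance.
sub make_account {
    my $balance = shift;
    my $deposit = sub { $balance += shift; return $balance };   # shift reads the closure's own @_
    my $peek    = sub { return $balance };
    return ($deposit, $peek);
}

my ($deposit, $peek) = make_account(100);
$deposit->(50);

my $current = $peek->();
print "$current\n";   # prints "150"
```

Nothing outside those two closures can reach $balance, which is one simple way to get private state without writing a class.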

That is pretty much all there is to it. Use your creativity to do the rest.
