Update: Changed subroutine to comply with Perl Best Practices
Update2: Removed the prototype from the subroutine.
I’ve always had a problem with recursion. Not with the general theory that a function will call itself, etc – no, that’s easy. The hard part was when I had to deal with complex data structures in Perl (an array- or hashref containing a hash of arrays of hashes, a gazillion levels deep). Well, I guess anyone would have a hard time with that kind of data.
Anyway, in this post I don’t intend to get all complicated explaining all the kinds of recursions out there. If you want that, check this article at wikipedia. What I do want to do is help all of those who are in the situation I was in, by explaining in the simplest way possible how to deal with this scenario.
Let’s start with a need. I have a complex data structure that needs its spaces trimmed on both sides. But since I’m lazy, I’d like my subroutine to modify the data directly, and not return the modified value (pass by reference, not pass by value).
Here’s our data structure:
my $data = [ { key1 => ' trim me! ', key2 => ' trim me too! ', }, [ 'some element to trim ', ' another one ', ], ' a simple string needing trimming ', ];
$data explained
: an array containing 3 elements: element 0 is a hashref of keys "key1"
and "key2"
, element 1 is an arrayref of 2 elements. Element 3 is a simple string. All values have some extra spaces that need trimming (or so they say). We could use whatever number of levels and data types we want (except for anonymous subroutines, I guess – let’s not get too complicated).
Now, to trim all that, I want to be able to simply call trim()
à la PHP.
trim($data); # note the lack of the lvalue (lvalue = rvalue)
I also want it to accept simple arrays and hashes, and the references thereof: trim(@array); trim(@array); trim(%hash); trim(%hash); trim($string)
. After all, I never know what kind of data my colleagues will be working with. Better have it deal with everything.
The logic to do that is this: our subroutine will have to do the trimming (s///g) on scalars only. For that, it has to check if the data it received is a hash, array, etc, and if it is, iterate through each element and trim the value… but only if the element is not itself a hash, array, etc. Found it confusing? No problem, it really is.
In Perl, if I tried to remove the white space from element 0 of my $data
variable, it wouldn’t work. The reason being is that if I printed $data->[0]
onto the screen, I’d get a funny looking output, something like HASH(0x1004f5f0)
. That’s Perl’s way of saying that you have a HASH structure stored in memory position 0x1004f5f0. You can try to trim the spaces off of that string, but it won’t do you any good. The elements of your hash will still be untouched. That’s why you need to de-reference your data structures and dive into them.
To de-reference a structure is simple, just add a % in front of the variable if it’s a hashref, or an @ if it’s an array. But how do you know which is which? Use ref()
.
print ref($data->[0]) . "n"; # HASH print ref($data->[1]) . "n"; # ARRAY print ref($data->[3]) . "n"; # empty string, which is false
ref()
tells you what kind of data you are dealing with. It returns CODE
if you have a closure or anonymous subroutine, but we’re not going there today.
So, now that we know how to identify the type of element we’re going to be working with, we can build our subroutine…
sub trim() { for my $param (@_) { if (ref($param) eq 'ARRAY') { for my $element (@{$param}) { trim($element); } } elsif (ref($param) eq 'HASH') { for my $val (values %{$param}) { trim($val); } } elsif (ref($param) eq 'CODE') { return; } else { $param =~ s/(^s+|s+$)//g; } } }
trim()
explained:
We’re working with passing elements by reference instead of by value. This means that the elements themselves will be modified – no need to return any data. The first thing we do is to iterate through all parameters passed to trim()
. In a subroutine, parameters (in our case, variables) are populated into the special @_
array, allowing us to call trim($var1, $var2, $var3)
if we want.
We iterate through all elements of @_
and verify if they are an Array. If they are, we iterate through each of their elements once, and call trim()
again against them. That will handle as many nested arrays we want (or that your computer can handle). Now we have to make it deal with hashes. Same technique – use ref()
to see if it’s a hash. If it is, then iterate through each of its key/pair elements. There are several ways to do that. I personally prefer calling keys
to get the keys and use them to fetch the values of the hash. The value of the hash is passed to trim()
for more validation. We also check to see if we received a sub { }
(anonymous subroutine). In that case, we do nothing, just return.
Finally, after handling Arrays, Hashes and Anonymous subroutines, we can set up the actual trimming of the strings. We take the $_[$i]
which is the parameter passed and remove the leading and trailing spaces with one neat substitution: ^s+
stands for leading spaces, s+$
stands for trailing spaces, and it’s all joined by the ( | )
(this or that). We only call it once because we’re using the global (g) modifier of the substitution s///g
.
And that’s all there is to it!
A note on prototypes: This post generated a healthy discussion on prototypes. I had originally added the dollar prototype to the trim()
subroutine, but that was forcing it to accept only scalars (strings and references), and not working with normal hashes and arrays. Thanks to everyone who participated in the discussion.
14 Responses
sub trim would not work with a prototype. You should remove it.
It depends on your prototype. If you set prototype to trim($), you’ll get away with any data passed (I tested). But if you set it to trim($), then you get an error trying to do trim(%hash) or trim(@array).
Alexandr is correct. Given your trim (with prototype):
#! /usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my @test = (' first string ');
trim(@test);
print Dumper(@test);
prints:
main::trim() called too early to check prototype at trimtest.pl line 13.
main::trim() called too early to check prototype at trimtest.pl line 18.
$VAR1 = ' first string ';
But remove the prototype and the same code gives you:
$VAR1 = 'first string';
See also Prototypes Considered Harmful by Tom Christiansen.
As usual, CPAN reveals that someone has already done it for us:
use Data::Transformer;
Data::Transformer->new(normal => sub { $$_[0] =~ s/((^s+)|(s+$))//g })->traverse($data);
Still, a good example to explain recursion with 🙂
Vinny: Try this test with prototype:
#!/usr/bin/perl
use 5.008;
use strict;
use warnings;
use Test::More qw(no_plan);
sub trim {
…
}
my $c=’ test ‘;
trim($c);
is($c,’test’);
my @d=(‘ a ‘,’b’);
trim(@d);
is($d[0],’a’);
is($d[1],’b’);
I think when you’re using the array or hash, after you’ve trimmed it you have to put it back in the array or hash, otherwise you’re just trimming your own copy. Have I missed something?
Hi Christine. There are 2 ways of doing it: one is to copy the values passed by @_ into lexical variables and then return the trimmed copy (pass by value) and the other is to work directly on the variables passed (pass by reference). When you work on a reference (like I did with trim()), there’s no need to return the data since the original is modified automagically.
Here’s another example of passing by reference:
my @array = (1,2,3,4,5);
for my $i (@array) { # each element is referenced into $i
$i++;
}
print "@array"; # 2 3 4 5 6
Now, if I had specified a lexical for each element…
my @array = (1,2,3,4,5);
for my $i (@array) { # each element is referenced into $i and then copied into $val
my $val = $i;
$val++;
}
print "@array"; # 1 2 3 4 5
The auto-incremented result stays in
$val
which is lost with each iteration of the loop. I would have had to assign it back into$i
in order to change@array
Cheers!
Well, I did small modification to make it works. Anyway, you are talking about references at
trim
sub. When sub will get reference it is just one or could be list as you want. I did test just in one reference in following codes.First test code:
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
my $data = [
{
key1 => ' trim me! ',
key2 => ' trim me too! ',
},
[
'some element to trim ',
' another one ',
],
' a simple string needing trimming ',
];
&trim($data);
print "result : ".Dumper($data);
sub trim {
my $input = shift;
if (ref($input) eq "ARRAY") {
print "ARRAY:".Dumper($input);
foreach my $element (@{$input}) {
&trim($element);
}
} elsif (ref($input) eq "HASH") {
print "HASH:".Dumper($input);
foreach my $element (values %{$input}) {
&trim($element);
}
} else {
$input =~ s%(^s+|s+$)%%g;
print "STRING:'".$input."'n";
}
}
__END__
As we see, each value for each nested element have been processed properly, but not modified at original source.
ARRAY:$VAR1 = [
{
‘key2’ => ‘ trim me too! ‘,
‘key1’ => ‘ trim me! ‘
},
[
‘some element to trim ‘,
‘ another one ‘
],
‘ a simple string needing trimming ‘
];
HASH:$VAR1 = {
‘key2’ => ‘ trim me too! ‘,
‘key1’ => ‘ trim me! ‘
};
STRING:’trim me too!’
STRING:’trim me!’
ARRAY:$VAR1 = [
‘some element to trim ‘,
‘ another one ‘
];
STRING:’some element to trim’
STRING:’another one’
STRING:’a simple string needing trimming’
result : $VAR1 = [
{
‘key2’ => ‘ trim me too! ‘,
‘key1’ => ‘ trim me! ‘
},
[
‘some element to trim ‘,
‘ another one ‘
],
‘ a simple string needing trimming ‘
];
It is expected result in this case. When we will do other small code modification to set processed data it is returns desirable result.
Second test code:
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
my $data = [
{
key1 => ' trim me! ',
key2 => ' trim me too! ',
},
[
'some element to trim ',
' another one ',
],
' a simple string needing trimming ',
];
&trim($data);
print "result : ".Dumper($data);
sub trim {
my $input = shift;
if (ref($input) eq "ARRAY") {
for (my $key=0; $key<scalar(@{$input}); $key++) {
${$input}[$key] = &trim(${$input}[$key]);
}
} elsif (ref($input) eq "HASH") {
foreach my $key (keys %{$input}) {
${$input}{$key} = &trim(${$input}{$key});
}
} else {
$input =~ s%(^s+|s+$)%%g;
}
return $input;
}
__END__
This is output:
result : $VAR1 = [
{
'key2' => 'trim me too!',
'key1' => 'trim me!'
},
[
'some element to trim',
'another one'
],
'a simple string needing trimming'
];
Hi AlexB,
Let me comment on your code. It has to do with what Christine Morton commented on and I explained on that reply about passing variables by value and by reference.
My original subroutine works. It runs a loop on each element of
@_
handling them as references to the original input parameter. That’s why you don’t seeshift()
in my subroutine. When you callshift()
like you did on yours, you’re no longer working on references but on copies of the values. Yes, when you’re working on copies the originals don’t get changed, so you have to return the modified data and assign it to the original like you did for it to work, but I didn’t really want that – I wanted it to work likepush()
, auto-increment/decrement and many other Perl unary functions (no lvalue!).Your code works – but it’s a different approach (the one Christine Morton was asking about).
Cheers,
Vinny
Vinny, I understand your code is working and working directly at reference.
When I did modification I mean it is also working as prototype, no errors accrues. It is working with hashes and arrays without problem.
sub trim($);
&trim($data);
sub trim($) {
my $input = shift;
if (ref($input) eq "ARRAY") {
for my $key (0..$#{$input}) {
${$input}[$key] = &trim(${$input}[$key]);
}
} elsif (ref($input) eq "HASH") {
foreach my $key (keys %{$input}) {
${$input}{$key} = &trim(${$input}{$key});
}
} else {
$input =~ s%(^s+|s+$)%%g;
}
return $input;
}
Alex,
By calling
trim()
as&trim()
, you’re bypassing prototypes altogether. Try removing all the & from your calls and you’ll see that nothing gets trimmed. Also, supposing it did work, you would be limiting the use of any data structures other than scalars.Just try as is. You will see result.
As I see from your requirement you are pushing basically scalars to the function.
Not really – I want to be able to use it with scalars, hashes, and arrays.