Pipeable data in Ruby
This post comes from some playing around after seeing Hadley Wickham speak about pipeable data in R. In it I try to explore different ways of serially applying a set of transformations to a piece of data.
Say we want to tell a story like the following:
“the bunny Foofoo went to the forest and ate some grass”
We build up the pieces to tell the story:
def the_bunny(name)
  "The bunny #{name}"
end
def went_to_the_forest(object)
  "#{object} went to the forest"
end
def and_ate_some_grass(object)
  "#{object} and ate some grass"
end
And then what? We have some choices.
Use nested function calls:
story = and_ate_some_grass(went_to_the_forest(the_bunny('Foofoo')))
But this is hard to read. What if we broke it out?
Use separate variables for each state:
the_named_bunny = the_bunny('Foofoo')
with_subject = went_to_the_forest(the_named_bunny)
story = and_ate_some_grass(with_subject)
Not much better. The variable names are either redundant with the method names or non-descriptive.
Let’s try using one variable to hold the story as it builds:
story = the_bunny('Foofoo')
story = went_to_the_forest(story)
story = and_ate_some_grass(story)
This is better, but it looks contrived with ‘story’ repeated everywhere. And what if we want to tell the same story several times with a different name? We’d have to copy and paste all three lines each time.
So we make a method:
def tell_the_story(name)
  story = the_bunny(name)
  story = went_to_the_forest(story)
  story = and_ate_some_grass(story)
  story
end
tell_the_story('Foofoo')
tell_the_story('Booboo')
Which is great, but what if you want the option to just use a piece of your story?
def partial_story(name)
  story = the_bunny(name)
  story = went_to_the_forest(story)
  story
end
def full_story(name)
  story = partial_story(name)
  story = and_ate_some_grass(story)
  story
end
partial_story('Foofoo')
full_story('Booboo')
Ugh. And what if there are many possible sub-stories?
Maybe turn the methods into Method objects (callables that behave much like lambdas) and run them as a pipeline:
storyline = [
  :the_bunny,
  :went_to_the_forest,
  :and_ate_some_grass
].map(&method(:method))
storyline[0..1].inject('Foofoo') { |v, m| m.(v) }
storyline.inject('Booboo') { |v, m| m.(v) }
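The .map(&method(:method)) line packs a lot in: method(:method) grabs Ruby's own Object#method as a Method object, and & turns it into a block, so each symbol is handed to method and comes back as a callable. A sketch of the same pipeline with that step written out as an explicit block (the three story methods are repeated so the example runs on its own):

```ruby
def the_bunny(name)
  "The bunny #{name}"
end

def went_to_the_forest(object)
  "#{object} went to the forest"
end

def and_ate_some_grass(object)
  "#{object} and ate some grass"
end

# Same as .map(&method(:method)): look up each symbol as a Method object
storyline = [:the_bunny, :went_to_the_forest, :and_ate_some_grass].map { |name| method(name) }

# inject threads the growing story through each step in order,
# feeding the result of one step into the next
partial = storyline[0..1].inject('Foofoo') { |story, step| step.call(story) }
full = storyline.inject('Booboo') { |story, step| step.call(story) }

puts partial # The bunny Foofoo went to the forest
puts full    # The bunny Booboo went to the forest and ate some grass
```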
Ruby syntax starts getting in the way. We can at least hide it away:
def tell_the_story(storyline, name)
  storyline.inject(name) { |v, m| m.(v) }
end
tell_the_story(storyline[0..1], 'Foofoo')
tell_the_story(storyline, 'Booboo')
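Ruby 2.6 and later also provide function composition on Method and Proc via >> and <<, which removes the inject plumbing altogether. A sketch, assuming Ruby 2.6+; partial_story and full_story here are just local variables holding composed callables, not the methods defined earlier:

```ruby
def the_bunny(name)
  "The bunny #{name}"
end

def went_to_the_forest(object)
  "#{object} went to the forest"
end

def and_ate_some_grass(object)
  "#{object} and ate some grass"
end

# f >> g returns a callable that runs f first, then passes its result to g
partial_story = method(:the_bunny) >> method(:went_to_the_forest)
full_story = partial_story >> method(:and_ate_some_grass)

puts partial_story.call('Foofoo') # The bunny Foofoo went to the forest
puts full_story.call('Booboo')    # The bunny Booboo went to the forest and ate some grass
```

Each composed callable can be reused with any name, and a new sub-story is just another composition.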
But this is still sort of all over the place. We can tidy it up by wrapping it in a class:
class Bunny
  def initialize(name)
    @story = "The bunny #{name}"
  end
  def went_to_the_forest
    @story += ' went to the forest'
    self
  end
  def and_ate_some_grass
    @story += ' and ate some grass'
    self
  end
  def the_end
    @story
  end
end
Bunny.new('Foofoo')
  .went_to_the_forest
  .and_ate_some_grass
  .the_end
Bunny.new('Booboo')
  .went_to_the_forest
  .the_end
Which is actually pretty great in terms of readability. But what if there’s another ending, which this class doesn’t know about?
def new_ending(story)
  "#{story} and then gets eaten by a fox!"
end
new_ending(Bunny.new('Booboo')
  .went_to_the_forest
  .the_end)
The nice readability of our story in code is gone, especially if there’s more than one of these building blocks.
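One way to get the top-to-bottom reading back, assuming Ruby 2.6+ (where Kernel#then is available; Ruby 2.5 calls it yield_self): then passes its receiver to a block and returns whatever the block returns, so an outside method like new_ending can sit at the end of the chain. A sketch, repeating the Bunny class so it runs standalone:

```ruby
class Bunny
  def initialize(name)
    @story = "The bunny #{name}"
  end

  def went_to_the_forest
    @story += ' went to the forest'
    self
  end

  def and_ate_some_grass
    @story += ' and ate some grass'
    self
  end

  def the_end
    @story
  end
end

def new_ending(story)
  "#{story} and then gets eaten by a fox!"
end

# then hands the finished string to the block and returns the block's result,
# so the outside step reads in order with the rest of the chain
story = Bunny.new('Booboo')
  .went_to_the_forest
  .the_end
  .then { |s| new_ending(s) }

puts story # The bunny Booboo went to the forest and then gets eaten by a fox!
```

Writing the last step as .then(&method(:new_ending)) is an equivalent, terser spelling.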
But suppose we skip all this superstructure and use our original methods plus a small glue method in the data class?
class String
  # Look up the method named by fun and call it with this string as its first argument
  def |(fun, *args)
    method(fun).(self, *args)
  end
end
Voilà:
'Foofoo' |
  :the_bunny |
  :went_to_the_forest |
  :and_ate_some_grass
This doesn’t allow extra arguments: if a method like :the took a parameter like :bunny, you couldn’t write 'Foofoo' | :the, :bunny. You could get there by calling the operator directly, as in 'Foofoo' .| :the, :bunny, but that doesn’t work with the multiline format above. It’s an open question for me whether Ruby could be made to support this the way Elixir (with its |> operator) or Clojure (with its -> threading macro) does.
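One workaround that keeps the multiline layout is to let the right-hand side of | be an array holding the method name followed by its extra arguments. This array convention is an assumption made for the sketch, not a Ruby feature, and the two-argument the method is the hypothetical one from the paragraph above:

```ruby
# Hypothetical method taking an extra parameter, as in the text above
def the(name, animal)
  "The #{animal} #{name}"
end

def went_to_the_forest(object)
  "#{object} went to the forest"
end

class String
  # Accepts either :method_name or [:method_name, extra_arg, ...].
  # A bare Symbol destructures to fun = symbol, args = [].
  # The string on the left is always passed as the first argument.
  def |(step)
    fun, *args = step
    method(fun).call(self, *args)
  end
end

story = 'Foofoo' |
  [:the, :bunny] |
  :went_to_the_forest

puts story # The bunny Foofoo went to the forest
```

The trade-off is a bit of bracket noise on steps that take arguments, while plain steps stay as bare symbols.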