One of the most joyful things about Kotlin is its support for map-reduce-filter pipelines to tranform data collections. This way of working with data is concise, elegant, and powerful.
Perhaps you have heard of MapReduce, that new paper coming out of Google? (Say word count one more time!) That’s not quite what we’re talking about here. Although it is worth noting that map-reduce(-filter) is a general enough approach to manipulating data that Google was able to build an entire distributed data processing system on top of it.
What we mean when we talk about map-reduce-filter is a style of working with data that largely leverages three core operations:
map
: map transforms each value in a list or map into some other value
reduce
: reduce combines all the values into a collection into a single value
filter
: filter creates a new collection using only items from the original
collection that match some criteria
It’s worth noting there that our map-reduce-filter pipelines will also take advantage of a lot of helper functions—even if they can themselves be implemented using map-reduce-filter.
Let’s start with a simple example.
Imagine we want to find the sum of the first 8 powers of two.
Here’s a fairly standard way to do this using a for
loop:
But there are some things about this example that we don’t like.
First, sum
has to be mutable, and we’d like to avoid mutable variables.
Second, the for
loop introduces some syntactic overhead.
In contrast, here’s how to do this using map-reduce-filter:
Note how map-reduce-filter utilizes the function programming features we have
seen earlier.
map
, reduce
, and filter
are higher-order functions, each of which takes a
single argument—an anonymous function—which we can call without
using parenthesis due to the trailing lambda syntax.
The function passed to map
and filter
takes a single argument, allowing us
to use the implicit it
to make the mapping or filtering function simpler.
The function passed to reduce
takes two arguments, meaning that we need to
name them in our anonymous function.
Kotlin’s powerful type inference also makes the example much cleaner.
Kotlin can determine that map
returns a list of Int
values, which can then
be manipulated in reduce
using the sum operator.
At this point it may be hard to reason about why the second example is cleaner,
but let’s observe a few things.
First, it more clearly breaks our algorithm down into clear steps.
We start with the list of integers from 0 to 7 (0 until 8
).
Next we compute two to the power of each of these integers using .map
.
Finally, we sum all the values together using .reduce
.
Speaking of those helper functions, we can utilize one here to make this example even cleaner:
Let’s continue by examining each of our new functional collection processing tools one by one.
map
map
is a good place to start since it is one of the easiest to understand.
A mapping function takes each member of a collection and transforms it into
something else.
It can perform any operation on each member of the collection, as long as its yields
some result for each value.
The result of a map is a new collection with the items from the original
collection replaced with the results of the mapping function.
Here’s an example where we extract the ages of a few people that we know into a list:
Note that people
above has not been modified.
Instead, mapping functions build a new list from an existing one.
So we could repeat the operation to extract the names of the people that we
know.
This is a very common use for map
when processing data:
In Kotlin the mapping function receives a single argument.
We can utilize the implicit it
, or we can also name that argument if we want:
Note that the mapping function, like another anonymous functions, does not use
an unqualified return
to return a value.
The last expression of the mapping function will be used to build a new
collection from the original data.
Using a return
statement in a mapping function will return from the enclosing
function.
This is usually not what you want:
There is a way to specify that you want to return from the mapping function
using the qualified return return@map
, but it may be cleaner to rewrite your
code to avoid this if possible:
map
on Mapsmap
is normally used to modify the values in a collection.
When applied to a list, it maintains the order of the original list:
However, we can also map
on a map either using mapValues
to transform the
values or mapKeys
to transform the keys.
Note that both mapValues
and mapKeys
receive a single argument that
corresponds to an entry in the map, not to a value or a key.
If we want to work with the value or key directly, we can also use destructing
assignment to extract these properties from it
:
reduce
reduce
is probably the hardest of the map-reduce-filter functions to understand.
Both map
and filter
examine individual values one at a time—whereas
reduce
combines values together, and therefor transmits state between each call.
Let’s try to make use of reduce
more clear using some examples.
First, let’s identify the weakness of map
.
Say we want to sum all of the values in a list.
We could do this with a map, but it requires declaring and modifying a
variable outside of the processing pipeline:
This works… but it requires a mutable variable and state outside our
processing pipeline—both of which we’re like to avoid.
Also consider what the result of the call to map
is:
While it’s nice to see the intermediate results, what we would like is a higher-order function allowing us to reduce a collection to a single value.
Unsurprisingly, that function is called reduce
.
Unlike map
, the reducing function receives two arguments.
The second is an element of the collection—but the first is the result
produced by the previous calls to reduce.
Each result of the function passed to reduce
is then passed as the first
argument to the next call to the reducing function.
Let’s look at an example to try and make this more clear:
There are a few things to note here.
First, 9 pairs of results are shown, even though the list has 10 elements.
This is because the first time that reduce is run its first two arguments are
the first two elements in the list.
This result (0) is then passed to the next call to reduce
, which displays 0
2
.
Since the result (0) never changes, we see it printed repeatedly, even as each
subsequent call to reduce
has the opportunity to incorporate a new value in
the result.
With this as our starting point, let’s look at how to implement array sum using
reduce
:
Now the output is more interesting.
We can see how the sum is being built one value at a time.
First we start with the sum of the first two values in the array: 0 and 1.
That result (1) is passed to the next call to reduce
, which adds 2, producing
3.
That result (3) is passed to the next call to reduce
, which adds 3, producing
6.
And so on.
We can use a similar approach to reducing to the product:
You might wonder what happens if you call reduce
on a collection with a single element.
Try it!
That makes a certain amount of sense.
What about on an empty list?
(Note that we need to add the Int
type parameter to our list so that Kotlin
can determine the type of the arguments to reduce
.)
fold
It’s important to note one important limitation of reduce
.
Because it is called the first time on the first pair of values in the
collection, reduce
can only reduce to a value that is the same type as the
collection.
This is because the first time that reduce
is called result
is passed the
first value in the collection, which must be the same type as the values in the
collection.
This is an important limitation. For example, imagine that we want to combine all the numbers in our list into a string.
We could work around this using casting, but that’s not the right approach.
Instead, we can use a variant of reduce
called fold
.
Unlike reduce
, fold
allows us to determine what is passed as the result to
the first call to our folding function.
Here’s how to do the sum from above using fold
instead of reduce
:
The result is the same, but observe the differences.
Our folding function was called 10 times, whereas our reducing function was
called 9.
The first call to fold
receives the value passed to fold
as its first
argument and the first item from the collection as its second argument.
After that point fold
behaves identically to reduce
.
But how the process is bootstrapped has important effects on what we can do with
fold
.
For example, here’s how to combine all of the numbers into a string:
Passing the empty string "" to fold
establishes the type of result
as
String
, and also that we must return something of type String
.
Which we do easily using string interpolation.
This extra capability of fold
turns out to be pretty useful in certain
situations.
For the sake of completeness, folding an empty list turns out better than reducing an empty list:
filter
The last commonly-used pipeline operator is filter
.
Filtering is simple.
If the filter function returns true
, the item is retained in the collection.
If it returns false
, the item is removed.
Let’s see this in practice:
Working with data using map-reduce-filter pipelines becomes second-nature quite
quickly.
It helps that Kotlin includes a bunch of useful helper functions for performing
common operations—even if they could be done using some combination of
map
, reduce
, or filter
.
We’ll show both the helpers and an idea of how to implement them—usually
using reduce
—if possible.
sum
, min
, max
sum
, min
, and max
work out of the box:
We’ve already seen how to implement sum
using reduce
.
How about min
?
distinct
distinct
eliminates duplicate items from a collection.
Again, we can reimplement this using fold
:
sorted
, reversed
sorted
sorts the items in the collection.
Without an argument the natural sort order of the elements is used:
If we want to adjust the sort order we can use sortedBy
and provide a
function that should return a sort key for each item.
Finally, reversed
will reverse the order of a collection, providing a superior
reverse sort solution to the one shown above:
groupBy
groupBy
is a fun and powerful one.
Given a list, it will group the elements by a value into a map.
Duplicate keys will be added to a list, meaning that no items are lost.
For example, let’s say that we wanted to group all people by age:
Now let’s do some processing and see what happens when we have some duplicate ages:
joinToString
Finally, it’s frequently useful to be able to reduce a collection to a String
.
Kotlin makes this easy and maybe even joyful:
The examples above just scratch the surface of what Kotlin is capable of when working with collections. Many common operations are supported out of the box, and it’s rare that you find yourself stuck and needing to actually use a loop and mutable state.
When you’re getting started with map-reduce-filter data processing, it can be fun to try and rewrite your existing data processing pipelines that probably use loops in this functional style. Give it a shot and see how you go.