One of the most joyful things about Kotlin is its support for map-reduce-filter pipelines to transform data collections. This way of working with data is concise, elegant, and powerful.
Perhaps you have heard of MapReduce, that new paper coming out of Google? (Say word count one more time!) That’s not quite what we’re talking about here. Although it is worth noting that map-reduce(-filter) is a general enough approach to manipulating data that Google was able to build an entire distributed data processing system on top of it.
What we mean when we talk about map-reduce-filter is a style of working with data that largely leverages three core operations:
map: transforms each value in a list or map into some other value
reduce: combines all the values in a collection into a single value
filter: creates a new collection using only the items from the original collection that match some criteria
It’s worth noting here that our map-reduce-filter pipelines will also take advantage of a lot of helper functions, even if they can themselves be implemented using map-reduce-filter.
Let’s start with a simple example.
Imagine we want to find the sum of the first 8 powers of two.
Here’s a fairly standard way to do this using a for loop:
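A loop-based version might look like the following minimal sketch. The function name sumOfPowersWithLoop is hypothetical, and computing each power with a bit shift (1 shl i) is an assumption made for brevity, not necessarily how the original listing did it.

```kotlin
// Sum 2^0 + 2^1 + ... + 2^7 using a mutable accumulator and a for loop.
fun sumOfPowersWithLoop(): Int {
    var sum = 0 // has to be mutable so the loop can update it
    for (i in 0 until 8) {
        sum += 1 shl i // 1 shl i is two to the power i
    }
    return sum
}

fun main() {
    println(sumOfPowersWithLoop()) // prints 255
}
```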
But there are some things about this example that we don’t like.
sum has to be mutable, and we’d like to avoid mutable variables.
The for loop introduces some syntactic overhead.
In contrast, here’s how to do this using map-reduce-filter:
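A pipeline version could be sketched as follows. The name sumOfPowersWithPipeline is hypothetical, and the bit shift used to compute each power is an assumption; the shape of the pipeline (a range, then map, then reduce) is what matters.

```kotlin
// The same sum as a pipeline: start from the exponents, map each one
// to a power of two, then reduce the powers to a single total.
fun sumOfPowersWithPipeline(): Int =
    (0 until 8)
        .map { 1 shl it }                      // implicit single argument: it
        .reduce { sum, value -> sum + value }  // two arguments, so both are named

fun main() {
    println(sumOfPowersWithPipeline()) // prints 255
}
```

No mutable variable appears anywhere in this version.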
Note how map-reduce-filter utilizes Kotlin’s functional programming features.
map, reduce, and filter are higher-order functions, each of which takes a single argument (an anonymous function), so we can call them without parentheses thanks to the trailing lambda syntax.
The function passed to map or filter takes a single argument, allowing us to use the implicit it to make the mapping or filtering function simpler.
The function passed to reduce takes two arguments, meaning that we need to name them in our anonymous function.
Kotlin’s powerful type inference also makes the example much cleaner.
Kotlin can determine that map returns a list of Int values, which can then be combined in reduce using the + operator.
At this point it may be hard to reason about why the second example is cleaner,
but let’s observe a few things.
First, it breaks our algorithm down into clear steps.
We start with the list of integers from 0 to 7 (0 until 8).
Next we compute two to the power of each of these integers using map.
Finally, we sum all the values together using reduce.
Speaking of those helper functions, we can utilize one here to make this example even cleaner:
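Assuming the helper meant here is sum, the final reduce step can be replaced with a single call. The function name below is hypothetical.

```kotlin
// sum() replaces the hand-written reduce step.
fun sumOfPowersWithHelper(): Int =
    (0 until 8).map { 1 shl it }.sum()

fun main() {
    println(sumOfPowersWithHelper()) // prints 255
}
```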
Let’s continue by examining each of our new functional collection processing tools one by one.
map is a good place to start since it is one of the easiest to understand.
A mapping function takes each member of a collection and transforms it into some other value.
It can perform any operation on each member of the collection, as long as it yields some result for each value.
The result of a map is a new collection with the items from the original
collection replaced with the results of the mapping function.
Here’s an example where we extract the ages of a few people that we know into a list:
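As a minimal sketch, assume a hypothetical Person data class with name and age properties (the names and ages below are made up for illustration):

```kotlin
// Hypothetical data type for the examples in this section.
data class Person(val name: String, val age: Int)

// map builds a new list holding one age per person.
fun agesOf(people: List<Person>): List<Int> = people.map { it.age }

fun main() {
    val people = listOf(Person("Ana", 34), Person("Ben", 27), Person("Chen", 41))
    println(agesOf(people)) // prints [34, 27, 41]
}
```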
Note that people above has not been modified.
Instead, mapping functions build a new list from an existing one.
So we could repeat the operation to extract the names of the people that we know.
This is a very common use for map when processing data:
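The sketch below, again using a hypothetical Person class, maps the same list twice and shows that the source list is left untouched.

```kotlin
data class Person(val name: String, val age: Int)

// A second, independent mapping over the same source list.
fun namesOf(people: List<Person>): List<String> = people.map { it.name }

fun main() {
    val people = listOf(Person("Ana", 34), Person("Ben", 27))
    println(people.map { it.age }) // prints [34, 27]
    println(namesOf(people))       // prints [Ana, Ben]
    println(people.size)           // prints 2: the original list is unchanged
}
```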
In Kotlin the mapping function receives a single argument.
We can utilize the implicit it, or we can also name that argument if we want:
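Both spellings are shown in this small sketch; the function names and the word list are illustrative only.

```kotlin
// Using the implicit argument name.
fun lengthsImplicit(words: List<String>): List<Int> = words.map { it.length }

// Naming the argument explicitly; the behavior is identical.
fun lengthsNamed(words: List<String>): List<Int> = words.map { word -> word.length }

fun main() {
    val words = listOf("map", "reduce", "filter")
    println(lengthsImplicit(words)) // prints [3, 6, 6]
    println(lengthsNamed(words))    // prints [3, 6, 6]
}
```

A named argument tends to pay off once lambdas are nested and it would be ambiguous.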
Note that the mapping function, like other anonymous functions, does not use return to return a value.
The last expression of the mapping function is used to build the new collection from the original data.
A return statement in a mapping function will return from the enclosing function.
This is usually not what you want:
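To illustrate the pitfall, here is a small sketch with a hypothetical function doubledOrBail. Because map is an inline function, a bare return inside its lambda leaves the enclosing function entirely.

```kotlin
// A bare return inside the lambda exits doubledOrBail itself,
// abandoning the mapping as soon as a negative value is seen.
fun doubledOrBail(values: List<Int>): List<Int> {
    val doubled = values.map {
        if (it < 0) {
            return emptyList() // returns from doubledOrBail, not from the lambda
        }
        it * 2 // last expression: the value produced for this element
    }
    return doubled
}

fun main() {
    println(doubledOrBail(listOf(1, 2, 3)))  // prints [2, 4, 6]
    println(doubledOrBail(listOf(1, -2, 3))) // prints []
}
```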
There is a way to specify that you want to return from the mapping function using the qualified return return@map, but it may be cleaner to rewrite your code to avoid this if possible:
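The following sketch contrasts the two styles. Both function names are hypothetical, and the labels are arbitrary example values.

```kotlin
// Qualified return: return@map yields a value for the current element only.
fun labelsWithQualifiedReturn(values: List<Int>): List<String> =
    values.map {
        if (it < 0) {
            return@map "negative"
        }
        "non-negative"
    }

// The same logic as a single if expression, with no return at all.
fun labelsWithExpression(values: List<Int>): List<String> =
    values.map { if (it < 0) "negative" else "non-negative" }

fun main() {
    println(labelsWithQualifiedReturn(listOf(1, -2))) // prints [non-negative, negative]
    println(labelsWithExpression(listOf(1, -2)))      // prints [non-negative, negative]
}
```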
map is normally used to transform the values in a collection.
When applied to a list, it maintains the order of the original list:
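A quick sketch (the function name timesTen is illustrative) shows that each result lands at the same position as the element it came from:

```kotlin
// Each output element sits at the same index as its input element.
fun timesTen(values: List<Int>): List<Int> = values.map { it * 10 }

fun main() {
    println(timesTen(listOf(3, 1, 2))) // prints [30, 10, 20]
}
```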
However, we can also map over a map, using either mapValues to transform the values or mapKeys to transform the keys.
Note that both mapValues and mapKeys receive a single argument that corresponds to an entry in the map, not to a value or a key.
If we want to work with the value or key directly, we can also use destructuring to extract these properties from the entry:
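A minimal sketch of all three forms, using a made-up map from names to ages; the function names are hypothetical.

```kotlin
// mapValues keeps the keys and replaces each value; it is the whole entry.
fun nextYearAges(ages: Map<String, Int>): Map<String, Int> =
    ages.mapValues { it.value + 1 }

// mapKeys keeps the values and replaces each key.
fun titledNames(ages: Map<String, Int>): Map<String, Int> =
    ages.mapKeys { "Dr. ${it.key}" }

// Destructuring the entry gives direct access to the value (key ignored with _).
fun nextYearAgesDestructured(ages: Map<String, Int>): Map<String, Int> =
    ages.mapValues { (_, age) -> age + 1 }

fun main() {
    val ages = mapOf("Ana" to 34, "Ben" to 27)
    println(nextYearAges(ages))             // prints {Ana=35, Ben=28}
    println(titledNames(ages))              // prints {Dr. Ana=34, Dr. Ben=27}
    println(nextYearAgesDestructured(ages)) // prints {Ana=35, Ben=28}
}
```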
reduce is probably the hardest of the map-reduce-filter functions to understand.
map and filter examine individual values one at a time, whereas reduce combines values together, and therefore carries state from one call to the next.
Let’s try to make the use of reduce clearer using some examples.
First, let’s identify the weakness of map.
Say we want to sum all of the values in a list.
We could do this with map, but it requires declaring and modifying a
variable outside of the processing pipeline:
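One way this might look is sketched below; the function name is hypothetical. The lambda updates an outside variable as a side effect, and map itself returns the running totals.

```kotlin
// Summing with map: the total lives in a mutable variable outside the pipeline.
// Returns the final sum together with the list that map produced.
fun sumWithMapSideEffect(values: List<Int>): Pair<Int, List<Int>> {
    var sum = 0
    val runningTotals = values.map {
        sum += it // side effect on outside state
        sum       // map collects every intermediate total
    }
    return Pair(sum, runningTotals)
}

fun main() {
    val (sum, runningTotals) = sumWithMapSideEffect(listOf(1, 2, 3, 4))
    println(sum)           // prints 10
    println(runningTotals) // prints [1, 3, 6, 10]
}
```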
This works… but it requires a mutable variable and state outside our processing pipeline, both of which we’d like to avoid.
Also consider what the call to map returns.
While it’s nice to see the intermediate results, what we would like is a higher-order function allowing us to reduce a collection to a single value.
Unsurprisingly, that function is called reduce.
Unlike map, the reducing function receives two arguments.
The second is an element of the collection, but the first is the result produced by the previous call to the reducing function.
Each result of the function passed to reduce is then passed as the first argument to the next call to the reducing function.
Let’s look at an example to try and make this more clear:
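The sketch below is one way to observe the calls, assuming a list of the integers 0 through 9 and a reducing function that simply hands its first argument back unchanged. The function name tracePassThrough is hypothetical; it records each pair of arguments instead of printing them.

```kotlin
// Record the (result, current) pair seen by every call to the reducing function.
// The lambda returns result unchanged, so the same value is carried forward.
fun tracePassThrough(values: List<Int>): List<Pair<Int, Int>> {
    val calls = mutableListOf<Pair<Int, Int>>()
    values.reduce { result, current ->
        calls.add(result to current)
        result
    }
    return calls
}

fun main() {
    val calls = tracePassThrough((0..9).toList())
    println(calls.size)    // prints 9
    println(calls.first()) // prints (0, 1)
    println(calls.last())  // prints (0, 9)
}
```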
There are a few things to note here.
First, 9 pairs of values are shown, even though the list has 10 elements.
This is because the first time the reducing function runs, its two arguments are the first two elements in the list.
The result of that call (0) is then passed to the next call, which displays it alongside the next element.
Since the result (0) never changes, we see it printed repeatedly, even as each subsequent call to the reducing function has the opportunity to incorporate a new value from the list.
With this as our starting point, let’s look at how to implement a list sum using reduce:
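A possible version, again over the integers 0 through 9, is sketched here. The name traceSum and the format of the recorded steps are assumptions made for illustration.

```kotlin
// Sum with reduce, recording each step so the running total is visible.
fun traceSum(values: List<Int>): Pair<Int, List<String>> {
    val steps = mutableListOf<String>()
    val total = values.reduce { result, current ->
        steps.add("$result + $current = ${result + current}")
        result + current // becomes result in the next call
    }
    return total to steps
}

fun main() {
    val (total, steps) = traceSum((0..9).toList())
    steps.take(3).forEach { println(it) } // prints 0 + 1 = 1, then 1 + 2 = 3, then 3 + 3 = 6
    println(total)                        // prints 45
}
```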
Now the output is more interesting.
We can see how the sum is being built one value at a time.
First we start with the sum of the first two values in the list: 0 and 1.
That result (1) is passed to the next call to the reducing function, which adds 2, producing 3.
That result (3) is passed to the next call, which adds 3, producing 6.
And so on.
We can use a similar approach to reduce a list to its product:
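Only the operator changes, as this short sketch shows (the function name is hypothetical). The example list starts at 1, since a 0 anywhere in the list would make the whole product 0.

```kotlin
// Same shape as the sum, with multiplication in place of addition.
fun productOf(values: List<Int>): Int =
    values.reduce { result, current -> result * current }

fun main() {
    println(productOf(listOf(1, 2, 3, 4))) // prints 24
}
```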
You might wonder what happens if you call reduce on a collection with a single element.
It simply returns that element, without ever calling the reducing function.
That makes a certain amount of sense.
What about on an empty list?
(Note that we need to add the Int type parameter to our list so that Kotlin can determine the type of the arguments to reduce.)
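Both edge cases can be checked with the sketch below; the function names are hypothetical. In the Kotlin standard library, reduce on an empty collection throws an UnsupportedOperationException.

```kotlin
// With one element there is no pair to combine, so the element is returned as is.
fun reduceSingle(): Int =
    listOf(5).reduce { result, current -> result + current }

// With no elements there is nothing to start from, so reduce throws.
fun reduceEmptyFails(): Boolean =
    try {
        listOf<Int>().reduce { result, current -> result + current }
        false
    } catch (e: UnsupportedOperationException) {
        true
    }

fun main() {
    println(reduceSingle())     // prints 5
    println(reduceEmptyFails()) // prints true
}
```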
It’s important to note one limitation of reduce.
Because it is first called on the first pair of values in the collection, reduce can only produce a value of the same type as the collection’s elements (or, strictly speaking, a supertype of it).
This is because the first time that the reducing function is called, result is the first value in the collection, so it must have the same type as the values in the collection.
This is an important limitation. For example, imagine that we want to combine all the numbers in our list into a string.
We could work around this using casting, but that’s not the right approach.
Instead, we can use a variant of reduce called fold.
fold allows us to determine what is passed as the result to the first call to our folding function.
Here’s how to do the sum from above using fold instead of reduce:
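A minimal sketch of the fold version follows, assuming the same list of integers 0 through 9. The function name traceFoldSum is hypothetical; it also counts how often the folding function runs.

```kotlin
// Sum with fold, starting from an explicit initial value of 0.
// Returns the total and the number of calls to the folding function.
fun traceFoldSum(values: List<Int>): Pair<Int, Int> {
    var calls = 0
    val total = values.fold(0) { result, current ->
        calls++ // counted only to show how many times the lambda runs
        result + current
    }
    return total to calls
}

fun main() {
    val (total, calls) = traceFoldSum((0..9).toList())
    println(total) // prints 45
    println(calls) // prints 10: one call per element
}
```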
The result is the same, but observe the differences.
Our folding function was called 10 times, whereas our reducing function was called 9 times.
The first call to the folding function receives the initial value passed to fold as its first argument and the first item from the collection as its second argument.
After that point fold behaves identically to reduce.
But how the process is bootstrapped has important effects on what we can do with fold.
For example, here’s how to combine all of the numbers into a string:
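A short sketch of this, with a hypothetical function name, over the integers 0 through 9:

```kotlin
// The initial value "" makes the accumulator a String, even though the
// elements are Ints; each step appends the next number.
fun digitsToString(values: List<Int>): String =
    values.fold("") { result, current -> "$result$current" }

fun main() {
    println(digitsToString((0..9).toList())) // prints 0123456789
}
```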
Passing the empty string "" to fold establishes the type of result as String, and also that we must return something of type String, which we do easily using string interpolation.
This extra capability of fold turns out to be pretty useful in certain situations.
For the sake of completeness, folding an empty list turns out better than reducing an empty list:
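With no elements to process, fold never calls its folding function and simply returns the initial value. A minimal sketch, with a hypothetical function name:

```kotlin
// Folding an empty list yields the initial value instead of throwing.
fun foldEmpty(): Int =
    listOf<Int>().fold(0) { result, current -> result + current }

fun main() {
    println(foldEmpty()) // prints 0
}
```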
The last commonly used pipeline operator is filter.
Filtering is simple.
If the filter function returns true, the item is kept in the resulting collection.
If it returns false, the item is left out.
Let’s see this in practice:
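For instance, keeping only the even numbers might look like this sketch (the function name evens is illustrative):

```kotlin
// The predicate returns true for items to keep and false for items to drop.
fun evens(values: List<Int>): List<Int> = values.filter { it % 2 == 0 }

fun main() {
    val numbers = (0..9).toList()
    println(evens(numbers)) // prints [0, 2, 4, 6, 8]
    println(numbers.size)   // prints 10: the original list is unchanged
}
```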
Working with data using map-reduce-filter pipelines becomes second nature quite quickly.
It helps that Kotlin includes a bunch of useful helper functions for performing common operations, even if they could be done using some combination of map, reduce, and filter.
We’ll show both the helpers and an idea of how to implement them.
Helpers such as sum and max work out of the box:
We’ve already seen how to implement sum using reduce.
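As a sketch, a maximum can be built with reduce in the same way as a sum. The function name maxWithReduce is hypothetical. The built-in max has changed behavior across Kotlin releases, so this sketch calls maxOrNull, which returns null for an empty list.

```kotlin
// Keep whichever of the two values is larger at each step.
fun maxWithReduce(values: List<Int>): Int =
    values.reduce { best, current -> if (current > best) current else best }

fun main() {
    val values = listOf(3, 9, 4)
    println(values.sum())          // prints 16
    println(values.maxOrNull())    // prints 9
    println(maxWithReduce(values)) // prints 9
}
```

Like any use of reduce, maxWithReduce throws on an empty list.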
distinct eliminates duplicate items from a collection.
Again, we can reimplement this ourselves:
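One possible reimplementation uses fold with a list as the accumulator. Choosing fold is an assumption here, and the function name is hypothetical. This sketch favors clarity over speed, since the in check scans the accumulated list each time.

```kotlin
// Build up a list of values seen so far, adding each value only once.
fun distinctWithFold(values: List<Int>): List<Int> =
    values.fold(emptyList<Int>()) { seen, current ->
        if (current in seen) seen else seen + current
    }

fun main() {
    val values = listOf(1, 2, 2, 3, 1)
    println(values.distinct())        // prints [1, 2, 3]
    println(distinctWithFold(values)) // prints [1, 2, 3]
}
```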
sorted sorts the items in the collection.
It takes no arguments and uses the natural sort order of the elements:
If we want to adjust the sort order we can use sortedBy and provide a function that should return a sort key for each item.
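The sketch below shows both calls; the function names are hypothetical. Negating each value as the sort key is one (slightly awkward) way to get a descending order.

```kotlin
// Natural order: ascending for numbers.
fun ascending(values: List<Int>): List<Int> = values.sorted()

// Custom key: sorting by the negated value gives descending order.
fun descendingByKey(values: List<Int>): List<Int> = values.sortedBy { -it }

// Keys can be anything comparable, such as the length of a string.
fun byLength(words: List<String>): List<String> = words.sortedBy { it.length }

fun main() {
    println(ascending(listOf(3, 1, 2)))               // prints [1, 2, 3]
    println(descendingByKey(listOf(3, 1, 2)))         // prints [3, 2, 1]
    println(byLength(listOf("reduce", "map", "fold"))) // prints [map, fold, reduce]
}
```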
reversed will reverse the order of a collection, providing a superior
reverse sort solution to the one shown above:
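A minimal sketch, with a hypothetical function name, that sorts and then reverses:

```kotlin
// Sort ascending, then flip the order; no key arithmetic required.
fun descending(values: List<Int>): List<Int> = values.sorted().reversed()

fun main() {
    println(descending(listOf(3, 1, 2))) // prints [3, 2, 1]
}
```

Unlike negating a numeric key, this also works for elements such as strings that cannot be negated.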
groupBy is a fun and powerful one.
Given a list, it will group the elements into a map, keyed by a value computed from each element.
Elements that produce the same key are collected into a list under that key, meaning that no items are lost.
For example, let’s say that we wanted to group all people by age:
Now let’s do some processing and see what happens when we have some duplicate ages:
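As a sketch, assume a hypothetical Person class and a made-up list in which two people share an age. groupBy produces the map, and a follow-up mapValues trims each group down to names.

```kotlin
data class Person(val name: String, val age: Int)

// Group people by age, then keep only the names within each group.
fun namesByAge(people: List<Person>): Map<Int, List<String>> =
    people
        .groupBy { it.age }
        .mapValues { (_, group) -> group.map { it.name } }

fun main() {
    val people = listOf(Person("Ana", 34), Person("Ben", 27), Person("Chen", 34))
    println(namesByAge(people)) // prints {34=[Ana, Chen], 27=[Ben]}
}
```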
Finally, it’s frequently useful to be able to reduce a collection to a string.
Kotlin makes this easy and maybe even joyful:
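Assuming the helper meant here is joinToString (a guess, since the original listing is not shown), a minimal sketch looks like this. The function names are hypothetical.

```kotlin
// Join the elements with a separator.
fun commaSeparated(values: List<Int>): String = values.joinToString(", ")

// Optional prefix and postfix wrap the joined result.
fun bracketed(values: List<Int>): String =
    values.joinToString(separator = ", ", prefix = "[", postfix = "]")

fun main() {
    println(commaSeparated(listOf(1, 2, 3))) // prints 1, 2, 3
    println(bracketed(listOf(1, 2, 3)))      // prints [1, 2, 3]
}
```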
The examples above just scratch the surface of what Kotlin is capable of when working with collections. Many common operations are supported out of the box, and it’s rare that you find yourself stuck and needing to actually use a loop and mutable state.
When you’re getting started with map-reduce-filter data processing, it can be fun to try and rewrite your existing data processing pipelines that probably use loops in this functional style. Give it a shot and see how you go.