Tuesday, November 15, 2016

Java 8: Using Filters, Maps, Streams and forEach to apply Lambdas to Java Collections!


Bulk operations: map, foreach, filter streams

As the original change spec says, the purpose of bulk operations is to “add functionality to the Java Collections Framework for bulk operations upon data. […] Operations upon data are generally expressed as lambda functions”. This quote reveals the actual goal of lambdas and shifts the focus of the Java 8 release for me: the most important feature is actually the new way of using collections – expressing operations that can be executed concurrently – and lambdas are just a better tool for expressing those operations.

Internal vs External Iteration: for vs. foreach

Historically, Java collections could not express internal iteration, since the only way to describe an iteration flow was the for (or while) loop. To describe internal iteration we would use libraries such as LambdaJ:
List<Person> persons = asList(new Person("Joe"), new Person("Jim"), new Person("John"));
forEach(persons).setLastName("Doe");
In the example above, we do not actually say how the last name should be set on each individual person – maybe the operation could be performed concurrently. Now we can express the same operation in a similar way with Java 8:
persons.forEach(p -> p.setLastName("Doe"));
The Java 8 forEach method is available on any list (in fact, on any Iterable) and it accepts a function – a Consumer – that is applied to every element of the list.
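For context, here is a minimal, self-contained version of the snippet above; the Person class and its fullName() helper are stand-ins invented for illustration, not types from the original post:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ForEachDemo {

    // Minimal stand-in for the Person class used in the article
    static class Person {
        private final String firstName;
        private String lastName;

        Person(String firstName) { this.firstName = firstName; }

        void setLastName(String lastName) { this.lastName = lastName; }

        String fullName() { return firstName + " " + lastName; }
    }

    static List<String> renameAll() {
        List<Person> persons = Arrays.asList(
                new Person("Joe"), new Person("Jim"), new Person("John"));

        // forEach takes a Consumer<Person> and applies it to every element
        persons.forEach(p -> p.setLastName("Doe"));

        List<String> names = new ArrayList<>();
        persons.forEach(p -> names.add(p.fullName()));
        return names;
    }

    public static void main(String[] args) {
        System.out.println(renameAll()); // [Joe Doe, Jim Doe, John Doe]
    }
}
```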
Internal iteration isn’t actually that closely related to bulk operations over collections. It is rather a minor feature that gives us an idea of the effect the syntax changes will have. What is really interesting with regard to bulk data operations is the new Stream API. Further in this post we’ll show how to filter a list or a Java 8 stream and how to convert a stream back to a regular set or list.

Java 8 Stream API: map, filter, foreach

The new java.util.stream package has been added to the JDK, allowing us to perform filter/map/reduce-like operations on collections in Java 8.
The Stream API lets us declare either sequential or parallel operations over a stream of data:
List<Person> persons = … 

// sequential version
Stream<Person> stream = persons.stream();
 
//parallel version 
Stream<Person> parallelStream = persons.parallelStream(); 
The java.util.stream.Stream interface serves as a gateway to the bulk data operations. After the reference to a stream instance is acquired, we can perform the interesting tasks with the collections.
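The text above mentions filter/map/reduce-like operations, but the post never actually shows a reduce. As a hedged aside, a minimal sketch of a reduce-style operation (summing ages) might look like this; the Person class here is invented for the sketch:

```java
import java.util.Arrays;
import java.util.List;

public class ReduceDemo {

    // Minimal stand-in Person class, invented for this sketch
    static class Person {
        private final int age;
        Person(int age) { this.age = age; }
        int getAge() { return age; }
    }

    static int totalAge() {
        List<Person> persons = Arrays.asList(
                new Person(17), new Person(25), new Person(40));

        // Map each person to an int, then fold the values together.
        // sum() is equivalent to .reduce(0, Integer::sum) here.
        return persons.stream()
                .mapToInt(Person::getAge)
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(totalAge()); // prints 82
    }
}
```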

Filter

Filtering a stream of data is the first natural operation that we need. The Stream interface exposes a filter method that takes a Predicate (http://javadocs.techempower.com/jdk18/api/java/util/function/Predicate.html), a single-abstract-method interface that lets us use a lambda expression to define the filtering criteria:
List<Person> persons = …
Stream<Person> personsOver18 = persons.stream().filter(p -> p.getAge() > 18);
The thing is, you often want to filter a List just like in the example above. You can easily convert the list to a stream, filter it, and collect the results back into a list if necessary.
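As a concrete sketch of that round trip (list → stream → filter → list), assuming a simple Person class with getAge() and getName() accessors (the class and its sample data are invented here):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FilterDemo {

    // Stand-in Person class for this example
    static class Person {
        private final String name;
        private final int age;
        Person(String name, int age) { this.name = name; this.age = age; }
        int getAge() { return age; }
        String getName() { return name; }
    }

    static List<String> adultNames() {
        List<Person> persons = Arrays.asList(
                new Person("Joe", 17), new Person("Jim", 25), new Person("John", 40));

        // filter keeps only the elements matching the Predicate;
        // collect gathers the surviving elements back into a List
        return persons.stream()
                .filter(p -> p.getAge() > 18)
                .map(Person::getName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(adultNames()); // [Jim, John]
    }
}
```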

Map

Assume we now have filtered data that we can use for the real operations, say, transforming the objects. The map operation allows us to apply a function (http://javadocs.techempower.com/jdk18/api/java/util/function/Function.html) that takes in a parameter of one type and returns something else. First, let’s see how it would have been described in the good ol’ way, using an anonymous inner class:
Stream<Student> students = persons.stream()
      .filter(p -> p.getAge() > 18)
      .map(new Function<Person, Student>() {
                  @Override
                  public Student apply(Person person) {
                     return new Student(person);
                  }
              });
Now, converting this example into a lambda syntax we get the following:
Stream<Student> students = persons.stream()
        .filter(p -> p.getAge() > 18)
        .map(person -> new Student(person));
And since the lambda passed to the map method simply hands its parameter straight to the Student constructor, we can shorten it further to a constructor reference:
Stream<Student> students = persons.stream()
        .filter(p -> p.getAge() > 18)
        .map(Student::new);

Collect: converting stream to list

While the stream abstraction is continuous (lazy) by nature, we can describe operations on streams, but to acquire the final results we have to collect the data somehow. The Stream API provides a number of “terminal” operations. The collect() method is one of those terminals and allows us to collect the results of the operations:
List<Student> students = persons.stream()
        .filter(p -> p.getAge() > 18)
        .map(Student::new)
        .collect(new Collector<Student, List<Student>, List<Student>>() { … });
Fortunately, in most cases you won’t need to implement the Collector interface yourself. Instead, there’s a Collectors utility class for convenience:
List<Student> students = persons.stream()
        .filter(p -> p.getAge() > 18)
        .map(Student::new)
        .collect(Collectors.toList());
Or, in case we would like to use a specific collection implementation for collecting the results:
List<Student> students = persons.stream()
        .filter(p -> p.getAge() > 18)
        .map(Student::new)
        .collect(Collectors.toCollection(ArrayList::new));
In the example above you can see how we collect a Java 8 stream into a list. In the same way you can easily convert your stream to a set, using Collectors.toSet() or Collectors.toCollection(HashSet::new), or to a map, using Collectors.toMap().
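To make the set and map conversions concrete, here is a small sketch; the Person class, its fields, and the sample data are invented for illustration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class CollectDemo {

    // Stand-in Person class for this example
    static class Person {
        private final String name;
        private final int age;
        Person(String name, int age) { this.name = name; this.age = age; }
        String getName() { return name; }
        int getAge() { return age; }
    }

    static final List<Person> PERSONS = Arrays.asList(
            new Person("Joe", 17), new Person("Jim", 25));

    static Set<String> names() {
        // Collectors.toSet() returns an unspecified Set implementation
        return PERSONS.stream().map(Person::getName).collect(Collectors.toSet());
    }

    static Map<String, Integer> agesByName() {
        // Collectors.toMap takes a key extractor and a value extractor
        return PERSONS.stream()
                .collect(Collectors.toMap(Person::getName, Person::getAge));
    }

    public static void main(String[] args) {
        System.out.println(names());
        System.out.println(agesByName());
    }
}
```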

Parallel and Sequential Streams

One interesting feature of the new Stream API is that it doesn’t require the operations to be declared either parallel or sequential from beginning to end. You can call parallel() and sequential() at any point in the flow; note, however, that in the released Java 8 the whole pipeline executes in a single mode, determined by the last such call before the terminal operation:
List<Student> students = persons.stream()
        .parallel()
        .filter(p -> p.getAge() > 18)
        .sequential()  // the last mode-switch call wins: this pipeline runs sequentially
        .map(Student::new)
        .collect(Collectors.toCollection(ArrayList::new));
The hidden agenda here is that the concurrent part of the data processing flow will manage itself automatically, (hopefully) without requiring us to deal with concurrency issues ourselves.
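A small sketch comparing sequential and parallel runs of the same pipeline; the results are identical (an ordered collector preserves encounter order even in the parallel run), only the execution strategy differs:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelDemo {

    static List<Integer> squares(boolean parallel) {
        IntStream range = IntStream.rangeClosed(1, 5);
        if (parallel) {
            // Same pipeline, executed on the common fork-join pool
            range = range.parallel();
        }
        return range.map(n -> n * n).boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // collect preserves encounter order even for the parallel run
        System.out.println(squares(false)); // [1, 4, 9, 16, 25]
        System.out.println(squares(true));  // [1, 4, 9, 16, 25]
    }
}
```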
