sources/tech/20220207 Accumulating into lists in Java and Groovy.md
21 KiB
Accumulating into lists in Java and Groovy
This article looks at the differences between list handling in Groovy and Java. I'll explore how to run-length encode a list in both languages for that purpose.
In my last article, I reviewed some differences between creating and initializing lists in Groovy and doing the same thing in Java. I showed that Groovy has a straightforward and compact syntax for setting up lists compared to the steps necessary in Java.
This article explores some more differences between list handling in Groovy and Java. I'll explore how to run-length encode a list in both languages for that purpose. Briefly, run-length encoding is a way of compactly representing repeated sequences of the same value in a list.
You'll need to make sure you have both Groovy and Java installed on your computer to follow along.
Install Java and Groovy
Groovy is based on Java and requires a Java installation as well. Recent and decent versions of Java and Groovy might be in your Linux distribution's repositories. Or you can install Groovy following the instructions on the link mentioned above. A nice alternative for Linux users is SDKMan, which you can use to get multiple versions of Java, Groovy, and many other related tools. For this article, I'm using SDK's releases of:
- Java: version 11.0.12-open of OpenJDK 11
- Groovy: version 3.0.8
Back to the problem
Run-length encoding replaces sequences of identical elements in a list with "runs" (pairs that indicate the number of elements and element values). For example, if I have the list:
`[“a”, ”a”, ”a”, ”b”, ”c”, ”c”]`
Then when I run-length encode it, I will have a list of lists:
`[[3, ”a”], [1, ”b”], [2, ”c”]]`
This type of non-destructive compression is relatively simple to implement and reasonably efficient for lists with many groups of repeated values. An example of this type of list is the hourly temperature in places where the daily swing in temperature is limited—like, for instance, Vancouver, Canada, in the winter.
Run-length encoding in Java
I am going to look at two approaches to this problem in Java. First, the iterative approach:
1 import java.lang.*;
2 import java.util.List;
3 import java.util.ArrayList;
4 import java.util.Arrays;
5 public class Test11 {
6 static public void main([String][5] args[]) {
7 // Hourly temperature Monday 17 January 2022 in Vancouver
8 var hourlyTemp = [Arrays][6].asList(5, 5, 5, 5, 5, 4, 4, 5, 5, 5, 6, 6, 7, 7, 6, 6, 6, 6, 5, 5, 5, 5, 5, 5);
9 // Method 1: using forEach()
10 var hourlyTempRLE = new ArrayList<ArrayList<Integer>>();
11 hourlyTempRLE.
12 add(new ArrayList<Integer>([Arrays][6].asList(1,hourlyTemp.get(0))));
13 hourlyTemp.subList(1,hourlyTemp.size()).forEach(temp -> {
14 var hourlyTempRLETail = hourlyTempRLE.get(hourlyTempRLE.size() - 1);
15 if (hourlyTempRLETail.get(1) == temp) {
16 hourlyTempRLETail.
17 set(0,hourlyTempRLETail.get(0) + 1);
18 } else {
19 hourlyTempRLE.
20 add(new ArrayList<Integer>([Arrays][6].asList(1,temp)));
21 }
22 });
23 [System][7].out.println("hourlyTempRLE using iterator (method 1): " + hourlyTempRLE);
24 }
25 }
In line 8, I define hourlyTemp
as the list of hourly temperatures observed in Vancouver on 17 January 2022. I use the asList()
static method of the Java List interface, which creates an unmodifiable list (which, for this application, is OK).
In line 10, I define hourlyTempRLE
as the "list of lists" that will contain the run-length encoded temperatures. I'm using an ArrayList
of ArrayList
of Integer
here to get the list of lists structure.
In line 11, I add the initial sublist or "run" of encoding, setting its first element to 1 (the count) and its second to the first temperature in hourlyTemp
.
In lines 12 through 22, I process the remaining values of hourlyTemp
—the expression hourlyTemp.subList(1,hourlyTemp.size())
is a sublist of hourlyTemp
starting at the second element (1) and ending at the last element (hourlyTemp.size() - 1
). I use the forEach()
method of ArrayList
to iterate over the elements of that sublist, which passes each element into its Java lambda argument with the parameter temp
).
In line 13, I set hourlyTempRLETail
to the last element currently in hourlyTempRLE
.
In line 14, I check to see if the temp value passed into the lambda is the same as the value of hourlyTempRLETail
. If it is, I increment its count in lines 15 through 17. Otherwise, in lines 19 and 20, I add a new "run" whose initial value is the sublist of 1 (the count) and the current value of temp
.
This is pretty compact. Nice to be able to use var
rather than declaring the type of the variables. I don't see any more compact way of declaring hourlyTempRLE
and initializing it in the same statement.
Running this script, I obtain the following results:
$ javac Test11.java
$ java Test11
hourlyTempRLE using iterator (method 1): [[5, 5], [2, 4], [3, 5], [2, 6], [2, 7], [4, 6], [6, 5]]
$
This is what I would expect from examining the hourlyTemp
list.
One less-than-optimal aspect of the above code is that hourlyTempRLE
is mutable. This means that subsequent code that uses it must be careful not to change it. Moreover, any multithreaded or parallel execution code that refers to hourlyTempRLE
should be designed with this mutability in mind.
An easy way around this is to add a line to create an immutable copy using the copyOf()
method of the Java List interface. But that still leaves the mutable data structure hanging around.
I can use Java Streams to code this in a more functional way:
1 import java.lang.*;
2 import java.util.List;
3 import java.util.ArrayList;
4 import java.util.Arrays;
5 public class Test12 {
6 static public void main([String][5] args[]) {
7 // Hourly temperature Monday 17 January 2022 in Vancouver
8 var hourlyTemp = [Arrays][6].asList(5, 5, 5, 5, 5, 4, 4, 5, 5, 5, 6, 6, 7, 7, 6, 6, 6, 6, 5, 5, 5, 5, 5, 5);
9 // Method 2: using collect()
10 var hourlyTempRLE = hourlyTemp.stream()
11 .collect(RLE::new, RLE::accept, RLE::combine)
12 .getRLE();
13 [System][7].out.println("hourlyTempRLE using collect (method 2): " + hourlyTempRLE);
14 }
15 }
16 class RLE implements java.util.function.IntConsumer {
17 private ArrayList<ArrayList<Integer>> rle = new ArrayList<ArrayList<Integer>>();
18 public void accept(int temp) {
19 if (rle.size() > 0) {
20 var rleTail = rle.get(rle.size() - 1);
21 if (rleTail.get(1) == temp)
22 rleTail.set(0,rleTail.get(0) + 1);
23 else
24 rle.add(new ArrayList<Integer>([Arrays][6].asList(1,temp)));
25 } else {
26 rle.add(new ArrayList<Integer>([Arrays][6].asList(1,temp)));
27 }
28 }
29 public void combine(RLE other) {
30 rle.addAll(other.getRLE());
31 }
32 public ArrayList<ArrayList<Integer>> getRLE() {
33 return rle;
34 }
35 }
What’s different? A lot, it seems.
In lines 10 through 12, I have a nice compact, functional solution to defining and accumulating the elements of hourlyTemp
. In line 10, I convert hourlyTemp
to a Java Stream using the stream()
function. In line 11, I use the Streams collect()
function in conjunction with a class (RLE
) that I define below to accumulate and reduce the values. In line 12, I call a method on the RLE
class, getRLE()
, to get the list of lists. So that's nice and compact. The cost is that I have to define this helper class, RLE
, in lines 16 through 35.
The first thing I see in the helper class is the definition of the list of lists data structure rle
, on line 17. This is initialized to an empty ArrayList<ArrayList<Integer>>
when created.
Then, in lines 18 through 31, you can see the definition of two methods required by the Streams collect()
method—accept()
, used to accumulate incoming values (integers in this case) and combine()
, used to merge a separate RLE
instance into this one, which happens with parallel accumulations. The third standard method, new()
, is provided implicitly through the RLE
constructor.
The accumulation work is done in lines 18 through 28 in the accept()
method. This method has similar logic to the previous one except that I start off with an empty list which I must detect and initialize. Combining another RLE
instance with this one is handled in lines 29 through 31 using the addAll()
method provided by the ArrayList
class.
Finally, in lines 32 through 34, I define the method getRLE()
, which returns the list of lists.
Running this produces:
$ javac Test12.java
$ java Test12
hourlyTempRLE using collect (method 2): [[5, 5], [2, 4], [3, 5], [2, 6], [2, 7], [4, 6], [6, 5]]
$
This is as I would expect.
Run-length encoding in Groovy
I will look at the same two approaches to this problem in Groovy. First, the iterative approach:
1 // Hourly temperature Monday 17 January 2022 in Vancouver
2 def hourlyTemp = [5, 5, 5, 5, 5, 4, 4, 5, 5, 5, 6, 6, 7, 7, 6, 6, 6, 6, 5, 5, 5, 5, 5, 5]
3 // Method 1: using each {}
4 def hourlyTempRLE = [[1,hourlyTemp[0]]]
5 hourlyTemp[1..-1].each { temp ->
6 if (hourlyTempRLE[-1][1] == temp) {
7 hourlyTempRLE[-1][0]++
8 } else {
9 hourlyTempRLE << [1,temp]
10 }
11 }
12 println "hourlyTempRLE using iterator (method 1): $hourlyTempRLE"
In line 2, I define hourlyTemp
as the list of hourly temperatures observed on 17 January 2022 in Vancouver.
In line 4, I initialize the run-length encoded version, hourlyTempRLE
, as a list of lists with the initial value set to the first temperature in hourlyTemp
and a count of 1.
In lines 5 through 11, I process the rest of the hourlyTemp
values. The expression hourlyTemp[1..-1]
is a slice of hourlyTemp
starting at the second element (1) and ending at the last element(-1). To process that slice, I apply the Groovy List method each()
, which takes a Groovy Closure, here looking quite a bit like a Java lambda, whose parameter is temp
, and iterates over all the values in the list, calling the Closure for each value.
In lines 6 and 7, if the temperature value in the last "run" (that is, the second element of the last sublist in hourlyTempRLE
, expressed as hourlyTempRLE[-1][1]
) is the same as the value of temp
, the count is incremented.
However, if the temperature is different, then in lines 8 through 10, a new "run" gets appended to the list with the count of 1 and the value set to that of temp
.
Running this script, I observe:
$ groovy test11.groovy
hourlyTempRLE using iterator (method 1): [[5, 5], [2, 4], [3, 5], [2, 6], [2, 7], [4, 6], [6, 5]]
$
A nice alternative is to use a functional approach:
1 // Hourly temperature Monday 17 January 2022 in Vancouver
2 def hourlyTemp = [5, 5, 5, 5, 5, 4, 4, 5, 5, 5, 6, 6, 7, 7, 6, 6, 6, 6, 5, 5, 5, 5, 5, 5]
3 // Method 2: using inject {}
4 def hourlyTempRLE = hourlyTemp[1..-1].inject([[1,hourlyTemp[0]]]) { rle, temp ->
5 if (rle[-1][1] == temp) {
6 rle[-1][0]++
7 } else {
8 rle << [1,temp]
9 }
10 rle
11 }
12 println "hourlyTempRLE using inject (method 2): $hourlyTempRLE"
Here I define the same list of hourly temperatures, hourlyTemp
, as before. This functional approach uses the list reduce approach to convert hourlyTemp
to its run-length encoded equivalent. This keeps the list creation data internal to the code that iterates over hourlyTemp
rather than defining variables outside the list creation code, allowing me to declare hourlyTempRLE
as immutable should I so desire.
In Groovy, list reduce is implemented by the Groovy List inject()
method, which takes two arguments: The initial value and a Closure that embodies the reduce code.
In lines 4 through 11 above, you see that I define hourlyTempRLE
as the result of executing the inject()
with:
- An initial value of
[[1,hourlyTemp[0]]]
—that is, the initial value ofhourlyTempRLE
is a list containing one sublist whose elements are 1 (the count) and the first temperature fromhourlyTemp
(the value) - The Closure
{ rle, temp -> ... }
defined on lines 4 through 11 that has parametersrle
, the partial result of theinject()
operation so far, andtemp
, whichinject()
sets to the successive values of the slicehourlyTemp[1..-1]
In lines 5 and 6, if the value of the last "run" is the same as the value of temp
(again, the second element of the partial result so far accumulated, expressed as rle[-1][1]
) then the count is incremented.
Otherwise, in lines 7 through 9, a new "run" gets appended to the partial result so far accumulated, containing a count of 1 and the value of temp
.
In line 10, the (modified) partial result so far accumulated is returned as the value of the Closure, which inject()
uses to replace the previous partial result. In general, in Groovy methods or Closure definitions, the value of the last statement is returned as the value of the method or Closure definition—it is only necessary to use the Groovy return
statement when returning a value from midway through a method or Closure definition.
As expected, running this script produces:
$ groovy test12.groovy
hourlyTempRLE using inject (method 2): [[5, 5], [2, 4], [3, 5], [2, 6], [2, 7], [4, 6], [6, 5]]
$
To get a little more familiar with inject()
you can calculate the sum of hourlyTemp
and hourlyTempRLE
by appending the following lines to each of the two scripts:
def hourlyTempSum = hourlyTemp.inject(0) { sum, temp -> sum + temp}
def hourlyTempRLESum = hourlyTempRLE.inject(0) { sum, run ->
sum + (run[0] * run[1])
}
Groovy List defines a sum()
method that would be a more concise way of achieving the result from the first line above. Still, as this is about the most simple example of the reduce operation possible, it's instructive to present it. And in the second line, you can see that I can easily multiply the temperature value by the count of repeat occurrences to produce the same sum. Note that the initial value passed to inject()
is zero in both cases.
I can use the Groovy assert
statement to check that the sums are equal as follows:
`assert hourlyTempSum == hourlyTempRLESum`
Finally, I can uncompress the run-length encoded list by appending the following lines to the end of each of the two scripts:
def hourlyTempRLEU = hourlyTempRLE.inject([]) { uncompressed, run ->
uncompressed << [run[1]] * run[0]
}.flatten()
Here I'm making use of Groovy's operator overloading, and the Groovy List multiply()
method. "Multiplying" a list (on the left-hand side) by an integer (on the right-hand side) replicates the list. In this case, inject()
produces a list of sublists, which I want to flatten out into a list, so I use the Groovy List flatten()
method on the result of inject()
.
I can use the Groovy assert
statement to check that the uncompressed list is the same as the original:
`assert hourlyTemp == hourlyTempRLEU`
NOTE: Groovy equality is different from Java equality! In Groovy, ==
is shorthand for the .equals()
method of any object instance, and ===
is shorthand for the .is()
method of any object instance and equivalent to Java’s ==
.
A brief comparison of the Java and Groovy solutions
Compare the Java:
var hourlyTempRLE = new ArrayList<ArrayList<Integer>>();
hourlyTempRLE.
add(new ArrayList<Integer>([Arrays][6].asList(1,hourlyTemp.get(0))));
To the Groovy:
`def hourlyTempRLE = [[1,hourlyTemp[0]]]`
Here you can see how much more compact Groovy can be. And it's not just compactness for compactness' sake. The Groovy code is clearly more readable through its compactness—you can see that hourlyTempRLE
is a list of lists because of the nested brackets. This compactness is clearly visible elsewhere.
Compare the Java:
hourlyTempRLE.
get(le).
set(0,hourlyTempRLE.get(le).get(0) + 1);
To the Groovy:
`hourlyTempRLE[-1][0]++`
Here you can see the syntactical support of the Groovy language for List structures makes the statement much more readable and understandable.
Readability is really important for code maintenance because readability promotes understanding. A quick search with your favorite search engine for phrases like "reading code vs writing code" will turn up many good arguments for readability, like this one on Opensource.com. The syntactic support for Collection structures in Groovy contributes hugely to the readability of Groovy code.
Another conclusion from the above is that Groovy has really beefed up what you can do with List structures. While current versions of Groovy also support Java Streams, the additional features added to Groovy Collection structures provide much of the same functionality. In this example, using the extensions to List allows you to avoid the whole business with defining the RLE
helper class required, in some form or another, by the Java Streams approach. You're down to 12 lines of Groovy code instead of 34 lines of Java code. Maybe I've missed something in my understanding of Streams, but I don't see this getting much smaller.
This is a good moment to suggest a review of the documentation on Groovy collections to learn more.
Groovy resources
The Apache Groovy site has a lot of great documentation. Another great Groovy resource is Mr. Haki. And a really great reason to learn Groovy is to go on and learn Grails, which is a wonderfully productive full-stack web framework built on top of excellent components like Hibernate, Spring Boot, and Micronaut.
via: https://opensource.com/article/22/2/accumulating-lists-groovy-vs-java
作者:Chris Hermansen 选题:lujun9972 译者:译者ID 校对:校对者ID