Version: 0.3.38

Optimization#

Module:

Optimization

Git Repository:

Lab 06—Optimization

Objectives:
  • Understand how to use recursion to approach optimization problem

  • Understand how to apply dynamic programming

Given two words, say “dog” and “cat”, our job is to find the minimum number of editions that transforms the first word into the second. There are various type of edit distance, depending on what “edition” we consider. In the following, we will study the Levenshtein distance, which restricts editions to adding character, removing character, and replacing a character by another 1See for instance Skiena, S. S. (2020). The algorithm design manual. : Springer International Publishing. Chapter 10.2. Here are a few examples:

  • “dog” can be transformed to “cat” by a minimum of 3 editions, namely converting ’d’ to ’c’, ’o’ to ’a’, and ’g’ to ’t’.

  • ”sunday” can be transformed to ”monday” by a minimum of 2 editions, namely converting ’s’ to ’m’, ’u’ to ’o’.

  • ”sun” can be transformed to ”sunday” by a minimum of 3 editions, namely adding ’d’, ’a’, and ’y’.

  • “aloud” can be transformed into “allowed” in three steps: Inserting a second ’l’ between the ’l’ and the ’o’, replacing the ’u’ with a ’w’, and finally inserting an ’e’ between the ’w’ and the ’d’.

Recursion#

Let us start with a “naive” implementation of the Levenshtein distance, with three kinds of edition: Addition, deletion and replacement.

Exercise

What would be the “base cases” of the recursion?

Exercise

What would be the “recursive cases” of the recursion?

Exercise

Implement a recursive implementation of the Levenshtein distance problem.

  1. Complete the file WithRecursion.java.

  2. The test suite WithRecursionTest.java may help you get your algorithm right.

Memoization#

In this section, we will improve our recursive solution with memoization, that is avoiding solving multiple times the same sub problems.

Exercise

Instrument your recursive implementation to display the tree structure formed by the recursive calls. Can you see any recurrent sub problems, that would justify the need for memoïzation?

Exercise

Implement memoization for the Levenshtein distance.

  1. Create a class to represent a single sub problem.

  2. Override the equals and hashCode so that we can store such problem and their solution into a hash table.

  3. Starting from your recursive code, complete the file WithMemoization.java by storing solved sub problems and their solution.

  4. The test suite WithMemoizationTest.java can help you test your implementation.

Exercise

Use the Benchmark.java class to compare the speed of your recursive and memoized implementation. Check that your two solution output the same edit distance (see Appendix 4 for info about how to run the benchmark).

Dynamic Programming#

We now further improve our solution, using dynamic programming to get rid of the recursive calls and, in turn, minimize the memory we consumed.

Exercise

How would you organize the sub problems into a “table”?

  1. Where would the “base” cases be in this table?

  2. Where would the “recursive” cases be?

  3. How would you update a cell in this table?

Exercise

Implement dynamic programming for the edit distance.

  1. Complete the file WithDP.Java.

  2. The test suite WithDPTest.java can help you test your implementation.

Exercise

Use the Benchmark.java class to compare the speed of your memoized and DP implementation. Check that your two solutions output the same edit distance.

Benchmark#

The code available in the Github repository includes a simple Java class that runs the three implementation (recursive, with memoization, and with dynamic programming) against random words of increasing length (10, 50, 100, 500, and characters). If the run takes more than 20 seconds, it is considered as timed out. We can run this benchmark with the command:

1% mvn compile \
2      exec:java --quiet \
3      -Dexec.mainClass="no.ntnu.idata2302.lab06.Benchmark"