第6章並行処理の機能強化 : 問題 5 : ConcurrentHashMap の merge メソッド

問題

複数スレッドが複数のファイルから全ての単語を読み込むアプリケーションを書け

各々の単語がどのファイルで使用されていたかを管理するためにConcurrentHashMap<String,Set<File>>とmergeメソッドを使用せよ

準備

テキストファイルから単語を順次読み込む Reader を定義する

■ WordReaderクラス

public class WordReader implements AutoCloseable {
    /** テキストファイル読み込み */
    private final BufferedReader reader;
    /** 1行分の単語 */
    private List<String> words = Collections.emptyList();

    /** コンストラクタ(文字コード UTF-8) */
    public WordReader(File file) throws IOException {
        this(file, StandardCharsets.UTF_8);
    }
    /** コンストラクタ */
    public WordReader(File file, Charset charset) throws IOException {
        reader = new BufferedReader(
            new InputStreamReader(new FileInputStream(file), charset)
        );
    }
    /** テキストファイルを閉じる */
    public void close() throws IOException {
        reader.close();
    }
    /**
     * 単語の読み込み
     * ファイルの終わりでは null を返す
     */
    public String getWord() throws IOException {
        while (words.isEmpty()) {
            String line = reader.readLine();
            if (line == null) {
                return null;
            }
            words = new LinkedList<String>(Arrays.asList(line.split("[\\P{L}]+")));
                // Arrays.asList の戻り値は remove をサポートしていないため
        }
        return words.remove(0);
    }
}

カレントフォルダのテキストファイルの単語の出現回数をシングルスレッドで数えてみる
カレントには5つのテキストファイルがあり、総バイト数は 7.7メガ

■ シングルスレッド版

Map<String, Set<File>> map = new HashMap<String, Set<File>>();
for (File file : new File(".").listFiles(f -> f.getName().endsWith(".txt"))) {
    try (WordReader reader = new WordReader(file)) {
        String word;
        while ((word = reader.getWord()) != null) {
            // getOrDefault メソッドは微妙に使えないことに注意
            Set<File> set;
            if (map.containsKey(word)) {
                set = map.get(word);
            } else {
                set = new HashSet<File>();
                map.put(word, set);
            }
            set.add(file);
        }
    }
}

解答

じゃConcurrentHashMapのmergeを使ってみよう
Value であるSet<File>の競合は考慮しなくてもいいのはありがたい

ConcurrentHashMap<String, Set<File>> map = new ConcurrentHashMap<>();
CountDownLatch latch = new CountDownLatch(5/* ちょっとダサいけど… */);
ExecutorService pool = Executors.newCachedThreadPool();
for (File file : new File(".").listFiles(f -> f.getName().endsWith(".txt"))) {
    pool.submit(
        () -> {
            try {
                try (WordReader reader = new WordReader(file)) {
                    String word;
                    while ((word = reader.getWord()) != null) {
                        Set<File> set = new HashSet<File>();
                        set.add(file);
                        map.merge(
                            word,
                            set,
                            (existing, _new) -> {
                                existing.addAll(_new); return existing;
                            }
                        );
                    }
                }
            } catch (IOException ex) {
                ex.printStackTrace();
            }
            latch.countDown();
        }
    );
}
pool.shutdown();
latch.await();

毎回、1個だけのSet<File>を生成しているにもかかわらずConcurrentHashMapの方が3倍ぐらい速い

恐るべしConcurrentHashMapクラス、恐るべしmergeメソッド

CUBは子供の白熊

Java SE 8 実践プログラミングの練習問題を解く

第6章並行処理の機能強化 : 問題 5 : ConcurrentHashMap の merge メソッド

問題

準備

解答