J-NPCU Finding frequency of a word 3.5x faster

Osman Yasal
3 min readJun 26, 2021

--

a teeny tiny cat

Greetings folks!

As you may know that i’ve been developing a library called j-npcu on my free times.

My main objective is simply allow you to write multithreaded applications without caring the details of the parallel processing.

In this example we’ll use the same code both serial and parallel processing but the library is going to parallel the data according to our directives.

This time, i’d like to demonstrate you that how we can improve our word searching on ‘arguably large’ array of words.

If you’re a lazy person as i’m you might not want to wait to algorithm to finish This is why we’ll use j-npcu to save some time to be able to play with our cats :)

Long story short we have an array consists of 50M short hamlet quotes
here’s ex of two(“Now is the winter of our discontent.”, “A horse! a horse! my kingdom for a horse!.”) so we might thing these array as a book and we’re curious about if the line contains a word or not. let’s see how we can address this problem with. But first i’d like to show you the results!

results

when we solve this problem both serial and parallel algorithms, we get the results shown above and the speedup would be 789 / 226 = 3.4911 times faster! so, let’s see how we achieve this.

We’ll implement a generic example.java class and fill the runSerial and runParallel methods,

Here’s the main method.

I know this might be intimated to someone but it’s actually quite simple and easy to implement. Please just focus on runSerial() and runParallel() methods respectively for now.

On filling() : we fill a list that consists of quotes and set a random word to look for in it.

In runSerial() : we traverse the list and check each line if it contains the word that we’re looking for. if so then increment the result by 1.

In runParallel() : we divide the list by range value and set each part of these values to a node. Because we defined division strategy as numberBased(5) in initMethods(), each node only take max 5 messages and each message is actually a range of list of namePool list.

Here’s the node detail.

all we have to do get a message from queue of the node and process it as we did in runSerial() and the parallelization process is tackled by the library itself.

ps: you might of course achieve higher speedups simply by just adjusting the divison-strategy rank and the range of sentences. if you’d like to do so, here’s the github code : https://github.com/Osmanyasal/J-NPCU-Examples.git

Thanks for reading ❤

--

--

Osman Yasal
Osman Yasal

Written by Osman Yasal

Parallel Computing | Performance tools | Digital Twins | Computer Vision | Image Processing | Deep Learning.

No responses yet