Searching for a string in a large array

Sometimes you just need to search for a string in a very large list (let’s say 1 million strings or more) as you know the brute force search will yield a Big O Notation of O(n) which as the larger the dataset grows the slower it gets.

Instead we can use a map[string]interface{} to reduce that time. Let’s create a simple test to compare the brute force search with the map solution.

The code is below:

package main

import (
	"log"
	"math/rand"
	"time"
)

func init() {
	rand.Seed(time.Now().UnixNano())
}

var letterRunes = []rune("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")

func RandStringRunes(n int) string {
	b := make([]rune, n)
	for i := range b {
		b[i] = letterRunes[rand.Intn(len(letterRunes))]
	}
	return string(b)
}

// Contains tells whether a contains x.
func Contains(a []string, x string) bool {
	for _, n := range a {
		if x == n {
			return true
		}
	}
	return false
}

func main() {
	maplist := make(map[string]bool)
	list := make([]string, 0)
	for x := 0; x < 1000000; x++ {
		str := RandStringRunes(10)
		maplist[str] = true
		list = append(list, str)
	}
	start := time.Now()

	Contains(list, list[999999])
	elapsed := time.Since(start)
	log.Println("Contains took", elapsed)

	start = time.Now()

	maplist[list[999999]] = false
	elapsed = time.Since(start)
	log.Println("Map took", elapsed)
}

Here are the results of the test above with 1 million strings:

2019/02/28 10:44:37 Contains took 3.568592ms
2019/02/28 10:44:37 Map took 379ns

And 4 million strings

2019/02/28 10:50:31 Contains took 13.945663ms
2019/02/28 10:50:31 Map took 768ns

And 8 million strings

2019/02/28 10:52:33 Contains took 27.87115ms
2019/02/28 10:52:33 Map took 1.713┬Ás

No matter how many times I ran the test the map was dramatically faster even as th4e data set increased.

As a result I designed a Type that hides the implementation of the search and generalized the interface of the data type.

Here is the MapList implementation and a use example:

type MapList map[interface{}]interface{}

func (ar MapList) Contains(needle interface{}) bool {
	if _, ok := ar[needle]; ok {
		return true
	}
	return false
}

Example use:

package main

import "fmt"

var emails = MapList{
	"me@aol.com":  nil,
	"you@aol.com": nil,
	"him@aol.com": nil,
	"her@aol.com": nil,
}

var numbers = MapList{
	1:   nil,
	23:  nil,
	60:  nil,
	105: nil,
}

func main() {
	fmt.Println(emails.Contains("you@aol.com"))
	fmt.Println(emails.Contains("you2@aol.com"))

	fmt.Println(numbers.Contains(23))
	fmt.Println(numbers.Contains(8))
}

Note you can use any type as a key to the map (data) as long as the == comparison operator works.

In the future I could see creating a library and putting this in it with more MapList functions attached to it, like ‘Add’, ‘Remove’, etc,… As well perhaps we could add a paramter to send in a Comparator function so that we could check any data type at all (even complex structs)

Happy Going!!!

comments powered by Disqus