Searching for a string in a large array
Mar 2, 2019 · 3 minute read · CommentsSometimes you just need to search for a string in a very large list (let’s say 1 million strings or more) as you know the brute force search will yield a Big O Notation of O(n) which as the larger the dataset grows the slower it gets.
Instead we can use a map[string]interface{} to reduce that time. Let’s create a simple test to compare the brute force search with the map solution.
The code is below:
package main
import (
"log"
"math/rand"
"time"
)
func init() {
rand.Seed(time.Now().UnixNano())
}
var letterRunes = []rune("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
func RandStringRunes(n int) string {
b := make([]rune, n)
for i := range b {
b[i] = letterRunes[rand.Intn(len(letterRunes))]
}
return string(b)
}
// Contains tells whether a contains x.
func Contains(a []string, x string) bool {
for _, n := range a {
if x == n {
return true
}
}
return false
}
func main() {
maplist := make(map[string]bool)
list := make([]string, 0)
for x := 0; x < 1000000; x++ {
str := RandStringRunes(10)
maplist[str] = true
list = append(list, str)
}
start := time.Now()
Contains(list, list[999999])
elapsed := time.Since(start)
log.Println("Contains took", elapsed)
start = time.Now()
maplist[list[999999]] = false
elapsed = time.Since(start)
log.Println("Map took", elapsed)
}
Here are the results of the test above with 1 million strings:
2019/02/28 10:44:37 Contains took 3.568592ms
2019/02/28 10:44:37 Map took 379ns
And 4 million strings
2019/02/28 10:50:31 Contains took 13.945663ms
2019/02/28 10:50:31 Map took 768ns
And 8 million strings
2019/02/28 10:52:33 Contains took 27.87115ms
2019/02/28 10:52:33 Map took 1.713µs
No matter how many times I ran the test the map was dramatically faster even as th4e data set increased.
As a result I designed a Type that hides the implementation of the search and generalized the interface of the data type.
Here is the MapList implementation and a use example:
type MapList map[interface{}]interface{}
func (ar MapList) Contains(needle interface{}) bool {
if _, ok := ar[needle]; ok {
return true
}
return false
}
Example use:
package main
import "fmt"
var emails = MapList{
"me@aol.com": nil,
"you@aol.com": nil,
"him@aol.com": nil,
"her@aol.com": nil,
}
var numbers = MapList{
1: nil,
23: nil,
60: nil,
105: nil,
}
func main() {
fmt.Println(emails.Contains("you@aol.com"))
fmt.Println(emails.Contains("you2@aol.com"))
fmt.Println(numbers.Contains(23))
fmt.Println(numbers.Contains(8))
}
Note you can use any type as a key to the map (data) as long as the ==
comparison operator works.
In the future I could see creating a library and putting this in it with more MapList functions attached to it, like ‘Add’, ‘Remove’, etc,… As well perhaps we could add a paramter to send in a Comparator function so that we could check any data type at all (even complex structs)
Happy Going!!!