Blog
Smart and Fast Searching in React
April 26, 2024
51 Views
Search and filtering functionality are core to almost every application we build. It helps our users get straight into the information they are looking for.
Searching can be the responsibility of the backend or the frontend. In this article we will explore search strategies on the client starting with a very basic implementation, to better strategies, ending up with an inverted-index.
Spoiler: We will also take a look at React 18's
startTransition
. An optimization tool which you could start using more!
Let's begin with the interface of what we will be building.
The Search Hook
The API of the initial hook we will be building is very simple:
TS
declare function useSearch<T>(
items: T[],
textExtractor: (item: T) => string[],
search?: string,
): { results: T[] };
declare function useSearch<T>(
items: T[],
textExtractor: (item: T) => string[],
search?: string,
): { results: T[] };
A hook that takes the following arguments:
items
: The list of items to search throughtextExtractor
: A function that extracts the text content to search through from each itemsearch
: The search query,
and returns the search results
.
A Basic Implementation
Now we can implement such a hook as follows:
useSearch.ts
TS
1import { useMemo } from "react";
2
3export function useSearch<T>(
4 items: T[],
5 textExtractor: (item: T) => string[],
6 search?: string,
7) {
8 const texts = useMemo(() => items.map(textExtractor), [items, textExtractor]);
9
10 const results = useMemo(() => {
11 if (!search) {
12 return [];
13 }
14
15 return items.filter((_, i) =>
16 texts[i].some((text) => text.includes(search)),
17 );
18 }, [items, texts, search]);
19
20 return { results };
21}
22
1import { useMemo } from "react";
2
3export function useSearch<T>(
4 items: T[],
5 textExtractor: (item: T) => string[],
6 search?: string,
7) {
8 const texts = useMemo(() => items.map(textExtractor), [items, textExtractor]);
9
10 const results = useMemo(() => {
11 if (!search) {
12 return [];
13 }
14
15 return items.filter((_, i) =>
16 texts[i].some((text) => text.includes(search)),
17 );
18 }, [items, texts, search]);
19
20 return { results };
21}
22
Pretty simple! We are iterating through each of items
and then filtering through those who's texts contain the search
term.
But not so fast! We have several problems here. Can you spot them? They are the following:
-
Problem 1: Since we are using
useMemo
, the user could experience a lag in interactions.Why is there a lag in interactions? When the search term changes, React needs to re-render the results before it can respond to interactions.
-
Problem 2: we are looping through every single item to find the results from the search which can be inefficient.
-
Problem 3: Our search matching does not give the best results. The search has to be fully contained by the item. It cannot support "fuzzy" matches.
Now, let's dive deeper into each of these issues starting with the matching problem.
Better matching with Query Tokenization
If we have an item with a title as: "The sky is blue"
, searching for "blue sky"
would yield no results! To have the word the document still match, we can "tokenize" the search
into words, and then filter for items containing all tokens in the search.
First our tokenizer, which will split on spaces:
tokenizers.ts
TS
export function tokenizeQuery(search: string) {
return search.toLowerCase().split(/\s/).filter(Boolean);
}
export function tokenizeQuery(search: string) {
return search.toLowerCase().split(/\s/).filter(Boolean);
}
And then, our updated search hook:
useSearch.ts
TS
1import { useMemo } from "react";
2import { tokenizeQuery } from "./tokenizers";
3
4export function useSearch<T>(
5 items: T[],
6 textExtractor: (item: T) => string[],
7 search?: string,
8) {
9 const texts = useMemo(() => items.map(textExtractor), [items, textExtractor]);
10
11 const results = useMemo(() => {
12 if (!search) {
13 return [];
14 }
15
16 const terms = tokenizeQuery(search);
17
18 return items.filter((_, i) =>
19 texts[i].some((text) => terms.every((term) => text.includes(term))),
20 );
21 }, [items, texts, search]);
22
23 return { results };
24}
25
1import { useMemo } from "react";
2import { tokenizeQuery } from "./tokenizers";
3
4export function useSearch<T>(
5 items: T[],
6 textExtractor: (item: T) => string[],
7 search?: string,
8) {
9 const texts = useMemo(() => items.map(textExtractor), [items, textExtractor]);
10
11 const results = useMemo(() => {
12 if (!search) {
13 return [];
14 }
15
16 const terms = tokenizeQuery(search);
17
18 return items.filter((_, i) =>
19 texts[i].some((text) => terms.every((term) => text.includes(term))),
20 );
21 }, [items, texts, search]);
22
23 return { results };
24}
25
Perfect, now we get more relevant results. But we can take the matching a step further with regular expressions.
Matching with Regular Expressions
Instead of the steps we take to check that each search term is present in the text, we can introduce some regular expressions!
These regular expression builders will generate a regular expression from a list of strings then check that these are present in a text.
regexps.ts
TS
export function unorderedIncludesRegExp(terms: string[]) {
return new RegExp(terms.reduce((s, term) => `${s}(?=.*${term})`, ""));
}
export function orderedIncludesRegExp(terms: string[]) {
return new RegExp(terms.reduce((s, term) => `${s}.*${term}`, ""));
}
export function unorderedIncludesRegExp(terms: string[]) {
return new RegExp(terms.reduce((s, term) => `${s}(?=.*${term})`, ""));
}
export function orderedIncludesRegExp(terms: string[]) {
return new RegExp(terms.reduce((s, term) => `${s}.*${term}`, ""));
}
We will only be using
unorderedIncludesRegExp
, but I have included theorderedIncludesRegExp
to illustrate the difference and provide an alternative for interested readers. Ordered uses "positive lookaheads" to allow matching in any order.
Finally, with these additions we have:
useSearch.ts
TS
1import { useMemo } from "react";
2import { tokenizeQuery } from "./tokenizers";
3import { unorderedIncludesRegExp } from "./regexps";
4
5export function useSearch<T>(
6 items: T[],
7 textExtractor: (item: T) => string[],
8 search?: string,
9) {
10 const texts = useMemo(() => items.map(textExtractor), [items, textExtractor]);
11
12 const results = useMemo(() => {
13 if (!search) {
14 return [];
15 }
16
17 const terms = tokenizeQuery(search);
18 const searchRegExp = unorderedIncludesRegExp(terms);
19
20 return items.filter((_, i) =>
21 texts[i].some((text) => searchRegExp.test(text)),
22 );
23 }, [items, texts, search]);
24
25 return { results };
26}
27
1import { useMemo } from "react";
2import { tokenizeQuery } from "./tokenizers";
3import { unorderedIncludesRegExp } from "./regexps";
4
5export function useSearch<T>(
6 items: T[],
7 textExtractor: (item: T) => string[],
8 search?: string,
9) {
10 const texts = useMemo(() => items.map(textExtractor), [items, textExtractor]);
11
12 const results = useMemo(() => {
13 if (!search) {
14 return [];
15 }
16
17 const terms = tokenizeQuery(search);
18 const searchRegExp = unorderedIncludesRegExp(terms);
19
20 return items.filter((_, i) =>
21 texts[i].some((text) => searchRegExp.test(text)),
22 );
23 }, [items, texts, search]);
24
25 return { results };
26}
27
Maintaining Interactivity with setState
+ startTransition
According to React docs, startTransition
is a utility function that allows for marking state updates as non-blocking transitions.
It allows for state changes to occur without blocking user interactivity.
We can use it in our implementation as follows:
TS
1import { useCallback, useEffect, useState, useTransition } from "react";
2
3export function useSearch<T>(items: T[], textExtractor: (item: T) => string[]) {
4 const [results, setResults] = useState(items);
5 const [isPending, startTransition] = useTransition();
6
7 const onSearch = useCallback((search: string) => {
8 if (!search) {
9 startTransition(() => setResults([]));
10 return;
11 }
12
13 const terms = tokenizeQuery(search);
14 const searchRegExp = unorderedIncludesRegExp(terms);
15
16 const searchResults = items.filter((_, i) =>
17 texts[i].some((text) => searchRegExp.test(text)),
18 );
19
20 startTransition(() => setResults(searchResults));
21 }, []);
22
23 return { results, isPending, onSearch };
24}
25
1import { useCallback, useEffect, useState, useTransition } from "react";
2
3export function useSearch<T>(items: T[], textExtractor: (item: T) => string[]) {
4 const [results, setResults] = useState(items);
5 const [isPending, startTransition] = useTransition();
6
7 const onSearch = useCallback((search: string) => {
8 if (!search) {
9 startTransition(() => setResults([]));
10 return;
11 }
12
13 const terms = tokenizeQuery(search);
14 const searchRegExp = unorderedIncludesRegExp(terms);
15
16 const searchResults = items.filter((_, i) =>
17 texts[i].some((text) => searchRegExp.test(text)),
18 );
19
20 startTransition(() => setResults(searchResults));
21 }, []);
22
23 return { results, isPending, onSearch };
24}
25
Following this change, React will now allow user interactions to interrupt the rendering of the search results.
Try out this demo to see the results:
Demo - useSearch (w/ startTransition)
NB: we are intentionally slowing down the render of each "todo" by about 5ms to simulate a heavy render. There are 80 items in total to search from.
Inverted-Index Searching
Another approach we can take to improve search is to use an inverted-index.
What is an inverted index?
An inverted index is a data structure that maps content to its location in a document or list of documents. It helps to quickly and efficiently search through a wide range of documents to find ones that match some query.
Consider the following list of documents:
TS
const documents = [
"The sky is blue.",
"The sun is shining.",
"The sun in the sky is bright.",
];
const documents = [
"The sky is blue.",
"The sun is shining.",
"The sun in the sky is bright.",
];
To search for the term "bright"
from the documents, we would traditionally iterate through each document and check if the term is present in the document:
TS
const results = documents.filter((doc) => doc.includes("bright")); // ["The sun in the sky is bright."]
const results = documents.filter((doc) => doc.includes("bright")); // ["The sun in the sky is bright."]
Time Complexity: In this approach, we loop through each document and so our time complexity is
O(N)
, whereN
is the number of documents.
Here come inverted indexes to make things better! Assume we have built an inverted index for our documents:
TS
const index = {
the: [0, 1, 2],
sky: [0, 2],
is: [0, 1, 2],
blue: [0],
sun: [1, 2],
shining: [1],
in: [2],
bright: [2],
};
const index = {
the: [0, 1, 2],
sky: [0, 2],
is: [0, 1, 2],
blue: [0],
sun: [1, 2],
shining: [1],
in: [2],
bright: [2],
};
Now to search for the term "bright"
, with our index, we find the location of the term from the index and then retrieve the documents for the location:
TS
const idxs = index["bright"]; // [2]
const results = idxs.map((idx) => documents[idx]); // ["The sun in the sky is bright."]
const idxs = index["bright"]; // [2]
const results = idxs.map((idx) => documents[idx]); // ["The sun in the sky is bright."]
Time Complexity: The time complexity here is O(1)
for finding the matching document indexes and then O(K)
for retrieving the documents, where K
is the number of documents that contain the term.
Since K <= N
, we cut down the number of documents we need to go through!
NB: Building the actual inverted-index requires looping through all the documents, but it is a one-time cost.
Space Complexity: Our approach has increased the space complexity has increased from O(N)
to O(M * N)
, where M
is the number of tokens in a document.
Enough theory. Now let's bring the power of the inverted index into our hook. To do this, we first need a strategy to "tokenize" our documents.
Adding Document Tokenizers
Our document tokenizer will work similarly to tokenizeQuery
from before, except that it will now return all the "tokens" in our document. These tokens will be the strings from our document which we want to keep an index for.
NB: will also be updating the regular expression for splitting to capture punctuation, and use
parts()
helper so we can keep indexes for partially completed searches.
Let's define our "todo" tokenizer:
tokenizers.ts
TS
1import { Todo } from "./types";
2
3// ...
4
5export function tokenizeText(text: string) {
6 return text
7 .toLowerCase()
8 .split(/\s+|(\W+)/) // Capture punctuation with (\W+)
9 .filter(Boolean);
10}
11
12export function tokenizeTodo(doc: Todo) {
13 const tokens = tokenizeText(doc.title);
14 return tokens.flatMap(parts); // Partial tokens
15}
16
17export function parts(text: string) {
18 if (text.length === 0) {
19 return [];
20 }
21
22 const result: string[] = [];
23
24 for (let i = 0; i < text.length; i++) {
25 for (let j = i + 1; j <= text.length; j++) {
26 result.push(text.slice(i, j));
27 }
28 }
29 return result;
30}
31
1import { Todo } from "./types";
2
3// ...
4
5export function tokenizeText(text: string) {
6 return text
7 .toLowerCase()
8 .split(/\s+|(\W+)/) // Capture punctuation with (\W+)
9 .filter(Boolean);
10}
11
12export function tokenizeTodo(doc: Todo) {
13 const tokens = tokenizeText(doc.title);
14 return tokens.flatMap(parts); // Partial tokens
15}
16
17export function parts(text: string) {
18 if (text.length === 0) {
19 return [];
20 }
21
22 const result: string[] = [];
23
24 for (let i = 0; i < text.length; i++) {
25 for (let j = i + 1; j <= text.length; j++) {
26 result.push(text.slice(i, j));
27 }
28 }
29 return result;
30}
31
Caveat 1: for
tokenizeTodo
above, we can implement it without needing to chainfilter
andflatMap
operations. These can be fused into a singlereduce
, or be converted to more imperative for-loops for better performance.
Caveat 2: we will have duplicates returned by
tokenizeTodo
, but this is okay since they will be de-duplicated as the inverted idex is being built.
Building the Inverted Index
For building our index, we will be making use of a simple inverted-index library from the mnemonist
package. This package implements different data structures in JavaScript. We can install it with:
SH
npm install mnemonist
npm install mnemonist
and then import it as:
TS
import InvertedIndex from "mnemonist/inverted-index";
import InvertedIndex from "mnemonist/inverted-index";
This library allows us to build an inverted index providing a "document" tokenizer and a "query" tokenizer. It handles tokenizing the documents for us and putting them into an inverted index. Furthermore, it provides a get
method that we can use for querying items from it. Compared to before in useSearch
, we will no longer have to filter for relevant results!
Here is what building the index will look like:
inverted-index.ts
TS
1import InvertedIndex from "mnemonist/inverted-index";
2import { tokenizeQuery } from "./tokenizers";
3
4export type Tokenizer<T> = (item: T) => string[];
5
6export function buildIndex<T>(
7 items: T[],
8 documentTokenizer: Tokenizer<T>,
9 queryTokenizer: Tokenizer<string> = tokenizeQuery,
10) {
11 return InvertedIndex.from(items, [documentTokenizer, queryTokenizer]);
12}
13
14export { InvertedIndex };
15
1import InvertedIndex from "mnemonist/inverted-index";
2import { tokenizeQuery } from "./tokenizers";
3
4export type Tokenizer<T> = (item: T) => string[];
5
6export function buildIndex<T>(
7 items: T[],
8 documentTokenizer: Tokenizer<T>,
9 queryTokenizer: Tokenizer<string> = tokenizeQuery,
10) {
11 return InvertedIndex.from(items, [documentTokenizer, queryTokenizer]);
12}
13
14export { InvertedIndex };
15
Then after it is built, we can query it like this:
TS
import { buildIndex } from "./inverted-index";
const todos: Todo[] = [
{
title: "Go for lunch",
note: "Consider having jollof",
},
{
title: "Buy groceries",
note: "Don't forget the milk",
},
];
const index = buildIndex(todos, tokenizeTodos);
console.log(index.get("groceries")); // [{ title: "Buy groceries", note: "Don't forget the milk" }]
console.log(index.get("jollof")); // [{ title: "Go for lunch", note: "Consider having jollof" }]
import { buildIndex } from "./inverted-index";
const todos: Todo[] = [
{
title: "Go for lunch",
note: "Consider having jollof",
},
{
title: "Buy groceries",
note: "Don't forget the milk",
},
];
const index = buildIndex(todos, tokenizeTodos);
console.log(index.get("groceries")); // [{ title: "Buy groceries", note: "Don't forget the milk" }]
console.log(index.get("jollof")); // [{ title: "Go for lunch", note: "Consider having jollof" }]
With all the pieces in place, we can implement our final search strategy using inverted-indexing. We are going to:
- Call
buildIndex()
to build the inverted index on mount, and then - Call the
index
'sget()
method inside of ouronSearch
callback to query for documents
Also as we discussed before, we make sure to use
setState
+startTransition
to maintain interactivity performance!
useInvertedIndexSearch.ts
TS
1import { useCallback, useEffect, useState, useTransition } from "react";
2import { buildIndex, Tokenizer, InvertedIndex } from "./inverted-index";
3
4export function useInvertedIndexSearch<T>(items: T[], tokenizer: Tokenizer<T>) {
5 const [index, setIndex] = useState<InvertedIndex<T>>();
6 const [results, setResults] = useState(items);
7 const [isPending, startTransition] = useTransition();
8
9 useEffect(() => {
10 startTransition(() => {
11 setIndex(buildIndex(items, tokenizer));
12 });
13 }, [items, tokenizer]);
14
15 const onSearch = useCallback(
16 (search: string) => {
17 startTransition(() => {
18 if (!index || !search) {
19 setResults([]);
20 return;
21 }
22
23 setResults(index.get(search));
24 });
25 },
26 [index],
27 );
28
29 return { results, isPending, onSearch };
30}
31
1import { useCallback, useEffect, useState, useTransition } from "react";
2import { buildIndex, Tokenizer, InvertedIndex } from "./inverted-index";
3
4export function useInvertedIndexSearch<T>(items: T[], tokenizer: Tokenizer<T>) {
5 const [index, setIndex] = useState<InvertedIndex<T>>();
6 const [results, setResults] = useState(items);
7 const [isPending, startTransition] = useTransition();
8
9 useEffect(() => {
10 startTransition(() => {
11 setIndex(buildIndex(items, tokenizer));
12 });
13 }, [items, tokenizer]);
14
15 const onSearch = useCallback(
16 (search: string) => {
17 startTransition(() => {
18 if (!index || !search) {
19 setResults([]);
20 return;
21 }
22
23 setResults(index.get(search));
24 });
25 },
26 [index],
27 );
28
29 return { results, isPending, onSearch };
30}
31
And the last demo for this article!
Demo - useInvertedIndexSearch (w/ startTransition)
NB: we are intentionally slowing down the render of each "todo" by about 5ms to simulate a heavy render. There are 80 items in total to search from.
And that's the last of our improvements we will be discussing in detail!
Other Search Optimizations
There are several strategies we did not mention that can be used on top of the ideas we have discussed.
Debouncing
Notably, we have debouncing/throttling (read more).
Here is a quick example with useDebounceCallback
from usehooks-ts
:
useDebouncedInvertedIndexSearch.ts
TS
1import { useDebounceCallback } from "usehooks-ts";
2
3import { Tokenizer } from "./inverted-index";
4
5export function useDebouncedInvertedIndexSearch<T>(
6 items: T[],
7 tokenizer: Tokenizer<T>,
8 wait = 500,
9) {
10 const { onSearch, ...rest } = useInvertedIndexSearch(items, tokenizer);
11
12 const onSearchDebounced = useDebounceCallback(onSearch, wait);
13
14 return { results, isPending, onSearch: onSearchDebounced };
15}
16
1import { useDebounceCallback } from "usehooks-ts";
2
3import { Tokenizer } from "./inverted-index";
4
5export function useDebouncedInvertedIndexSearch<T>(
6 items: T[],
7 tokenizer: Tokenizer<T>,
8 wait = 500,
9) {
10 const { onSearch, ...rest } = useInvertedIndexSearch(items, tokenizer);
11
12 const onSearchDebounced = useDebounceCallback(onSearch, wait);
13
14 return { results, isPending, onSearch: onSearchDebounced };
15}
16
Query Caching
You may also explore caching especially when search is performed on the backend. When a user types "red car", if a search was fired at the end of "red", results may be cached such that if the user searches it again, or deletes "car", the results may show up immediately.
Caching is often automatically performed through libraries such as TanStack Query, or Redux-Toolkit Query.
Beyond React
Sometimes, you just need to go beyond React to get faster speeds in your searches. For such situations, you can delve deeper into these approaches:
Database Indexes
Similar to the inverted-index which we built in our hook, backend databases such as Postgres and MongoDB also offer options for building "full" text-search indexes to power semantic full text search across database fields and tables. This solution does stretch outside the domain of React, but is a helpful
Postgres offers a GIN (Generalized Inverted-Index) index, and MongoDB offers a Text Index for building indexes on text fields to allow them to be searched quickly.
Web Workers
If a full-on database search is not available/desirable, or if all the searchable content is already on the client, Web Workers may be helpful in performing fast/heavy searching/indexing while keeping the main thread unblocked for UI rendering.
Conclusion
In this article, we have discussed several optimizations on searching in React going through, use of regular expressions, startTransition
and inverted indexes. We also briefly discussed other optimization strategies such as caching, database indexes and web workers. I hope this has been informative and helpful!
Wan't to discuss more, leave feedback or want to dig deeper on a topic? Let me know over on X!