Using a Worker-Index is pretty much the same as using a standard Index.
A Worker-Index always returns a Promise for all methods called on the index.
When adding, updating or removing large amounts of content, it is recommended to use the async version of each method to avoid blocking the main thread. Read more about Asynchronous Runtime Balancer.
The internal worker model is distributed by document fields and will solve subtasks in parallel.
A Document creates a worker for each field automatically when you apply the option worker: true:
const index = new Document({
    worker: true,
    document: {
        id: "id",
        index: ["name", "title"],
        tag: ["cat"]
    }
});
index.add({
    id: 1, cat: "catA", name: "Tom", title: "some"
}).add({
    id: 2, cat: "catA", name: "Ben", title: "title"
}).add({
    id: 3, cat: "catB", name: "Max", title: "to"
}).add({
    id: 4, cat: "catB", name: "Tim", title: "index"
});
When you perform a search across multiple fields, the task is balanced across all involved workers, which solve their subtasks independently.
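The balancing described here can be pictured with a small, library-independent sketch. It does not use FlexSearch: `makeFieldWorker`, `searchAllFields` and `demo` are hypothetical names, and each "worker" is simulated by async functions rather than a real thread. It only illustrates the idea of one unit per field, queried concurrently and collected afterwards.

```javascript
// Conceptual sketch only: simulates one "worker" per document field.
// Nothing here is FlexSearch API; all names are hypothetical.

// Build a simulated field worker holding a tiny id -> text map.
function makeFieldWorker(field) {
    const entries = new Map();
    return {
        field,
        // Resolve asynchronously, as a real worker would via messaging.
        add: async (id, text) => {
            entries.set(id, String(text).toLowerCase());
        },
        // Naive substring match, just to have something to search with.
        search: async (query) => {
            const q = query.toLowerCase();
            const ids = [];
            for (const [id, text] of entries) {
                if (text.includes(q)) ids.push(id);
            }
            return { field, result: ids };
        }
    };
}

// Query every field worker concurrently and collect the per-field results.
async function searchAllFields(workers, query) {
    return Promise.all(workers.map(worker => worker.search(query)));
}

async function demo() {
    const workers = ["name", "title"].map(field => makeFieldWorker(field));
    const docs = [
        { id: 1, name: "Tom", title: "some" },
        { id: 2, name: "Ben", title: "title" },
        { id: 3, name: "Max", title: "to" }
    ];
    // Each document is split by field; every worker gets only its own field.
    for (const doc of docs) {
        await Promise.all(workers.map(w => w.add(doc.id, doc[w.field])));
    }
    return searchAllFields(workers, "t");
}
```

Because every simulated worker only sees its own field, the per-field searches do not depend on each other, which is what allows them to run side by side.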
When using one of the bundles from /dist/ you can create a Worker-Index:
import { Worker } from "./dist/flexsearch.bundle.module.min.js";
const index = new Worker({/* options */ });
await index.add(1, "some");
await index.add(2, "content");
await index.add(3, "to");
await index.add(4, "index");
When not using a bundle you can take the worker file from the /dist/ folder as follows:
import Worker from "./dist/module/worker.js";
const index = new Worker({/* options */ });
await index.add(1, "some");
await index.add(2, "content");
await index.add(3, "to");
await index.add(4, "index");
When loading a legacy bundle via script tag (non-modules):
const index = new FlexSearch.Worker({/* options */ });
await index.add(1, "some");
await index.add(2, "content");
await index.add(3, "to");
await index.add(4, "index");
The worker model for Node.js is based on native worker threads and works exactly the same way:
const { Document } = require("flexsearch");
const index = new Document({
    worker: true,
    document: {
        id: "id",
        index: ["name", "title"],
        tag: ["cat"]
    }
});
Or create a single worker instance for a non-document index:
const { Worker } = require("flexsearch");
const index = new Worker({/* options */ });
Worker-Index options extend the default Index options. In addition, you can apply the following:
Option | Values | Description |
config | String | Either the absolute URL to the config file when used in Browser context (should match the Same-Origin-Policy) or the filepath to the configuration file when used in Node.js context |
export | function | The export handler function. Read more about Export |
import | function | The import handler function. Read more about Import |
When using Worker while also assigning custom functions to the options, e.g.:
- Custom Encoder
- Custom Encoder methods (normalize, prepare, finalize)
- Custom Score (function)
- Custom Filter (function)
- Custom Fields (function)
... then you'll need to move your field configuration into a file which exports the configuration as a default export. The field configuration is not the whole Document-Descriptor.
When not using custom functions in combination with Worker you can skip this part.
Since every field resolves into a dedicated Worker, every field which includes custom functions needs its own configuration file.
Let's take this document descriptor:
{
    document: {
        index: [{
            // this is the field configuration
            // ---->
            field: "custom_field",
            custom: function(data){
                return "custom field content";
            }
            // <------
        }]
    }
};
The configuration which needs to be available as a default export is:
{
    field: "custom_field",
    custom: function(data){
        return "custom field content";
    }
};
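To make the distinction concrete, the following sketch pulls the field configurations out of a document descriptor shaped like the one in this section. `extractFieldConfigs` is a hypothetical helper, not part of FlexSearch, and it assumes that `document.index` is an array of strings or objects as in the examples here. It only shows which part of the descriptor would move into a separate file.

```javascript
// Hypothetical helper (not FlexSearch API): returns the per-field
// configuration objects contained in a document descriptor.
// Assumption: descriptor.document.index is an array.
function extractFieldConfigs(descriptor) {
    // Plain strings are bare field names without options, so only
    // object entries carry a configuration of their own.
    return descriptor.document.index.filter(
        entry => typeof entry === "object" && entry !== null
    );
}

const descriptor = {
    document: {
        index: [{
            field: "custom_field",
            custom: function(data){
                return "custom field content";
            }
        }]
    }
};

// fieldConfig is the object that would become the default export
// of its own configuration file.
const [fieldConfig] = extractFieldConfigs(descriptor);
console.log(fieldConfig.field); // custom_field
```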
Suggestions on how to improve the handling of extern configuration are welcome.
An extern configuration for one WorkerIndex in Node.js (CommonJS), assuming it is located in ./custom_field.js:
const { Charset } = require("flexsearch");
const { LatinSimple } = Charset;
// it requires a default export:
module.exports = {
    encoder: LatinSimple,
    tokenize: "forward",
    // custom function:
    custom: function(data){
        return "custom field content";
    }
};
Create the Worker Index with the configuration above:
const { Document } = require("flexsearch");
const flexsearch = new Document({
    worker: true,
    document: {
        index: [{
            // the field name needs to be set here
            field: "custom_field",
            // path to your config from above:
            config: "./custom_field.js",
        }]
    }
});
An extern configuration for one WorkerIndex in the browser (ESM), assuming it is located in ./custom_field.js:
import { Charset } from "./dist/flexsearch.bundle.module.min.js";
const { LatinSimple } = Charset;
// it requires a default export:
export default {
    encoder: LatinSimple,
    tokenize: "forward",
    // custom function:
    custom: function(data){
        return "custom field content";
    }
};
Create Worker Index with the configuration above:
import { Document } from "./dist/flexsearch.bundle.module.min.js";
// you will need to await the response!
const flexsearch = await new Document({
    worker: true,
    document: {
        index: [{
            // the field name needs to be set here
            field: "custom_field",
            // Absolute URL to your config from above:
            config: "http://localhost/custom_field.js"
        }]
    }
});
An absolute URL is required here, because the WorkerIndex context is of type Blob and relative URLs can't be resolved from this context.
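If only a relative path is at hand, the standard `URL` constructor can resolve it against a known base before the result is passed as `config`. This is a minimal sketch: `toAbsoluteConfigUrl` and the base URL are illustrative assumptions, and in a browser the base would typically come from `import.meta.url` or `location.href`.

```javascript
// Hypothetical helper: resolve a relative config path against a base URL
// using the standard URL API (available in browsers and Node.js).
function toAbsoluteConfigUrl(relativePath, baseUrl) {
    return new URL(relativePath, baseUrl).href;
}

// The base URL below is an assumption for illustration only.
const configUrl = toAbsoluteConfigUrl(
    "./custom_field.js",
    "http://localhost/index.html"
);
console.log(configUrl); // http://localhost/custom_field.js
```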
As a test the whole IMDB data collection was indexed, consisting of:
JSON Documents: 9,273,132
Fields: 83,458,188
Tokens: 128,898,832
The index configuration used has 2 fields (using bidirectional context of depth: 1), 1 custom field, 2 tags and a full datastore of all input JSON documents.
A non-Worker Document index requires 181 seconds to index all contents.
The Worker index just takes 32 seconds to index them all, by processing every field and tag in parallel. For such large content it is a quite impressive result.
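The reported timings translate into a speedup factor and a throughput as follows. This is plain arithmetic on the numbers stated here, not an additional measurement.

```javascript
// Numbers as reported in the test description.
const documents = 9273132;
const secondsSingle = 181; // non-Worker Document index
const secondsWorker = 32;  // Worker index

// How many times faster the Worker index finished.
const speedup = secondsSingle / secondsWorker;
// Documents indexed per second, rounded to whole documents.
const docsPerSecondSingle = Math.round(documents / secondsSingle);
const docsPerSecondWorker = Math.round(documents / secondsWorker);

console.log(speedup.toFixed(2));  // 5.66
console.log(docsPerSecondSingle); // 51233
console.log(docsPerSecondWorker); // 289785
```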
Workers save and load their data on their own and do not need the message channel for the data transfer.
This feature follows the strategy of using Extern Worker Configuration in combination with Basic Export Import.
Example (CommonJS): basic-worker-export-import
Example (ESM): basic-worker-export-import
Provide the index configuration and keep it, because it isn't stored. Provide a parameter config which includes the filepath to the extern configuration file:
const dirname = import.meta.dirname;
const config = {
    tokenize: "forward",
    config: dirname + "/config.js"
};
Any change you make to the configuration will almost always require a full re-index.
Provide the extern configuration file, e.g. /config.js, as a default export including the methods export and import:
import { promises as fs } from "fs";
export default {
    tokenize: "forward",
    export: async function(key, data){
        // like the usual export write files by key + data
        await fs.writeFile("./export/" + key, data, "utf8");
    },
    import: async function(index){
        // get the file contents of the export directory
        let files = await fs.readdir("./export/");
        files = await Promise.all(files);
        // loop through the files and push their contents to the index
        // by also passing the filename as the first parameter
        for(let i = 0; i < files.length; i++){
            const data = await fs.readFile("./export/" + files[i], "utf8");
            index.import(files[i], data);
        }
    }
};
Create your index by assigning the configuration file from above:
import { Worker as WorkerIndex } from "flexsearch";
const index = await new WorkerIndex(config);
// add data to the index
// ...
Export the index:
await index.export();
Import the index:
// create the same type of index you have used by .export()
// along with the same configuration
const index = await new WorkerIndex(config);
await index.import();
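The handler contract used in this section (`export(key, data)` receives one chunk per key, `import(index)` feeds every stored chunk back through `index.import(key, data)`) can be exercised without a file system. The sketch below swaps the files for an in-memory `Map`. `createMemoryHandlers`, `createStandInIndex`, `roundTrip` and the keys are hypothetical and exist only for illustration, for instance in a unit test; the stand-in index is not a FlexSearch index.

```javascript
// Hypothetical sketch: the same handler shape as in the configuration file,
// but backed by an in-memory Map instead of files on disk.
function createMemoryHandlers(storage = new Map()) {
    return {
        storage,
        // Called once per exported chunk.
        export: async function(key, data){
            storage.set(key, data);
        },
        // Called with the index; pushes every stored chunk back by key.
        import: async function(index){
            for (const [key, data] of storage) {
                await index.import(key, data);
            }
        }
    };
}

// Stand-in for an index (NOT FlexSearch): records what gets imported.
function createStandInIndex() {
    const imported = [];
    return {
        imported,
        import: async (key, data) => {
            imported.push([key, data]);
        }
    };
}

// Export two made-up chunks, then import them into a fresh stand-in index.
async function roundTrip() {
    const handlers = createMemoryHandlers();
    await handlers.export("key-1", "chunk-a");
    await handlers.export("key-2", "chunk-b");
    const index = createStandInIndex();
    await handlers.import(index);
    return index.imported;
}
```

Keeping the storage behind the two handlers means the file-based version and an in-memory version are interchangeable from the point of view of the caller.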
This feature follows the strategy of using Extern Worker Configuration in combination with Document Export Import.
A Document Worker exports all of its features, including:
- Multi-Tag Indexes
- Context-Search Indexes
- Document-Store
Example (CommonJS): document-worker-export-import
Example (ESM): document-worker-export-import
Provide the index configuration and keep it, because it isn't stored. Provide a parameter config which includes the filepath to the extern configuration file:
const dirname = import.meta.dirname;
const config = {
    worker: true,
    document: {
        id: "tconst",
        store: true,
        index: [{
            field: "primaryTitle",
            config: dirname + "/config.primaryTitle.js"
        },{
            field: "originalTitle",
            config: dirname + "/config.originalTitle.js"
        }],
        tag: [{
            field: "startYear"
        },{
            field: "genres"
        }]
    }
};
Any change you make to the configuration will almost always require a full re-index.
Provide the extern configuration file as a default export including the methods export and import:
import { promises as fs } from "fs";
export default {
    tokenize: "forward",
    export: async function(key, data){
        // like the usual export write files by key + data
        await fs.writeFile("./export/" + key, data, "utf8");
    },
    import: async function(file){
        // instead of looping you will get the filename as 2nd parameter
        // just return the loaded contents as a string
        return await fs.readFile("./export/" + file, "utf8");
    }
};
Create your index by assigning the configuration file from above:
import { Document } from "flexsearch";
const document = await new Document(config);
// add data to the index
// ...
Export the index by providing a key-data handler:
import { promises as fs } from "fs";
await document.export(async function(key, data){
    await fs.writeFile("./export/" + key, data, "utf8");
});
Import the index:
import { promises as fs } from "fs";
const files = await fs.readdir("./export/");
// create the same type of index you have used by .export()
// along with the same configuration
const document = await new Document(config);
await Promise.all(files.map(async file => {
    const data = await fs.readFile("./export/" + file, "utf8");
    // call import (async)
    await document.import(file, data);
}));
When using a worker via one of the bundled versions (e.g. flexsearch.bundle.min.js), the worker is created by code generation under the hood. This can cause issues with strict CSP settings.
You can overcome this issue by using the non-bundled versions, e.g. dist/module/, or by passing the filepath to the worker file instead of true, like worker: "dist/module/worker/worker.js".