Using DomInstanceExposers.FieldValues.DomInstanceField(fieldId).Contains(searchValue) returns unexpected results. The filter appears to perform fuzzy/tokenized matching rather than exact substring matching.
Environment:
- DataMiner 10.5.0.0-16432-CU8
Code:
var customerNameFieldId = GetFieldDescriptorId("Customer Name");
var definitionFilter = DomInstanceExposers.DomDefinitionId.Equal(domDefinition.ID.Id);
var customerNameFilter = DomInstanceExposers.FieldValues.DomInstanceField(customerNameFieldId).Contains("00000000");
var combinedFilter = new ANDFilterElement<DomInstance>(definitionFilter, customerNameFilter);
var instances = DomHelper.DomInstances.Read(combinedFilter); // Returns 2646 instances
Result:
- Search value: "00000000"
- Expected: 0 results (no CustomerName contains this string)
- Actual: 2646 results
Verified by logging actual field values - none contain "00000000":
CustomerName: 'MBC FZ LLC - 110004' - Contains '00000000': False
CustomerName: 'MBC FZ LLC - 110002' - Contains '00000000': False
CustomerName: 'MX1 Internal - MX1Internal0000' - Contains '00000000': False
...
Questions:
1. Is Contains performing tokenized matching in OpenSearch rather than exact substring matching (like SQL LIKE '%value%')?
2. Is there an alternative filter for exact substring matching at database level?
3. Is this expected behavior or a bug?
Current workaround: Post-filtering in memory with LINQ .Where(), but this impacts performance on large datasets.
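For reference, the in-memory post-filter looks roughly like the sketch below. It works on plain strings so it can run on its own. In the real script the candidate values would come from the instances returned by DomHelper.DomInstances.Read(...). The class and method names (CustomerNamePostFilter, ExactContains) are hypothetical and not part of the DataMiner API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CustomerNamePostFilter
{
    // Hypothetical helper: keeps only the values that literally contain
    // searchValue, using an ordinal (case-sensitive) substring check.
    public static List<string> ExactContains(IEnumerable<string> values, string searchValue)
    {
        return values
            .Where(v => v != null && v.IndexOf(searchValue, StringComparison.Ordinal) >= 0)
            .ToList();
    }
}
```

With the three sample names logged above, ExactContains(names, "00000000") returns an empty list, which is the result I expected from the database-level filter.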
Hi Alberto, I appreciate the pointer. The DOM filter API (DomHelper/DomInstanceExposers) doesn’t expose a way to issue a raw OpenSearch term query from this layer, so there’s no straightforward path to exact term semantics via DOM. If there’s an officially supported hook to pass a term query through DOM, I’m happy to try it. Otherwise, this would require dropping below DOM to OpenSearch directly, which isn’t available in this context.
Hi Enis,
It is indeed correct that the 'Contains' filter is analyzed in Elasticsearch and OpenSearch with a form of fuzzy matching (an n-gram tokenizer).
This has been the default behavior for many years for similar objects stored by DataMiner in these databases. Potentially confusing results have been noted in the past, and a task (194368) was created to evaluate whether this could be improved.
This behavior is already different on STaaS. There, a contains filter is evaluated as a strict full match, more in line with what most users would expect.
In your case, if OpenSearch is used, the workaround will depend on the use case. The one you suggested is an option, but it could indeed lead to reading a lot more data than needed. If only a certain part of the string is known in advance, that part could also be stored in a separate field so an 'equal' filter could be used. But that will again depend on whether this is possible for the solution you are creating.
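To illustrate why n-gram analysis can return all of those instances, here is a small standalone sketch. It is a simplified model, not DataMiner's actual analyzer configuration: the gram size of 3 and the "match if any gram is shared" rule are assumptions chosen for illustration, and NGramDemo is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public static class NGramDemo
{
    // Splits text into all overlapping substrings of length n.
    public static HashSet<string> NGrams(string text, int n)
    {
        var grams = new HashSet<string>(StringComparer.Ordinal);
        for (int i = 0; i + n <= text.Length; i++)
        {
            grams.Add(text.Substring(i, n));
        }
        return grams;
    }

    // Assumed matching rule: true when the stored value and the search
    // value share at least one n-gram.
    public static bool SharesAnyGram(string storedValue, string searchValue, int n)
    {
        var stored = NGrams(storedValue, n);
        stored.IntersectWith(NGrams(searchValue, n));
        return stored.Count > 0;
    }
}
```

Under these assumptions, "00000000" breaks down into the single gram "000", which also occurs in 'MBC FZ LLC - 110004' and 'MX1 Internal - MX1Internal0000'. Both values would therefore count as matches even though neither contains the full search string.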
Hi Thomas, thanks for the clarification. For our use case, storing separate searchable fields isn’t feasible as the values we query with "Contains" are produced by another generic component from polled APIs and vary by use-case, so reshaping the storage would be a risky change. Is there any supported way on OpenSearch to do an exact substring match at query time to avoid large reads and in-memory filtering? If not, is there any ETA on task 194368?
There is unfortunately no way right now to execute a contains query via the DOM API that would result in the exact match behavior. There is currently no official ETA for that task. We can share the use-case with the PO that manages that backlog, however, so the priority can be re-evaluated.
Great question. I won't post an answer as I haven't played much with OpenSearch yet, but based on the output above, I'd say you're right that the Contains(…)-based DOM filter behaves like a tokenized search rather than a SQL LIKE '%value%'.
So 1 & 3 would seem to be by design.
Regarding 2, I'm unsure. Have you tried a "term" query?
https://docs.opensearch.org/1.1/opensearch/query-dsl/term/#term
Subscribing to get an insight from more experienced users.