Hi,
I would like to know what could cause a DMA to be unable to create more threads. I've looked into this article, but it doesn't mention the "No more threads can be created in the system" message. Here's an example extracted from the SLDMS.txt log.
2021/06/16 02:06:12.842|SLDMS.exe|11400|CSystem::TakeBackup()|DBG|1|Local backup path is: -C:\Skyline DataMiner\Backup\-
2021/06/16 02:06:12.842|SLDMS.exe|11400|CSystem::TakeBackup|DBG|1|Current dir is -C:\Windows\system32-
2021/06/16 02:06:12.842|SLDMS.exe|11400|CSystem::TakeBackup|DBG|1|Current [before backup] dir is -C:\Windows\system32-
2021/06/16 11:27:42.316|SLDMS.exe 9.6.1829.3106|8288|11328|CRequest::Request|ERR|0|Remote Request for -DMS- on -VT_I4 : 42010- failed. No more threads can be created in the system. (hr = 0x800700A4)
Type 0/16/21
MESSAGE: No more threads can be created in the system.
VALUE 1: VT_ARRAY|VT_BSTR (3) : HFC\dineshsolse;2021-06-16 11:26:51;10.176.144.18
VALUE 2: VT_ARRAY|VT_BSTR (16) : FALSE;HFC\dineshsolse;;TRUE;;dineshsolse;FALSE;TRUE;TRUE;+919773115947;dineshsolse@nbnco.com.au;...
2021/06/16 11:32:42.314|SLDMS.exe 9.6.1829.3106|8288|12832|CRequest::Request|ERR|0|Remote Request for -DMS- on -VT_I4 : 42009- failed. (hr = 0x80131500)
Type 0/16/21
MESSAGE: The request timed out.
VALUE 1: VT_ARRAY|VT_BSTR (3) : HFC\dineshsolse;2021-06-16 11:26:51;10.176.144.18
VALUE 2: VT_ARRAY|VT_BSTR (16) : FALSE;HFC\dineshsolse;;TRUE;;dineshsolse;FALSE;TRUE;TRUE;+919773115947;dineshsolse@nbnco.com.au;...
2021/06/16 12:04:31.625|SLDMS.exe 9.6.1829.3106|8288|12832|CRequest::Request|ERR|0|Remote Request for -DMS- on -VT_BSTR : 10.178.144.15- failed. No more threads can be created in the system. (hr = 0x800700A4)
Type 0/55/0
MESSAGE: No more threads can be created in the system.
VALUE 1: VT_BSTR : c:\Skyline DataMiner\users\arkeedogillo\RecentItems.xml
2021/06/16 12:04:31.625|SLDMS.exe|12832|CSystem::FileCompare|ERR|-1|Filecompare (c:\Skyline DataMiner\users\arkeedogillo\RecentItems.xml) failed to receive file size from 10.178.144.15: 0x800700a4h No more threads can be created in the system.
2021/06/16 12:04:31.651|SLDMS.exe 9.6.1829.3106|8288|12832|CRequest::Request|ERR|0|Remote Request for -DMS- on -VT_BSTR : 10.178.144.15- failed. No more threads can be created in the system. (hr = 0x800700A4)
Type 0/50/0
MESSAGE: No more threads can be created in the system.
VALUE 1: VT_ARRAY|VT_BSTR (2) : c:\Skyline DataMiner\users\arkeedogillo\RecentItems.xml;10.176.144.18
2021/06/16 12:04:31.682|SLDMS.exe 9.6.1829.3106|8288|12832|CRequest::Request|ERR|0|Remote Request for -DMS- on -VT_BSTR : 10.178.144.15- failed. No more threads can be created in the system. (hr = 0x800700A4)
Type 0/82/0
MESSAGE: No more threads can be created in the system.
VALUE 1: VT_BSTR : c:\Skyline DataMiner\users\arkeedogillo\RecentItems.xml
2021/06/16 12:04:31.682|SLDMS.exe|12832|CSystem::Notify()|ERR|-1|Error during synchronization of c:\Skyline DataMiner\users\arkeedogillo\RecentItems.xml from 10.178.144.15. No more threads can be created in the system. (hr = 0x800700A4)
2021/06/16 12:04:31.688|SLDMS.exe|12832|CSystem::Notify|ERR|0|Synchronize file failed. - No more threads can be created in the system. (hr = 0x800700A4)
2021/06/16 12:04:32.664|SLDMS.exe 9.6.1829.3106|8288|12832|CRequest::Request|ERR|0|Remote Request for -DMS- on -VT_BSTR : 10.178.144.15- failed. No more threads can be created in the system. (hr = 0x800700A4)
Type 0/55/0
MESSAGE: No more threads can be created in the system.
VALUE 1: VT_BSTR : C:\Skyline DataMiner\Users\arkeedogillo\ClientSettings.json
2021/06/16 12:04:32.664|SLDMS.exe|12832|CSystem::FileCompare|ERR|-1|Filecompare (C:\Skyline DataMiner\Users\arkeedogillo\ClientSettings.json) failed to receive file size from 10.178.144.15: 0x800700a4h No more threads can be created in the system.
2021/06/16 12:04:32.691|SLDMS.exe 9.6.1829.3106|8288|12832|CRequest::Request|ERR|0|Remote Request for -DMS- on -VT_BSTR : 10.178.144.15- failed. No more threads can be created in the system. (hr = 0x800700A4)
Type 0/50/0
MESSAGE: No more threads can be created in the system.
VALUE 1: VT_ARRAY|VT_BSTR (2) : C:\Skyline DataMiner\Users\arkeedogillo\ClientSettings.json;10.176.144.18
2021/06/16 12:04:32.716|SLDMS.exe 9.6.1829.3106|8288|12832|CRequest::Request|ERR|0|Remote Request for -DMS- on -VT_BSTR : 10.178.144.15- failed. No more threads can be created in the system. (hr = 0x800700A4)
Type 0/82/0
MESSAGE: No more threads can be created in the system.
Hi,
Some of the DataMiner processes are 32-bit processes. This means that they have a memory address space limit of roughly 4GB.
If that 4GB address space limit is reached then you'll start noticing these "No more threads can be created..." log lines. That is because the process wants to create a new thread, for which new memory needs to be allocated, but it has no address space available to do this.
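As a side note, the hr value in the log lines can be taken apart to confirm where the message comes from. The sketch below only uses the standard HRESULT bit layout; the function name decode_hresult is made up for this illustration and is not part of DataMiner.

```python
def decode_hresult(hr: int) -> dict:
    """Split a 32-bit HRESULT into its standard bit fields."""
    return {
        "failure": bool((hr >> 31) & 1),  # severity bit: 1 means failure
        "facility": (hr >> 16) & 0x7FF,   # 11-bit facility; 7 is FACILITY_WIN32
        "code": hr & 0xFFFF,              # low 16 bits: the wrapped error code
    }

fields = decode_hresult(0x800700A4)
print(fields)  # {'failure': True, 'facility': 7, 'code': 164}
```

Facility 7 means the HRESULT wraps a plain Win32 error, and Win32 error 164 (0xA4) is ERROR_MAX_THRDS_REACHED, whose system text is exactly "No more threads can be created in the system."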
Ideally, every DMA in your setup is monitored by a Microsoft Platform element, preferably hosted on another DMA so that you can still read out its values if the monitored DMA is malfunctioning. Open that element, go to the "Task Manager" page, and check the "Virtual bytes" column in the "Task Manager" table. I expect the value for the SLDMS process to be around 4GB.
If it is indeed around 4GB, that is a problem: either the process has a memory leak (less likely), or the DMA can't handle the load and has more items to process than it can cope with (e.g. more messages to send than it can actually send, so they keep piling up until the limit is eventually reached).
[EDIT] Let me try to explain it in a different way, with beer. I think the correct word for the object in the image below is "beer crate":
Suppose that this beer crate represents the memory of our process.
This crate can contain 24 bottles, so there are 24 places available = 24 addresses.
We're numbering this from 1 up to 24.
Our process also has a number of places available. As this is a 32-bit process, it can have addresses numbered from 1 up to roughly 4,000,000,000 (the 4GB limit). This is simplified, I know it's zero-based etc., but let's keep it simple here.
If the process needs to do something, like create a thread or store data, it is as if you have one or more bottles in your hand and need to put them in free places in the beer crate.
Suppose that creating a thread is equal to placing 3 bottles in the beer crate. If there is only one place available, this is not possible, because the beer crate is two places short. This is when you get the error message "No more threads can be created..." -> I need memory to create a thread, but I don't have any space left to do what I want.
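In bytes instead of bottles, this could be sketched as follows. The 1 MiB stack reservation per thread is an assumption (it is the usual Windows default, but the real value depends on how the executable was built and how each thread is created, and it is not confirmed for SLDMS). The name can_create_thread is made up for this example.

```python
ADDRESS_SPACE = 2**32   # bytes a 32-bit pointer can address (4 GiB)
STACK_RESERVE = 2**20   # assumption: 1 MiB of address space reserved per thread stack

def can_create_thread(bytes_in_use, stack_reserve=STACK_RESERVE):
    """A new thread only fits if enough address space is still free for its stack."""
    free = ADDRESS_SPACE - bytes_in_use
    return free >= stack_reserve

print(ADDRESS_SPACE // STACK_RESERVE)                  # 4096 stacks if nothing else used memory
print(can_create_thread(3 * 2**30))                    # True: 1 GiB still free
print(can_create_thread(ADDRESS_SPACE - 512 * 1024))   # False: only 512 KiB left
```

This is the optimistic case. In a real process the free space also has to be one contiguous range, so thread creation can start failing somewhat before the total reaches the limit.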
Memory is also used when sending messages, as these are added to a temporary queue. Suppose that on one side somebody is adding 2 bottles every 30 seconds, and on the other side somebody is taking a bottle away once per minute. If bottles are added for only 2 minutes and no more afterwards (a small burst), then 8 bottles are added in total and after 8 minutes these are all gone again, so this is not a problem, as we had 24 places available in our beer crate. If it did not stop after those 2 minutes, the number of available places would keep decreasing (4 bottles in and 1 out per minute, so 3 extra bottles per minute) until after 8 minutes there are no more places left. That means that we can't process fast enough because of a "bottleneck", and eventually we reach the limit. Having a burst is not a problem, but if it keeps going, you'll need to investigate where it's going wrong.
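The bottle arithmetic can be checked with a small simulation. This is only a sketch of the analogy, counted in whole minutes (4 bottles in and 1 out per minute); the function and parameter names are made up for the example.

```python
def minutes_until_full(capacity=24, added_per_min=4, removed_per_min=1,
                       burst_minutes=None, max_minutes=60):
    """Return the minute at which the crate is full, or None if it never fills.

    burst_minutes=None means bottles keep arriving for the whole simulation.
    """
    bottles = 0
    for minute in range(1, max_minutes + 1):
        if burst_minutes is None or minute <= burst_minutes:
            bottles += added_per_min                  # 2 bottles every 30 seconds
        bottles = max(0, bottles - removed_per_min)   # 1 bottle taken away per minute
        if bottles >= capacity:                       # checked at the end of each minute
            return minute
    return None

print(minutes_until_full())                 # 8: sustained load fills the crate
print(minutes_until_full(burst_minutes=2))  # None: a short burst drains again
```

The sustained case fills all 24 places after 8 minutes, while the 2-minute burst never gets beyond 6 bottles and is empty again at minute 8.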
What is often thought of as a solution is to increase memory (add more RAM).
Simplifying again here (not taking paging files etc. into account), suppose that the RAM is the floor on which our beer crate is placed. Adding more floor (RAM) is no solution, because the beer crate (process) has a fixed size with 24 places, and we only have one beer crate for this job. The floor can hold other beer crates, but those are other processes. It is as if you had an entire warehouse with only one beer crate on the floor, you wanted to store 26 bottles in that crate of 24, and somebody proposed "let's expand our warehouse" as a solution.
One could then suggest changing the size of our beer crate. This could be done by our software department, and then we would have a 64-bit process instead of a 32-bit process (a larger beer crate with more places available).
However, increasing the address space of the process only makes sense if you expect its memory usage to level off at a certain size.
When bottles are added two at a time every 30 seconds and removed one at a time every minute, it makes no sense to increase the size of the beer crate: after a while, the people taking bottles out are going to look at them and are not going to be happy, because the beer is already warm and expired (e.g. you want to send a message (set a parameter) and you see that it's not responding, or only after several minutes). This is when we're seeing a memory leak. And as we now have a 64-bit process, it can in theory take up around 18,446,744 terabytes (currently unlimited in physical terms), meaning that we have less floor available than our beer crate would need if all its places were taken (the beer crate only takes up floor space for the places that actually contain a bottle). In other words, if the memory of a 64-bit process keeps leaking, eventually no memory is left over (no floor space available for the other beer crates). This can affect many other things, such as the Windows operating system itself: you won't be able to open a remote desktop session to log in, because that needs memory that is no longer available.
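For reference, the two address space sizes mentioned here follow directly from the pointer width. A quick check, using decimal terabytes (10^12 bytes):

```python
BYTES_32 = 2**32   # addresses available to a 32-bit process
BYTES_64 = 2**64   # theoretical addresses available to a 64-bit process
TB = 10**12        # decimal terabyte

print(BYTES_32)        # 4294967296, i.e. roughly 4GB
print(BYTES_64 // TB)  # 18446744 terabytes
```

The 64-bit figure is a theoretical ceiling; operating systems expose less than that to a single process, but it is still far more than the RAM of any server.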
Increasing RAM or changing the size of the process is not advised in this case. Instead, it should be investigated what that process is doing and why it has to do more than it can handle. For example, there could be a lot of parameter sets (e.g. one element doing a lot of remote sets on another element), a lot of alarms being created or changed, or other causes (it is hard to say remotely what the cause is). It could also be that something else is very slow, causing SLDMS to pile up memory because it can't forward the data fast enough.
Now that the screenshot of the task manager has been added: what you could do is look at the trending of "VM Size". This is not the same as "Virtual bytes", which unfortunately has no trending enabled here, but VM Size could give a hint of the size at the time of the problem.
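When looking at such a trend, the thing to tell apart is a burst that recovers from a value that keeps climbing. A rough sketch of that check is shown below; the sample values (in MB) are invented for illustration and do not come from this system, and the 90% threshold is an arbitrary assumption.

```python
def is_steadily_growing(samples, min_fraction=0.9):
    """True when at least min_fraction of consecutive readings go up."""
    steps = [b - a for a, b in zip(samples, samples[1:])]
    if not steps:
        return False
    rising = sum(1 for step in steps if step > 0)
    return rising / len(steps) >= min_fraction

# Hypothetical VM Size readings in MB, one per trend point
burst = [900, 1400, 1900, 1500, 1100, 900, 900]
pile_up = [900, 1200, 1500, 1900, 2300, 2800, 3300]

print(is_steadily_growing(burst))    # False: rises, then falls back
print(is_steadily_growing(pile_up))  # True: never comes down
```

A trend like the second one, ending near the 4GB limit at the time of the errors, would point to the piling-up scenario described above.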
I’ve added the [EDIT] part to my answer with some more explanation
Laurens, first of all, what an amazing explanation, thank you so much for taking the time to look at this and explain it in such easy terms.
This gives me a good idea of the direction in which I should be looking to see what is going on in this DMS.
Hi Laurens,
Thank you very much for such a detailed explanation, I really appreciate it.
I have updated the original question with the task manager screenshot. So I believe SLDMS is looking good at the moment, but ideally, what would be the recommended number of threads/handles that the processes can have? Do you know if that is documented somewhere?
If, in this case, the inability to create more threads was caused by the number of messages, what could be a possible fix for this? Would it be to increase the amount of available memory?
Thanks!