Hello,
I have a problem defining the response of a serial device because the trailer value can also occur inside the stream. The response from the device has a very rigid and stable structure, like this:
<Param>Header</Param> - fixed length, known static value
<Param>Param 1</Param> - length: next param
<Param>Param 2</Param> - fixed length, known static value
<Param>Param 3</Param> - length: next param
<Param>Trailer</Param> - fixed length, known static value, but it can also occur in Param 1.
The key to recognizing the end of the message is that, after the last known fixed-length parameter, the first occurrence of the trailer value in the stream is the real trailer. For example, the trailer can't occur in Param 3; it can only occur in Param 1.
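To make the rule above concrete, here is a minimal Python sketch of the framing logic, outside of any DataMiner protocol. The function name `split_response` and the example byte values are hypothetical, and the sketch assumes that the static value of Param 2 does not itself occur inside Param 1 (the post does not say either way).

```python
from typing import Optional, Tuple


def split_response(
    buf: bytes, header: bytes, param2: bytes, trailer: bytes
) -> Optional[Tuple[bytes, bytes]]:
    """Return (param1, param3) once a complete frame is in buf, else None.

    Assumption: the static Param 2 value never appears inside Param 1.
    """
    if not buf.startswith(header):
        return None
    # Param 1 runs from the end of the header up to the static Param 2 value.
    p2_start = buf.find(param2, len(header))
    if p2_start == -1:
        return None
    # Only search for the trailer AFTER the last fixed parameter (Param 2),
    # so trailer bytes inside Param 1 are ignored.
    p3_start = p2_start + len(param2)
    trailer_start = buf.find(trailer, p3_start)
    if trailer_start == -1:
        return None  # frame not complete yet
    return buf[len(header):p2_start], buf[p3_start:trailer_start]


# Hypothetical values: header STX, Param 2 "|", trailer CR LF.
frame = b"\x02line1\r\nline2|OK\r\n"
result = split_response(frame, b"\x02", b"|", b"\r\n")
```

In this example the CR LF inside Param 1 is skipped, because the search for the trailer only starts after Param 2.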
If I define the Trailer in the response as a normal fixed parameter (<Type>fixed</Type>, not <Type>trailer</Type>), communication and the protocol driver work fine, because the response is accepted after the timeout. The equipment is polled every 5 s, so 1.5 s for the timeout is enough. Param 1 and Param 3 are handled in a QAction.
Is it allowed to use the timeout timer in this way?
Is there a better way to solve this problem?
Thank you in advance!
Hi, the best option in this case is to rely on the length parameter. More info can be found in this help section.
If the length param has a fixed size, then see the item "Responses with length field".
If it has a variable size, as seems to be indicated in the question, then see "Responses with dynamically defined length". The trailer is then set to the "known static value" of param 2, the length param 1 should be "numeric text" (i.e. a raw byte value that contains the length will not work), param 3 should have LengthType "other param" and refer to param 1, and DataMiner version 10.0.3 or higher is required. Don't forget to add the read response and length response actions. I don't know whether it will still work if the original trailer, which arrives after the content, is then replaced with a fixed parameter.
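As an illustration of what "numeric text" means here, the following is a rough Python sketch of a length-prefixed read, not DataMiner's actual implementation. The function name `read_length_prefixed` and the `;` separator are hypothetical; the separator stands in for the static value of param 2.

```python
from typing import Optional


def read_length_prefixed(buf: bytes, separator: bytes) -> Optional[bytes]:
    """Return the content once all announced bytes have arrived, else None.

    Layout assumed for this sketch: <length as ASCII digits><separator><content>
    """
    sep_at = buf.find(separator)
    if sep_at == -1:
        return None  # length field not terminated yet
    length_text = buf[:sep_at]
    # "Numeric text" means ASCII digits such as b"12", not a raw byte like b"\x0c".
    if not length_text.isdigit():
        raise ValueError("length field is not numeric text")
    length = int(length_text.decode("ascii"))
    start = sep_at + len(separator)
    if len(buf) < start + length:
        return None  # fewer bytes received than the length field announces
    return buf[start:start + length]
```

With this layout, `b"5;hello"` yields `b"hello"`, `b"500;short"` is treated as incomplete, and a raw length byte such as `b"\x05;hello"` is rejected.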
If the length param has a variable length and is not numeric text (or it doesn't work because of the extra byte(s) after the data), or you can't upgrade to at least DataMiner 10.0.3, then I'm afraid you'll have to wait until the timeout time. The downsides of waiting until the timeout time are that communication will be slower, as you'll need to wait, and that the TCP sockets will be opened and closed with every command instead of keeping the connection open.
Hi, waiting until the timeout time of a few seconds will not generate an RTE. The only problem is that it waits until the defined time before continuing, so if the timeout is set to 5 s and the response is already present after 1.5 s, it waits an extra 3.5 s, which slows down communication. If you need to execute 3 sets (with 3 reads to verify), that will take 5 s timeout * 6 commands = 30 s, while it could have been 1.5 s * 6 commands = 9 s. The other problem is that no length verification is done: if the device indicates a length of "500" and you only received 50 bytes, the element won't execute a retry, nor will it go into timeout. It will do the retry when the last fixed parameter (the previously defined trailer) doesn't match, but there won't be any length verification.
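The timing comparison above can be written out as a small Python calculation. The helper name `total_wait_seconds` is made up for this sketch; it only reproduces the arithmetic from the comment, with one verifying read per set.

```python
def total_wait_seconds(
    wait_per_command_s: float, sets: int, reads_per_set: int = 1
) -> float:
    """Total time spent waiting when every command waits wait_per_command_s."""
    # Each set is followed by reads_per_set reads, so 3 sets give 6 commands.
    commands = sets * (1 + reads_per_set)
    return wait_per_command_s * commands


print(total_wait_seconds(5.0, sets=3))  # 30.0 (always waiting for the 5 s timeout)
print(total_wait_seconds(1.5, sets=3))  # 9.0 (response handled as soon as it arrives)
```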
Hi Laurens,
I understand the possible problem.
In its response, this device sends short textual messages (max 200 characters/bytes), and the message contains no information about its length or the length of any of its parts. Fortunately for us, there is no need for a fast response or a fast exchange of information. We have used this driver for more than 4 years without noticeable problems, but a few weeks ago we got an RTE error in the DMA which refers to this driver. I checked the driver with DIS and corrected all errors except this one (in DIS it remains as type minor), which I don't know how to solve in a simple, constructive way.
–
Thank you for your time and constructive discussion!
–
Best regards,
Hi, DIS is generating a minor to point to the remark made here that waiting for the timeout will slow down communication. Debugging a driver remotely is not that easy. You mention that it was running fine for years and that the problem suddenly started a few weeks ago. The key here is to try to figure out what changed at that time that now triggers this problem: new elements created? Driver change? Config changed? Firmware update on the device? That could give you some hints.
There can be a lot of root causes for a driver getting stuck, and pinpointing the cause remotely is not simple, as it also requires the applicable debugging knowledge and experience. Without advanced analysis experience, as a basic step I would suggest trying to isolate the problem by getting rid of the common shared items: verify whether the problem happens when only one element of that driver is active, and make sure there are no other elements polling the same IP/port. If the problem is gone, then there is something in common; if not, then perhaps the element can be moved to a staging test DMA to reproduce the problem and investigate further there by modifying the driver (e.g. disabling some parts).
Hi Laurens,
It seems that restarting all elements which use this serial driver, and after that restarting the DMA which reported the RTE error for this driver, solved the problem.
–
Maybe the new serial driver, in which all errors and warnings are corrected, also helped, but I can't be sure because I didn't restart the DMA immediately after restarting all elements with the old driver.
First, when I stopped all elements which use that driver, the RTE error stayed. After the elements were started again, the error also stayed. Then I replaced the old buggy driver with the new clean driver, but the error stayed.
Then I restarted the DMA which reported the error. The error was gone from the DMA, but stayed in Cube. Then I restarted Cube and the error disappeared, hopefully for the next 4 years.
–
Yes, I tried this problematic driver on our test DMA and it works fine. There were no firmware or software changes on that equipment. We have thousands of elements in our DMAs, and there is always a possibility that something goes wrong. I am new to DMA programming, but I hope to grow into it. Thank you for the nice support!
Dear Laurens,
Thank you for the response. I had also investigated using the length parameter and played with it, but with no results. I appreciate your note on parameter formats and will try again when we upgrade the DMAs. At the moment all DMAs are below 10.0.3.
I am glad to hear that I can use the timeout, because I was afraid that it could cause RTE errors.
Have a nice day!