Discussion in 'CCDOPS and SBIG Universal Driver' started by Adam Robichaud, Sep 15, 2016.
Here is another TX Timeout captured.
Karen Collins (for Mark Manner)
Mark, I agree with your assessment that it's much harder to duplicate the issue with this logging version of the software. I tried collecting bias frames: I set exposure time = 100ms (0.010) and exposure delay = 0 for 250 frames, and it never hung. I'm also using TSX Pro 10.5.0 Build 10229 on Windows 10. Normally, doing this, it would hang after the first 5 or 10 frames. I was shocked to see it complete 250 frames without hanging.
Oddly enough, my camera has not hung since I turned on logging, whereas I normally get 1-3 hangs per night.
If you are not using a subframe, readout probably takes so long that it will take forever to see the problem. I'd set a small subframe while the debug logs are active to better simulate normal operation.
Martin: The log files also zip very well. Also: we need the latest driver, sadly, since that has the extra logging capabilities.
Mark/Karen: Thanks for the logs. Unfortunately they don't contain a smoking gun, but they do help us narrow down the issue further. I'll try to get an update up tomorrow with further analysis.
Thank you. I'll try anything newly available on Tuesday evening.
FYI — we're reviewing some further enhancements to the debug logs, since our last changes didn't capture the error as we'd expected, but we want to make sure they're as comprehensive as possible so we don't keep playing this cat-and-mouse game. My hope is to have the new driver out by end of day tomorrow, so we have ample time to ensure we've covered as broad a base as possible.
Thanks again for your patience, and we're sorry this is taking so long.
So last night I turned logging off and the damn camera hung twice whereas the previous two nights with logging on, the camera did not hang but of course ran slowly, especially the AO.
Mark suggested using a sub-frame, but from what I have read and experienced, the camera hangs during focus runs for a lot of folk, so the sub-frame idea doesn't play a part here while the camera is focusing.
The current operating theory is that the operating system is getting overloaded with rapid-succession calls to our USB Write function (or, more specifically, the Windows API calls within our USB Write function). We're looking at ways to prevent that, and also to give more meaningful error messages when we hit an OS limit.
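To make the "rapid-succession calls" theory concrete, here is a minimal sketch of one possible mitigation, assuming the overload comes from writes issued back to back: enforce a minimum gap between consecutive writes. `ThrottledWriter`, `min_interval_s` and the `write_fn` callback are hypothetical names for illustration only; they are not part of SBIGUDrv or the Windows USB API, and the actual driver fix may work quite differently.

```python
import time
from typing import Callable, Optional


class ThrottledWriter:
    """Hypothetical wrapper that spaces out calls to a low-level write function.

    Assumption: the OS is overwhelmed by writes issued with no gap between
    them, so waiting until a minimum interval has elapsed since the previous
    write reduces the pressure.
    """

    def __init__(
        self,
        write_fn: Callable[[bytes], int],
        min_interval_s: float = 0.005,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._write_fn = write_fn
        self._min_interval_s = min_interval_s
        self._clock = clock  # injectable so the timing can be tested
        self._sleep = sleep
        self._last_write: Optional[float] = None  # time the previous write finished

    def write(self, data: bytes) -> int:
        if self._last_write is not None:
            elapsed = self._clock() - self._last_write
            wait = self._min_interval_s - elapsed
            if wait > 0:
                # Back off instead of issuing the write immediately.
                self._sleep(wait)
        result = self._write_fn(data)
        self._last_write = self._clock()
        return result
```

The clock and sleep functions are injectable so the spacing logic can be checked without real hardware; a real driver would do this in native code around its own USB calls.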
Yes Adam, that would certainly explain why it is not occurring with logging turned on. Sounds like you are homing in on the issue. Can't wait to see this go away, especially as I am very keen to purchase the 16200 but very hesitant right now.
Keep at it, mate.
If I don't have something to test Thursday evening, I will probably lose my ability to reproduce the problem for a while. I have a critical observation Friday night, so I have to do something to get rid of this problem Thursday evening (not sure what yet). Once this problem goes away, I'm not sure how to make it start happening again on demand, but it will be back sooner or later. I hope I can test something Thursday evening.
Sorry for the delay. Because the changes were in such a critical section of the code, we wanted to make sure the driver was as stable as we could make it before distributing it. For the same reason, we won't be distributing this change via SBIG Driver Checker 64 just yet. The attached ZIP file contains an updated SBIGUDrv, installation/runtime/reversion instructions, and instructions on proper log collection. It will add crucial OS information about Tx Timeout modes of failure, and catch/log more modes of failure than previous versions did. Beyond that, it should not function any differently than it did in the past.
Note: Please do not download and install this driver version with the intent of using it for your normal imaging — it is a beta driver, and may contain changes that will affect your ability to image efficiently. If you want to do normal imaging, follow the reversion instructions in ReadMe.txt, or ReadMe.md.
Thanks again for your patience, and continued assistance!
The problem was even more difficult to reproduce using the above new version of the driver (v4.96 build 1), but I did capture two events. Also, with this version of the driver, the error given is not "TX Timeout", but is "unknown OS Error". I don't recall an error message presented to me by TSX for the first instance of the error near the top of the log file (but it apparently happened right when I started TSX), but for the one near the bottom of the file, TSX produced an error message "SBIG Driver: Operating system error. Error = 30032". To find the corresponding errors in the attached log file, search for "OS Error". The setup here is similar to what I described above for my last test session, except in this case when I set TSX to run continuous focus exposures, I had the exposure time set to 1.0 sec, no delay between exposures (as before), and used a subframe size of 8x8 pixels (to speed up the readout part of the cycle). I have again cut out a huge part of the file between the first occurrence and the second occurrence, which occurred ~50 minutes later. I still have the full ~44 MB file, if the attached short version is not all that you need.
With the above driver and setup, if the logs are NOT enabled, the "unknown OS Error" occurs right away (within the first minute).
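The reproduction recipe above (continuous exposures of 1.0 s, no delay between exposures, an 8x8 subframe, counting how often the error appears) can be sketched as a simple stress loop. This is only an illustration under stated assumptions: `take_exposure` is a hypothetical callable standing in for whatever TSX or another client actually exposes, and it is assumed to raise an exception when the driver reports an error such as a TX Timeout.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class StressResult:
    attempts: int = 0
    # (frame index, error text) for every failed exposure
    failures: List[Tuple[int, str]] = field(default_factory=list)


def stress_exposures(
    take_exposure: Callable[[float, int, int], None],
    frames: int,
    exposure_s: float = 1.0,
    subframe: Tuple[int, int] = (8, 8),
    stop_on_error: bool = False,
) -> StressResult:
    """Issue back-to-back exposures with no delay and record any errors.

    take_exposure(exposure_s, width, height) is hypothetical; it is assumed
    to raise on failure (e.g. "TX Timeout" or "unknown OS Error").
    """
    result = StressResult()
    width, height = subframe
    for i in range(frames):
        result.attempts += 1
        try:
            take_exposure(exposure_s, width, height)
        except Exception as exc:  # keep the error text for the report
            result.failures.append((i, str(exc)))
            if stop_on_error:
                break
    return result
```

Logging the frame index of each failure makes it easier to line up a failure with the matching entry in the driver's debug log.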
If you have a version of an attempted fix that I can try tomorrow evening, it would be greatly appreciated, since I haven't yet found a way to stop the problem from happening too many times within one full night of continuous exposures. When an error occurs, the TSX exposure sequence is aborted, and I get no more data until I notice the error message and restart the sequence. Even then, I still lose an exposure, which is sometimes 3 minutes of data.
Karen (for Mark Manner)
I don't think the error you saw was the error we're hunting for (though if it is, then this could be an issue with sbigu64.sys), but it is the correct behavior we want. Here's an updated driver which will help us either eliminate that issue or pin it down as a potential cause. Please run the test one more time.
Here is a log with two TX Errors captured (the different error seen in the last build is gone now). This is from version 4.96 build 2. This version seems to have much less impact on the driver performance, so the error happens more readily.
Some additional info for you. Without exception, in the past, I have always had a TX or RX timeout error with the camera hanging and requiring a power cycle to recover. My automation software (CCDAP) also hung. Now, no TX or RX errors are generated, but CCDAP generates (most of the time) a 'Take image failed' SMS. When I terminate CCDAP through Task Manager, it also kills TheSkyX. Consequently, when I restart TSX, I can reconnect to the camera without a power cycle. From my end, what has changed is the installation of the initial driver you provided, plus my TSX was about 3 'daily builds' out of date. The end result remains the same, but it appears the TX or RX timeout errors are not being generated before CCDAP/TSX crashes.
That last log from Karen helped a lot. It has basically narrowed down the field of potential causes to "false positive" write requests (with the caveat that there may be more going on under the hood, but I think we're getting close to our first potential fix). I've updated some code in that section of the driver which should protect against "false positive" write requests, and also improve logging in the event that a request bypasses those safeguards (which would imply the communications are failing at the OS level). That will help us determine how best to proceed if these changes don't resolve the issue.
Not sure what those crashes are about, Martin, but if CCDAP/TSX is crashing, then it would be nice to know what SBIGUDrv is in the middle of when things start failing.
I think we're getting close. New driver attached.
I tested the build 3 driver tonight. I will first say that the primary problem went away Sunday evening as mysteriously as it showed up a week or two ago. That said, I ran two tests before observing this evening using the new build 3 driver. First, I disabled all logging and hammered the interface with exposure requests using the focus module of TSX. I used 8x8 binning and 0.1 sec exposures. After running for a few minutes, TSX displayed the usual TX timeout error, but the camera and TSX were locked up hard. I had to kill TSX and cycle power on the camera to regain control. I have seen this problem in the past too, as mentioned above by Martin, but it is less frequent than the version where I can close the error message and restart exposures with no problem. Next, I turned on logging and ran the same test for 30 minutes, but had no failure. At that point I had to start an observation, so I turned logging back off and started my observations as usual. The observations executed without error tonight throughout the 6-hour time series of ~2-minute exposures.
I'll turn logging back on and run more testing tomorrow for a longer period of time to try to cause (and capture) either the lock-up or non-lockup version of the TX timeout error.
Thanks for your work on this problem,
Thanks for the report, and repro instructions. I'll do a focus run using my development unit as well, to see if I can capture the error in house using the new drivers.
It took a while with the logs enabled, but here is a TX Timeout error using build 3. I used the same process: 0.1 sec focus exposures with an 8x8 subframe.