-
Notifications
You must be signed in to change notification settings - Fork 0
Update to rebuild timestamps when no RTC is available at boot #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…ne and has no RTC. This is accomplished by logging hardware.millis() to the spiflashlogger and then syncing hardware.millis() with time(), once our cloud connection is made and time() is restored
…he original data without RTC
…cond. Provides slightly more accurate timestamps.
Maximizes retry throughput, while ensuring that only 1 message is in flight at a time (to protect against out of memory crashes)
|
||
if("boot" in dataPoint.metadata && dataPoint.metadata.boot == _bootNumber){ | ||
local deltaTMillis = _lastTS[0] - dataPoint.ts | ||
local deltaTSeconds = math.floor(deltaTMillis+500).tointeger()/1000 //Round to nearest second, but use this int value for updating _lastTS to keep things consistent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
math.floor(deltaTMillis+500) == deltaTMillis+500
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be you meant
math.floor((deltaTMillis+500)/1000).tointeger()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
duh - yes... Fixed in 76b3988
_log_debug("Calculated new ts as " + dataPoint.ts + " (deltaT = " + deltaTMillis + " ms)") | ||
|
||
//update _lastTS so that we can have 25 days between datapoints instead of 25 days total of timestamps that we can rebuild | ||
_lastTS[0] -= (deltaTSeconds*1000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_lastTS[0] -= deltaTMillis�
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did it the way that I did so that I would be careful to keep _lastTS[0] and _lastTS[1] consistent and not have any kind of floating point precision issues (a .tointeger() here would probably make things a bit clearer). _lastTS[1] is only precise down to the nearest second, and I don't want to deal with rounding problems, precision issues with floats, etc. here. It does mean that you will have slightly less than the full range of hardware.millis() to work with, but what's half a second when you have a 25 day window?
There may be some edge cases to think through when you have many messages within the same second that this doesn't handle very gracefully? I have another class that pushes data on a 1 second interval "in bulk" to help throttle/prevent fast occurring events from overwhelming the system, so I haven't stress tested this too much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guessed that. Thanks for the explanation!
_resendLoggedData(dataPoint); | ||
|
||
// Don't do any further scanning until we get an ACK for already sent message | ||
_next = next |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To reduce the risk of a race condition we probably need to have this statement at the beginning of the function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the current implementation, I think its a non-issue (no async code is going to jump in the middle of single-threaded squirrel), but agree - its an unnecessary risk that we can mitigate by moving this line of code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 127c400
if("boot" in dataPoint.metadata && dataPoint.metadata.boot == _bootNumber){ | ||
local deltaTMillis = _lastTS[0] - dataPoint.ts | ||
local deltaTSeconds = math.floor(deltaTMillis+500).tointeger()/1000 //Round to nearest second, but use this int value for updating _lastTS to keep things consistent | ||
dataPoint.ts = _lastTS[1] - deltaTSeconds //All integer math, so no need to worry about decimal points |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And as I mentioned in the Bullwinkle PR, I'm not fully convinced yet that we should modify .ts from outside the Bullwinkle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll discuss this afternoon...
@ppetrosh - any thoughts on the questions I posed at the beginning of the Pull Request? Creating real scenarios to test some of these conditions is pretty challenging without having access to set breakpoints in squirrel/impOS... |
…tions. Not an issue with the current implementation, but could come back to bite someone in the future.
Also, add some comments and an explicit .tointeger() cast to make our strategy for updating _lastTS a bit clearer
This message can be used by the agent to understand that it has received all of the impPager data from SPIFlash. (You can detect the start of data from SPIFlash by checking ```if("metadata" in message && "addr" in message.metadata)```.) In a case where you need to read data from SPIFlash from newest->oldest to rebuild timestamps, but you need to process messages from oldest->newest for whatever reason, this message will let you know that things are complete.
…a timestamp to it. This was flawed logic... Because the order that we read and write are opposite, the risk of overwriting precious data is greater than the risk of not associating a proper timestamp with the data. There may be external systems that can rebuild the proper timestamps (since hardware.millis() starts at 0 at power-on time, if you know when your outages are, you can rebuild the timestamps external to the imp cloud)
@deldrid1, agree, source-level debugging seems to be gaining the priority. We'll discuss this requirement internally. Though I don't think this is something we can commit to in a reasonably near future. |
This is accomplished by logging hardware.millis() to the spiflashlogger and then syncing hardware.millis() with time(), once our cloud connection is made and time() is restored.
A few open questions/comments:
server.isconnected()
changes between line 58 of the impPager code and _bull._sendMessage line 169, we will end up sending_bootnumber + "-" + hardware.millis()
instead oftime()
.server.isconnected()
return different results if Squirrel code never yields to impOS?_bullwinkle.send()
and allow additional actions outside of impPager to take place based on success/failure/ack.We should probably be doing some rounding where we calculate our delta times - right now we are relying on integer math, which will always round down (even if the calculated timestamp delta end in 999ms)Should we go ahead and [erase the spiflashlogger entry and add a new one with the RTC value corrected](ImpPager/ImpPager.class.nut
Lines 156 to 157 in fafe9a2
This would help give us improve our odds at getting a clock onto our messages (especially in cases of a very flaky internet/power situation) at the potential expense of some SPIFlash wearThinking through this a bit more, this is probably required because our math won't likely work out if a datapoint without an RTC has to have its RTC calculated a second time...We will probably need to pass around the reference to thenext()
callback function from thespiflashlogger.read()
onData
callback and deal with all of the edge cases to ensure we are careful about when we schedule our next tries. Right now theonFinish
is not being used since we are managing the scanning outside of the way SPIFlashLogger was really designed. If we are talking about erasing/writing data while we are reading, I assume that our current strategy will cause things to not work very well.