Update to rebuild timestamps when no RTC is available at boot #1

deldrid1 · 2016-10-04T13:42:03Z

This is accomplished by logging hardware.millis() to the spiflashlogger and then syncing hardware.millis() with time(), once our cloud connection is made and time() is restored.

A few open questions/comments:

This requires some minor modifications to the Bullwinkle Library
There is potentially a race condition that could lead to sending an invalid time to the agent... If server.isconnected() changes between line 58 of the impPager code and _bull._sendMessage line 169, we will end up sending _bootnumber + "-" + hardware.millis() instead of time().
- Can multiple calls to server.isconnected() return different results if Squirrel code never yields to impOS?
- If so, we will have to make this code a bit more ugly to handle the scenario...
Should we wrap / update bullwinkle to allow for multiple onSuccess/onFail handlers to be registered or just trust impPager to take care of business? This would allow us to return _bullwinkle.send() and allow additional actions outside of impPager to take place based on success/failure/ack.
Is the check on line 125 really needed? I'm not sure if there a race condition where we come online (_connectionManager.isConnected() == true) but we don't yet have a clock?
We should probably be doing some rounding where we calculate our delta times - right now we are relying on integer math, which will always round down (even if the calculated timestamp delta end in 999ms)
- ~~Will we get some odd behavior if we are sending 1 second data that happens to fall close to the 500ms hardware.millis() mark? There's only a ~10/1000 chance that it happens, but I'm not sure how the long term performance of imp.wakeup(1.0) behaves... (does it tend to be precise, tend to be slow, tend to be fast, etc. - it may vary from device to device based on crystal accuracy)~~
  - Haven't seen this happen and we did get some 506ms deltaT's so I am assuming it is a non-issue
~~Should we go ahead and [erase the spiflashlogger entry and add a new one with the RTC value corrected]~~(

ImpPager/ImpPager.class.nut

Lines 156 to 157 in fafe9a2

) once we calculate it?
- ~~This would help give us improve our odds at getting a clock onto our messages (especially in cases of a very flaky internet/power situation) at the potential expense of some SPIFlash wear~~
- ~~Thinking through this a bit more, this is probably required because our math won't likely work out if a datapoint without an RTC has to have its RTC calculated a second time...~~
- We will probably need to pass around the reference to the next() callback function from the spiflashlogger.read() onData callback and deal with all of the edge cases to ensure we are careful about when we schedule our next tries. Right now the onFinish is not being used since we are managing the scanning outside of the way SPIFlashLogger was really designed. If we are talking about erasing/writing data while we are reading, I assume that our current strategy will cause things to not work very well.
We are writing ID's to SPIFlash but then we update them if we have a Bullwinkle packages collision - is this the right thing to do?

…ne and has no RTC. This is accomplished by logging hardware.millis() to the spiflashlogger and then syncing hardware.millis() with time(), once our cloud connection is made and time() is restored

…he original data without RTC

…cond. Provides slightly more accurate timestamps.

Maximizes retry throughput, while ensuring that only 1 message is in flight at a time (to protect against out of memory crashes)

ppetrosh · 2016-10-07T01:08:51Z

ImpPager.class.nut

+
+                      if("boot" in dataPoint.metadata && dataPoint.metadata.boot == _bootNumber){
+                        local deltaTMillis = _lastTS[0] - dataPoint.ts
+                        local deltaTSeconds = math.floor(deltaTMillis+500).tointeger()/1000 //Round to nearest second, but use this int value for updating _lastTS to keep things consistent


math.floor(deltaTMillis+500) == deltaTMillis+500

May be you meant
math.floor((deltaTMillis+500)/1000).tointeger()
?

duh - yes... Fixed in 76b3988

ppetrosh · 2016-10-07T01:19:01Z

ImpPager.class.nut

+                        _log_debug("Calculated new ts as " + dataPoint.ts + " (deltaT = " + deltaTMillis + " ms)")
+
+                        //update _lastTS so that we can have 25 days between datapoints instead of 25 days total of timestamps that we can rebuild
+                        _lastTS[0] -= (deltaTSeconds*1000)


_lastTS[0] -= deltaTMillis�
?

I did it the way that I did so that I would be careful to keep _lastTS[0] and _lastTS[1] consistent and not have any kind of floating point precision issues (a .tointeger() here would probably make things a bit clearer). _lastTS[1] is only precise down to the nearest second, and I don't want to deal with rounding problems, precision issues with floats, etc. here. It does mean that you will have slightly less than the full range of hardware.millis() to work with, but what's half a second when you have a 25 day window?

There may be some edge cases to think through when you have many messages within the same second that this doesn't handle very gracefully? I have another class that pushes data on a 1 second interval "in bulk" to help throttle/prevent fast occurring events from overwhelming the system, so I haven't stress tested this too much.

I guessed that. Thanks for the explanation!

ppetrosh · 2016-10-07T01:24:32Z

ImpPager.class.nut

+                    _resendLoggedData(dataPoint);
+
+                    // Don't do any further scanning until we get an ACK for already sent message
+                    _next = next


To reduce the risk of a race condition we probably need to have this statement at the beginning of the function.

In the current implementation, I think its a non-issue (no async code is going to jump in the middle of single-threaded squirrel), but agree - its an unnecessary risk that we can mitigate by moving this line of code.

Fixed in 127c400

ppetrosh · 2016-10-07T01:28:35Z

ImpPager.class.nut

+                      if("boot" in dataPoint.metadata && dataPoint.metadata.boot == _bootNumber){
+                        local deltaTMillis = _lastTS[0] - dataPoint.ts
+                        local deltaTSeconds = math.floor(deltaTMillis+500).tointeger()/1000 //Round to nearest second, but use this int value for updating _lastTS to keep things consistent
+                        dataPoint.ts = _lastTS[1] - deltaTSeconds  //All integer math, so no need to worry about decimal points


And as I mentioned in the Bullwinkle PR, I'm not fully convinced yet that we should modify .ts from outside the Bullwinkle.

We'll discuss this afternoon...

deldrid1 · 2016-10-07T13:47:19Z

@ppetrosh - any thoughts on the questions I posed at the beginning of the Pull Request? Creating real scenarios to test some of these conditions is pretty challenging without having access to set breakpoints in squirrel/impOS...

…tions. Not an issue with the current implementation, but could come back to bite someone in the future.

Also, add some comments and an explicit .tointeger() cast to make our strategy for updating _lastTS a bit clearer

This message can be used by the agent to understand that it has received all of the impPager data from SPIFlash. (You can detect the start of data from SPIFlash by checking ```if("metadata" in message && "addr" in message.metadata)```.) In a case where you need to read data from SPIFlash from newest->oldest to rebuild timestamps, but you need to process messages from oldest->newest for whatever reason, this message will let you know that things are complete.

…a timestamp to it. This was flawed logic... Because the order that we read and write are opposite, the risk of overwriting precious data is greater than the risk of not associating a proper timestamp with the data. There may be external systems that can rebuild the proper timestamps (since hardware.millis() starts at 0 at power-on time, if you know when your outages are, you can rebuild the timestamps external to the imp cloud)

ppetrosh · 2016-10-08T00:52:07Z

@deldrid1, agree, source-level debugging seems to be gaining the priority. We'll discuss this requirement internally. Though I don't think this is something we can commit to in a reasonably near future.

deldrid1 added 5 commits October 4, 2016 09:12

Update to rebuild timestamps for conditions where the imp boots offli…

fafe9a2

…ne and has no RTC. This is accomplished by logging hardware.millis() to the spiflashlogger and then syncing hardware.millis() with time(), once our cloud connection is made and time() is restored

Write datapoint's with calculated time() back to spiflash and erase t…

1064b53

…he original data without RTC

Round timestamps calculated when RTC is unavailable to the nearest se…

f4e2f15

…cond. Provides slightly more accurate timestamps.

Prevent Bullwinkle package ID collisions.

26c5dc8

Reduce retry time to 0.0

a0d7e8b

Maximizes retry throughput, while ensuring that only 1 message is in flight at a time (to protect against out of memory crashes)

deldrid1 mentioned this pull request Oct 5, 2016

Better partition Bullwinkle.Package and bullwinkle messages sent "over the wire" electricimp/Bullwinkle#20

Open

ppetrosh reviewed Oct 7, 2016

View reviewed changes

deldrid1 added 6 commits October 7, 2016 09:57

Namespace RTC_INVALID_TIME

2710e7c

Move storage of _next callback function to avoid potential race condi…

127c400

…tions. Not an issue with the current implementation, but could come back to bite someone in the future.

Fix deltaTSeconds rounding bug

76b3988

Also, add some comments and an explicit .tointeger() cast to make our strategy for updating _lastTS a bit clearer

Actually abort scanning when there is a connection error.

7d88a71

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update to rebuild timestamps when no RTC is available at boot #1

Update to rebuild timestamps when no RTC is available at boot #1

deldrid1 commented Oct 4, 2016 •

edited

Loading

ppetrosh Oct 7, 2016

ppetrosh Oct 7, 2016

deldrid1 Oct 7, 2016 •

edited

Loading

ppetrosh Oct 7, 2016 •

edited

Loading

deldrid1 Oct 7, 2016

ppetrosh Oct 7, 2016

ppetrosh Oct 7, 2016

deldrid1 Oct 7, 2016

deldrid1 Oct 7, 2016

ppetrosh Oct 7, 2016

deldrid1 Oct 7, 2016

deldrid1 commented Oct 7, 2016

ppetrosh commented Oct 8, 2016

Update to rebuild timestamps when no RTC is available at boot #1

Are you sure you want to change the base?

Update to rebuild timestamps when no RTC is available at boot #1

Conversation

deldrid1 commented Oct 4, 2016 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

deldrid1 Oct 7, 2016 • edited Loading

Choose a reason for hiding this comment

ppetrosh Oct 7, 2016 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

deldrid1 commented Oct 7, 2016

ppetrosh commented Oct 8, 2016

deldrid1 commented Oct 4, 2016 •

edited

Loading

deldrid1 Oct 7, 2016 •

edited

Loading

ppetrosh Oct 7, 2016 •

edited

Loading