-
Notifications
You must be signed in to change notification settings - Fork 3
Better partition Bullwinkle.Package and bullwinkle messages sent "over the wire" #20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This allows for creating globally Unique IDs (assuming the imp hits a reboot every 2,147,483,647 messages, which is over 68 years assuming 1 message/second). By setting prependID to the boot number of the device (which can be stored in a SPIFlashLogger), you can have this property.
Eliminates the number of tries from being sent over the connection to reduce data consumption of the class. Also includes a bugfixes for storing the timestamp in the Bullwinkle.Package instead of the message object
…e server connection is unavailable
This reverts commit d07ea66.
@@ -147,7 +151,6 @@ class Bullwinkle { | |||
"name": command, | |||
"data": data, | |||
"ts": ts, | |||
"tries": 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good idea to get rid of "tries" here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a comment in the other thread that message.retry and message.tries are items that I don't expect anyone would be using,
. These are things I use frequently (in almost every project) to allow for [3] retries before failing with an error message. Be careful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should have better stated that I don't expect anyone would be using those fields on the _partner side (although maybe for analytics on the number of failures?).
My opinion is that this data is wasteful and doesn't need to be transmitted over the wire but there may be use cases I haven't thought of as I didn't write the lib originally...
I also feel that way about message.retry, however, I haven't gotten around to making that change based on the push back in the PR so far.
if (message.type == BULLWINKLE_MESSAGE_TYPE.SEND) { | ||
message.tries++; | ||
if (message.type == BULLWINKLE_MESSAGE_TYPE.SEND && message.id in _packages) { | ||
_packages[message.id]._tries++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor cosmetic: let's please try to keep the size of indents the same across the code, like 4 spaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry - copy/paste from the IDE to Atom does some weird things... Getting ElectricImp-Sublime working is a must! :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Thanks! I'm working on it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change from tries
to _tries
breaks my existing code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth making the breaking change to reduce overhead on the wire? (I would think from imp's perspective for "free" developer devices and your paying customer's perspective for large device deployments, that would be a win...)
@blindman2k - could the onFail callback function be updated to something like
.onFail(function(err, message, retry, numTries) {
// Try sending the message again in 60 seconds, assuming we've not made more than 3 tries
if(numTries <3){
if (!retry(60)) {
server.error("No more retry attempts are allowed");
}
}
})
I'm also curious as to why you look at message.tries directly instead of just passing the maxRetries
setting and let the class take care of it for you? I've not used the retry functionality too much myself (TCP typically takes care of it for me, or I just assume the message will fail and I need to deal with it some other way) so I may be missing something...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a/ The maxRetries parameter came after the tries field
b/ If I remember correctly the maxRetries doesn't inform the caller when it stops retrying. Not 100% on this.
Breaking developers code is not something I would recommend at this stage.
@@ -385,7 +395,7 @@ class Bullwinkle { | |||
|
|||
// Add the retry information | |||
message.retry <- { | |||
"ts": time() + timeout, | |||
"ts": ( Bullwinkle._isAgent() ? time() : hardware.micros()/1000000 ) + timeout, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general this looks a little bit confusing to me. We have Package.ts and Message.ts. It's almost impossible to remember which of them is used for what. If one needs to have a custom timestamp for the data, from my POV it should go to the data payload and be handled on the app level as well as all the custom information.
@betzrhodes, @blindman2k what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I disagree? We have a Package._ts (used for Bullwinkle internal logic) and we have a message.ts that corresponds to the timestamp that is associated with the message.data.
I don't disagree that the ts could go in the data payload, however, message.ts is a key that much of my application code relies on (and I would assume others rely on it as well) so I don't think that removing its use in the lib should be taken lightly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point is that we should probably try to keep all the "custom" and application specific stuff in the message payload.
And regarding this particular change I'd recommend to keep the _ts for internal/generic purposes and build all the application specific logics on the custom message payload fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not opposed to moving the retry data structure over to the Package - just work that I haven't done yet and wasn't necessary to accomplish what I am doing with impPager.
message.retry and message.tries are items that I don't expect anyone would be using, but message.ts is something that I would think lots of folks are using.
Either way it is probably a major bump to the lib since there could be user breaking features (if people are using those message key/values) so if you want to remove the ts from the message and make it be a part of the payload that is fine with me. I'll just wrap .send() and ensure that a ts is always added to the data :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No-no, I don't want to remove .ts from the message. Neither I want to change it's semantics (what essentially the pull request does). My proposal is that if someone needs a custom timestamp, then he/she should be using the message payload to add it. Does it make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused by this change ... time() is seconds from epoch (or from boot if the device hasn't been online since power-up) and hardware.micros() microseconds from boot but wraps around every 36 minutes. How is this change not going to make a massive mess?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like to me message.ts should be the eventTime in epoch seconds. This provides the timestamp for the data and should be what is transmitted to the _partner for it to do whatever useful thing might be done with a timestamp that corresponds to the data.
For this pull request, package._ts is used only for timeouts and calculating latencies. time() by itself doesn't give very useful info as it is only precise down to the second, hence the need to go to hardware.millis() or hardware.micros(). time() also doesn't tick when there is no RTC present. Since 36 minutes seems like significant overkill for a timeout or for a network latency, I went with hardware.micros().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that time() is a pain, especially for code that executes before going online. You are probably right that 36 minutes is enough time for most but not for all. At the very least we would need to warn (or prevent) the developer not to set a messageTimeout greater than 2160.
Also, watchdog is no longer valid as it uses time() instead of micros().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use hardware.millis()
to would provide a much larger window (25 days), however, I chose hardware.micros()
to provide some amount of consistency between the agent and device...
In my very ugly code, watchdog is still valid. If time() is invalid and not ticking, the message fails before it ever gets to the watchdog. If time() is ticking, the watchdog still works correctly by doing all of the nasty split() calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to use millis() and you could always pad it with 0's. But it's not something you can compare to the agent so it hardly makes a difference.
Argh, this is a mess.
When used with some kind of nv storage, this allows for preventing ID collisions on the Agent or with something like impPager.
"maxRetries": ("maxRetries" in settings) ? settings["maxRetries"].tostring().tointeger() : 0, | ||
"autoRetry" : ("autoRetry" in settings) ? settings["autoRetry"] : false, | ||
"lowMemoryThreshold": ("lowMemoryThreshold" in settings) ? settings["lowMemoryThreshold"].tointeger() : 15000, | ||
"firstMessageID": ("firstMessageID" in settings) ? settings["firstMessageID"].tointeger() : 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please elaborate, what is this for? Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
firstMessageID is designed to allow semi globally unique message IDs and limit the chance of collisions (in case you are keeping up with the IDs on the Agent, etc. to know if you have missed a message).
Bullwinkle has a pool of 2,147,483,647 IDs that it can use. Based on the RAM/SPIFlash limitations of Agent/Device, you are never going to have that many valid IDs available to compare against (although over a course of weeks/months you may cycle through them). This just allows you to have bullwinkle's ID generation start wherever you want it, instead of at 0. (For impPager testing, seeing message ID 0 after multiple reboots in a row wasn't terribly helpful for debugging)
See line 67 - this ought to be refactored to just be used directly there (there is no reason to keep it in this settings object as it is only used in the constructor), however, it does make the code a bit clearer for someone who is reviewing the source code instead of the readme.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, got it. Thanks for the explanation. Let's please describe all these new configuration settings in the doc, when we all agree on the changes.
_packages[message.id]._tries++; | ||
} | ||
|
||
if(imp.getmemoryfree() > _settings.lowMemoryThreshold && (_isAgent() || server.isconnected())){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we discussed this awhile ago and agreed to keep track of available memory on the app side. But I might be wrong. Not sure if this is generic enough to be a part of the library but don't want to push back hard here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was left rather open ended in Issue #18. This is essentially a 2 line implementation that adds very little library bloat, so I went ahead and snuck it in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we end up adding this feature into the library, my preference would be to at least keep the default behavior unchanged, i.e. have this flag equals to 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgive my bias towards wanting the feature :)
That seems reasonable to me to preserve existing functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sneaky, very sneaky.
I suggest you set the default value at 0 instead of 15000 to ensure that functionality is the same.
delete __bull._packages[message.id]; | ||
} | ||
local ts = "retry" in message ? message.retry.ts : split(package._ts, ".")[0].tointeger(); //Use either the retry ts or the package ts time(), but NOT the message ts so that it can be set for whenever the data was generated, instead of when Bullwinkle attempted to send it | ||
if (t >= (ts + _settings.messageTimeout) || t == 946684800) { //RTC is invalid, which implies we have no connection and should retry immediately. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
946684800 should probably be a constant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const RTC_INVALID_TIME = 946684800; //Saturday 1st January 2000 12:00:00 AM UTC - this is what time() returns if the RTC signal from the imp cloud has not been received this boot.
seem reasonable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Namespace ...
const BULLWINKLE_ RTC_INVALID_TIME = 946684800; //Saturday 1st January 2000 12:00:00 AM UTC - this is what time() returns if the RTC signal from the imp cloud has not been received this boot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of hardware.micros()
is invalid. The changes to tries
is a breaking change that is unnecessary.
@@ -21,7 +21,7 @@ const BULLWINKLE_WATCHDOG_TIMER = 0.5; | |||
|
|||
|
|||
class Bullwinkle { | |||
static version = [2,3,1]; | |||
static version = [2,3,2]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be:
static version = [2,4,0];
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on where the pull request ends up, it might ought to be [3,0,0]
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we may reserve v3.0.0 for a complete rewrite :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just (more than gently) trying to nudge you in that direction ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not the one you need to convince 😸
@@ -95,10 +99,11 @@ class Bullwinkle { | |||
// Parameters: | |||
// name The message name | |||
// data Optional data | |||
// ts Optional timestamp for the data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is unclear what the developer is expected to do with this new field nor what effect setting it will have
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose having proper docs would help with that :)
Its just a timestamp associated with the data. If not passed in, Bullwinkle will use the time that .send() is called (which is the v2.3.1 behavior) but this change allows for any timestamp that a developer wants to use.
Imagine a battery powered device that records data 1/minute and writes it to nv/spiflash. Then 1/hour it connects to WiFi and send all of the data. Probably in this case you would just send one big data structure of value/ts but you could send 60 bullwinkle messages with the correct timestamp as part of the message.
The actual reason I went through the trouble of making these changes is for impPager. With that class, we write data to spiflash when bullwinkle.onFail
happens. I then read 1 datapoint at a time and send it through bullwinkle (to avoid crashing out of memory if I send all the data at once). Here, I would like the ability to send the data with the original timestamp the data was created, not the timestamp that I call bullwinkle.send()
for the nth time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not put the timestamp in your data instead of in Bullwinkle's?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fully agree with Aron. This is what I'd suggest too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is fine (although not what I have implemented currently). If this is the path forward, I still don't think .ts needs to be sent across the wire or stored twice in RAM... Although that may be a v3 feature
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might not but others might 👯♂️
if(imp.getmemoryfree() > _settings.lowMemoryThreshold && (_isAgent() || server.isconnected())){ | ||
_partner.send(BULLWINKLE, message); // Send the message | ||
} else if(message.id in _packages){ //run the failure flow (if the package exists) | ||
local reason = imp.getmemoryfree() <= _settings.lowMemoryThreshold ? BULLWINKLE_ERR_LOW_MEMORY : BULLWINKLE_ERR_NO_CONNECTION |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This technique is a bit lazy. It's not a big issue now but it may become one in the future.
@@ -561,9 +595,7 @@ class Bullwinkle.Package { | |||
local d = date(); | |||
return format("%d.%06d", d.time, d.usec); | |||
} else { | |||
local d = math.abs(hardware.micros()); | |||
return format("%d.%06d", d/1000000, d%1000000); | |||
return format("%d.%06d", time(), hardware.micros()); //this can be a bit of an ugly _ts but it allows us to calculate latencies up to 36 minutes long... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confirmed, it is seriously ugly. You have littered the code with split() calls by doing this. I suggest having _ts
as an array so you don't have to keep splitting it or make a function to split it like you have a function to create it.
The problem is it is NOT a valid use of hardware.micros(). On the agent it is valid, on the device it is not. hardware.micros() is not a sub-value of time().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True - it is not a subvalue of time and array makes more since. However, I was trying to keep consistency between the agent and device code implementations as much as possible (package._ts is always a string as opposed to a string on the agent and an array on the device).
Timeouts work off of the integer time() in seconds, and getLatency
knows what is going on, but the implementation is confusing and ugly.
Since ._ts is a "private" _property, people shouldn't be depending on it and we can likely move to an array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see, getLatency
only uses the seconds on the device. I missed that. Now, it's more accurate but even more ugly!
This set of commits provides better partitioning between the role of the Bullwinkle.Package (stored in the sending party's RAM) and the message that is actually sent between agent/device.
The main driver behind this is to allow custom timestamps to be sent with each message. Before, Bullwinkle relied on the message.ts for its internal timeout logic. Now, message.ts can be set to whenever the message was originally generated (say before it was written to spiflash) and Bullwinkle.Package._ts is used for the watchdog, etc.
I've not done a substantial amount of testing (are there any impTests for this lib?) but the update appears to generally be working for me... The README has also yet to be updated with the optional ts value in .send()