CoRE and Piston Rules Engine, first design steps

The roadmap is as follows:

  1. finish the UI for conditions:
  • add device comparison (compare against a set value or against the value of a different device) - this is for one or two parameters (range uses two). So you can say: if the temperature in the living room is between the temperature in the bedroom and the temperature in the kitchen. The example may look meaningless, but it is there just in case someone finds that useful.
  2. make the conditions functional:
  • implement a stash to store all “last” values for all devices/attributes involved, so that triggers can function. One particular need for this comes from the ability to “trigger” on any of a list of devices. I need to keep track of each device’s value to know when it “enters” a range, or “rises” above something. These are not just a matter of “changed to this value”, which can be done without knowledge of past values.

  • write code for evaluation of conditions/groups. Evaluation is done in a tree algorithm, starting from the deepest level of the tree and going up.

  3. finish the UI for actions
  • implement action lists with the possibility of executing several commands per action (e.g. turn light on in 3 minutes and turn off in 5 minutes - all in one action). Still thinking about this; no path laid out yet.
  4. make the actions functional
  • implement the queue management system
  • custom actions/commands
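
To make the "deepest level first, then going up" evaluation concrete, here is a minimal sketch in plain Groovy. It is not the app's actual code: the node shape (`operator`, `children`, `value`) and the name `evaluateNode` are assumptions made up for this example, and each leaf carries a ready-made boolean instead of a real device comparison.

```groovy
// Hypothetical node shape: a group has `children` and an `operator` ("AND" or "OR");
// a leaf has a precomputed boolean `value` standing in for a real device comparison.
def evaluateNode(Map node) {
    if (node.children != null) {
        // Group: evaluate every child first (the deeper levels), then combine on the way up
        def results = node.children.collect { evaluateNode(it) }
        return node.operator == "OR" ? results.any { it } : results.every { it }
    }
    // Leaf condition
    return node.value as boolean
}

// (true AND false) OR true
def tree = [operator: "OR", children: [
    [operator: "AND", children: [[value: true], [value: false]]],
    [value: true]
]]
println evaluateNode(tree)   // prints true
```

The recursion reaches the leaves before any group is combined, which is the bottom-up order described in the roadmap.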

Still some way to go. Wanna help with coding? :slight_smile:


The reason I am posting these pics is to get feedback. Let me know if you see anything that is wrong, should be different, would be easier in a different way, or doesn’t make sense (e.g. due to CUI - Coding Under Influence of alcohol).

So let me know if anything should be changed or added. The more features the merrier.

@JDRoberts (or anyone else) Should “or choose a device to compare” be available in Expert Mode only?

Thank you

Too many posts to read, but the key thing Rule Machine was missing was logic after rule success that allows multiple options, rather than just true/false. For example, if movement is detected, turn lights on for 10 minutes, but I want the option to then define different lighting scenes based on mode, having different modes for evening, night and day…

I think compare is pretty intuitive for even basic users unless it’s going to complicate the rule in such a way that would require other expert features. But I think it’s probably fine either way. @bamarayne or one of the other people who have done a lot of peer counseling on rule building would probably know better than I would as to whether that confuses people. :sunglasses:

@Entityxenon Thank you, duly noted, will think of something. Individual actions may help, as you could have a rule that says:

(
    Motion detected
    AND
    Mode is "Home"
) { do some actions }
OR
(
    Motion detected
    AND
    Mode is "Night"
) { do some actions }
OR
(
    Motion detected
    AND
    Mode is "Away"
) { do some actions }

Each sub-condition can trigger actions on its own. Not the most practical way though - will see if I can come up with something better than that.


I should think that the wording should be the same throughout the app.

Reality check :slight_smile: When implementing subscription for all events, I ran into the following rules/limits/features I had to impose.

Simple mode:

  • If there are any triggers present in the IF block, the list of unique device/attribute pairs involved in the triggers is used for subscription. Regular conditions are ignored for the purpose of subscribing
  • If there are no triggers present in the IF block, the list of unique device/attribute pairs involved in the conditions is used

By unique, I mean that if you use “This door is open” twice in the logical tree, I only subscribe to it once. It may be that ST filters out the second subscription, but I don’t want to rely on them
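
As a rough illustration of the Simple mode rule above (triggers win if there are any, and each device/attribute pair is subscribed once), here is a small plain-Groovy sketch. The flattened condition shape (`device`, `attribute`, `trigger`) and the name `subscriptionKeys` are assumptions for this example only, not the app's data model.

```groovy
// Hypothetical flattened condition shape: [device: ..., attribute: ..., trigger: true/false]
def subscriptionKeys(List<Map> conditions) {
    def triggers = conditions.findAll { it.trigger }
    // If any trigger exists, only triggers are used; plain conditions are ignored
    def source = triggers ? triggers : conditions
    // A Set keeps each device/attribute pair only once
    return source.collect { "${it.device}-${it.attribute}".toString() } as Set
}

def conditions = [
    [device: "door",   attribute: "contact", trigger: false],
    [device: "door",   attribute: "contact", trigger: false],  // same pair used twice
    [device: "motion", attribute: "motion",  trigger: true]
]
println subscriptionKeys(conditions)
```

With the trigger present, only the motion pair is returned; remove it and the door pair comes back once, even though it appears twice in the list.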

Else-If mode:

  • same as Simple, BUT:
  • second IF does NOT create any subscriptions. This is because the second IF is to be evaluated only after the first one is evaluated. To keep things sane, the second IF block should NOT be able to trigger the evaluation of the first, therefore, I am forced to NOT allow triggers in the second IF block - after all, that IF is supposed to be evaluated when the first IF is false.

Latching mode:

  • complicated
  • each of the two IFs runs its own logic flow, however, a trigger from the first IF should not trigger the evaluation of the second IF block, unless the same device/attribute is used in that block too. Therefore, I had to subscribe to each block separately. But I don’t want to subscribe to the same device/attribute twice (say you use the same pair in the IF block and in the BUT IF block as well). For those that are common to both blocks, I run separate subscriptions that have a different handler that then broadcasts the event to both blocks. I do NOT want to subscribe to the same device/attr twice, no matter what.

Attention needs to be paid to the type of event (trigger vs condition). Since I subscribe to a device/attr for both lists, a case may happen where the IF block uses it as a trigger, whereas the BUT IF uses it as a condition. Before processing any event, for each of the two blocks (IF or BUT IF), I need to perform the following checks:

  1. does the block use triggers? if no, go to 3
  2. does the block have a trigger (not condition) matching the device/attr in the event? if not, exit
  3. evaluate the block, using the event as the source of the evaluation (later used for logging)
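
The three checks above can be sketched as a single gate function. This is only an illustration in plain Groovy: the block shape (a list of `[pair: "deviceId-attribute", trigger: boolean]`) and the name `shouldEvaluate` are hypothetical.

```groovy
// Hypothetical shapes: a block is a list of [pair: "deviceId-attribute", trigger: boolean];
// eventPair identifies the device/attribute that produced the event.
def shouldEvaluate(List<Map> block, String eventPair) {
    def triggers = block.findAll { it.trigger }
    // Check 1: no triggers in the block, so go straight to evaluation (check 3)
    if (!triggers) return true
    // Check 2: with triggers present, only a matching trigger lets the event through
    return triggers.any { it.pair == eventPair }
}

def ifBlock    = [[pair: "door-contact", trigger: true], [pair: "mode-mode", trigger: false]]
def butIfBlock = [[pair: "mode-mode", trigger: false]]
println shouldEvaluate(ifBlock, "mode-mode")      // prints false
println shouldEvaluate(butIfBlock, "mode-mode")   // prints true
```

The same mode event is dropped by the IF block (where mode is only a condition next to a trigger) and accepted by the BUT IF block (which has no triggers at all).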

These are all internals, but I thought I should share them with you so you can react in case my logic is not sane.

The “subscribe” logic so far:

def subscribeToConditionDevices(condition, triggersOnly, handler, subscriptions, onlySubscriptions, excludeSubscriptions) {
    if (subscriptions == null) {
        subscriptions = [:]
    }
    def result = 0
    if (condition) {
        if (condition.children != null) {
            //we're dealing with a group
            for (child in condition.children) {
                subscribeToConditionDevices(child, triggersOnly, handler, subscriptions, onlySubscriptions, excludeSubscriptions)
            }
        } else {
            if (condition.trg || !triggersOnly) {
                //get the details
                def devices = settings["condDevices${condition.id}"]
                def attribute = cleanUpAttribute(settings["condAttr${condition.id}"])
                if (devices) {
                    for (device in devices) {
                        def subscription = "${device.id}-${attribute}"
                        if ((excludeSubscriptions == null) || !(excludeSubscriptions[subscription])) {
                            //if we're provided with an exclusion list, we don't subscribe to those devices/attributes events
                            if ((onlySubscriptions == null) || onlySubscriptions[subscription]) {
                                //if we're provided with a restriction list, we use it
                                if (!subscriptions[subscription]) {
                                    subscriptions[subscription] = true //[deviceId: device.id, attribute: attribute]
                                    if (handler) {
                                        //we only subscribe to the device if we're provided a handler (not simulating)
                                        log.trace "Subscribing events from $device for attribute $attribute, handler is $handler"
                                        subscribe(device, attribute, handler)
                                    }
                                }
                            }
                        }
                    }
                } else {
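                    //nothing to subscribe to here, but still hand back the map so callers never get null
                    return subscriptions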
                }
            }
        }
    }
    return subscriptions
}

def subscribeToAll(app) {
    //we have to maintain two separate logic threads for the latching mode
    //to do so, we first simulate subscribing
    def triggerCount = getConditionTriggerCount(app.conditions)
    def latchingTriggerCount = 0

    if (settings.mode == "Latching") {
        //we really get the count
        latchingTriggerCount = getConditionTriggerCount(app.otherConditions)
        //simulate subscribing to both lists
        def subscriptions = subscribeToConditionDevices(app.conditions, triggerCount > 0, null, null, null, null)
        def latchingSubscriptions = subscribeToConditionDevices(app.otherConditions, latchingTriggerCount > 0, null, null, null, null)
        //we now have the two lists that we'd be subscribing to, let's figure out the common elements
        def commonSubscriptions = [:]
        for (subscription in subscriptions) {
            if (latchingSubscriptions.containsKey(subscription.key)) {
                //found a common subscription, save it
                commonSubscriptions[subscription.key] = true
            }
        }
        //perform subscriptions
        subscribeToConditionDevices(app.conditions, false, bothDeviceHandler, null, commonSubscriptions, null)
        subscribeToConditionDevices(app.conditions, triggerCount > 0, deviceHandler, null, null, commonSubscriptions)
        subscribeToConditionDevices(app.otherConditions, latchingTriggerCount > 0, latchingDeviceHandler, null, null, commonSubscriptions)       
    } else {
        //simple IF case, no worries here
        subscribeToConditionDevices(app.conditions, triggerCount > 0, deviceHandler, null, null, null)
    }
}

Something I just thought of is rather concerning.

This app, just like Rule Machine, allows for a massive amount of programming to occur within the confines of one rule. This is going to lead to some pretty major problems.

When using Rule Machine I at first built large, extremely complex rules. I tried very hard to push the app to its limit, and I found it. It is the ST back-end processing limit of 20 seconds.

Once that was realized, I went with smaller, less complex rules but created a type of wheel mesh with them.

I had one central rule that was triggered by certain conditions and devices (motion sensor). That central rule, when triggered, sends out a Boolean change to the rule at the end of every spoke.

Each of those rules had conditions as well: Boolean true/false, mode, presence. If all other conditions were met when the Boolean change occurred, then that rule fired. All others did nothing.

This also allows me to have a rule fire with the Boolean change creating a rule truth. At the same time that Boolean change also reached rules that were evaluated as false, which can then fire off the actions for false.

So, one trigger (motion) creates actions in three or more rules and nothing times out, since each rule gets its own timer on the servers.

Thus, I created a large complex rule that runs individually on the server and never times out.

This also allows me to add to and manipulate the mesh very easily with minimal impact on the system whole. This system also allows for many levels of rules with each having a fair shake on the server.

I am seeing large very complex rules being built here and I’m concerned about the impending failures that are going to result.

Is there a way to limit within the confines of the code to prevent this?


Also, feature request.

Presence sensors are notorious for bouncing on and off the network.

There is a block of code built into RM that was not public. It allows a trigger to state that changes of that trigger will be ignored if there has been a state change within x minutes.

The x is the variable chosen by the user.
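
For readers who have not seen that RM feature, the idea can be sketched like this. This is not the RM code; `shouldIgnore` and its parameters are made-up names, and the timestamps are plain epoch milliseconds.

```groovy
// lastChangeMs: time of the previous state change for this trigger (null if none is known).
// quietMinutes: the user-chosen "x".
def shouldIgnore(long nowMs, Long lastChangeMs, int quietMinutes) {
    if (lastChangeMs == null) return false            // nothing earlier on record
    return (nowMs - lastChangeMs) < quietMinutes * 60000L
}

// A presence sensor that bounced 2 minutes ago, with x = 5 minutes
println shouldIgnore(120000L, 0L, 5)   // prints true
```

A change arriving 2 minutes after the previous one is ignored; one arriving after the full 5 minutes would go through.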

I guess it is all in the way the workflow goes. Run everything in one go or schedule things to happen in different threads. Split the workload, if you may. If I am not mistaken, it is the actions that take more time, especially if you use http posts, etc., things that may take time.

I plan a lot for the scheduler. There are two sides to the scheduler: 1. time related events that trigger evaluations, and 2. commands to be executed at certain times. A mechanism will be in place to “catch up” with actions that were not executed. But I’ve got some way to go to get there. Your point is valid and stays. Let’s look it up when we have a working app. So far I can subscribe to events, I am not yet “evaluating” the conditions, but that’s coming very soon.

A second strategy for minimizing run time is to not perform commands if their desired result is already the current state of said device. There are many ways to optimize things and I am an optimization freak. See above, I don’t want the same event to trigger two different handlers, waste of resources.
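
The "don't send a command the device doesn't need" strategy could look roughly like the sketch below. The map shapes and the name `pendingCommands` are assumptions for illustration; a real SmartApp would read the current state from the device object instead of a plain map.

```groovy
// Hypothetical shapes: `current` maps device name -> current state;
// each command is [device: ..., desired: ...].
def pendingCommands(Map current, List<Map> commands) {
    // Keep only the commands that would actually change something
    commands.findAll { current[it.device] != it.desired }
}

def current  = [lamp: "on", fan: "off"]
def commands = [[device: "lamp", desired: "on"], [device: "fan", desired: "on"]]
println pendingCommands(current, commands)
```

Here only the fan command survives, because the lamp is already on.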

As for the rate limiting piece of code you’re showing there, that’s mine. I wrote it, so yes, there will be some sort of rate limiting for events.

hahaha, i thought that was the case and I couldn’t for the life of me remember who… but hey, I love that little piece of code!

Ok, I understand what you’re saying.

Please correct me if I’m wrong. But since the app will not run locally then the process will be like this…

door opens --> signal sent to hub --> hub sends signal to server --> server processes event and pairs it with the appropriate code --> server sends rule actions to hub --> hub sends action commands to devices. Light comes on.

This round trip was the major problem always causing the time out of the evaluation. I’m seeing this app running into the exact same limitation imposed by the servers.

Or are you planning a built-in timer that is going to schedule its own things? Even then, we still run into the above flow path. This app will still be on the server.

But, like you said… once we have an operating app then we can beta test the hell out of it and find the limitations.

Okay, here’s the documentation:

It says 20 continuous seconds per method. The typical execution pattern is:

eventHandler {
…code…
…code…
…method… <<< this can run up to 20s
…code…
…code…
}

eventHandler has 20 seconds to reach that method, the method has 20 seconds to complete, then eventHandler has another 20 seconds to finish. But, overall, the whole execution needs to be done in 40 seconds.

Here’s two wild ideas, but don’t tell anyone:

Typical RM cycle is:

eventHandler {
…evaluate…
…if true…
…action…
…action…
…action…
…action.

…action
}

My experience is that these actions happen synchronously with RM, i.e. I only get one device doing one thing at any time. Looks like locks are taking the longest to execute. Also, I have an OSRAM Lightify bulb plugged into a circuit that has a mechanical wall switch. When the switch is off, RM times out :slight_smile:

So here’s the ideas:

  1. main app (or child app) creates a virtual device. Child app times itself between each action that’s executed. When too close to 40s execution time, it sends an event to the virtual device to which it subscribes, saves the list of pending actions and terminates. When event is triggered, app wakes up and continues with the saved list of actions. Snap!

  2. the more complex solution is with an action manager. Have main app as well as all child apps form a pool of executors. Child app sends list of actions to main app, main app splits the list across all child apps and they execute it. We’re now splitting the load over several apps (the more rules you have, the more “workers” you have), each with its own 40s limit. Plus, you’re running commands asynchronously. Not sure if the hub can handle them asynchronously though, but it is worth trying.
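
Idea 1 boils down to "run actions until the time budget is nearly gone, then save the rest and resume later". Here is a minimal plain-Groovy sketch of that loop, assuming the 40s figure from this thread. `runWithBudget`, the safety margin, and the injected `clock` closure are all made up for the example; the virtual device and the wake-up event are not modeled.

```groovy
// Runs queued actions until the time budget is nearly used up, then returns the rest.
// `clock` is a closure returning milliseconds, so the sketch can be tried without real waiting.
def runWithBudget(List<Closure> actions, long budgetMs, long safetyMs, Closure clock) {
    long start = clock()
    int done = 0
    while (done < actions.size() && (clock() - start) < (budgetMs - safetyMs)) {
        actions[done].call()
        done++
    }
    // The caller would save these and continue when the wake-up event arrives
    return actions.drop(done)
}

// Simulated run: every action "takes" 10 seconds on a fake clock
def fakeTime = [0L]
def actions = (1..5).collect { n -> return { -> fakeTime[0] += 10000L } }
def leftOver = runWithBudget(actions, 40000L, 5000L, { fakeTime[0] })
println "actions left for the next run: ${leftOver.size()}"
```

With a 40s budget and a 5s safety margin, four of the five simulated actions run and one is left for the next wake-up.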

What do you think?


This sounds like it is definitely worth a try. When I read this it forms a picture in my head of the hub spoke method I already use to avoid this…

The difference, of course, is that my way is on the surface and I have to carefully program that logic. What you’re talking about would run automatically behind the scenes.

As for user friendly, this would be the way to go, if it works.

But what happens when there are a lot of things going on at once?

I think you have a creative idea that illustrates the absurdity of the 20 “real time seconds” per SmartApp thread execution limit. It is arbitrary and worthless for real world scenarios. Execution complexity should be measured in real-resource utilization (CPU, memory, I/O) and has to be divided by the valid complexity of the work performed. There are no shortcuts to instrumentation.

Your solution introduces a ton of overhead which creates a net significantly higher impact on the SmartThings Cloud, completely negating the benefits that were intended from the 20s execution time limit.

@slagle and @jody.albritton really need to advocate for more meaningful instrumentation and meaningful justified “guard rails”, otherwise convoluted solutions will be developed that cost effort, platform resources, complexity risk and generally hurt everyone.

This is about as useful an idea as building our own external scheduler… It is feasible and even necessary if the platform has artificial or unresolved limits that aren’t addressed by the architects, engineers, and operations.


I agree about the overhead. I would be going around the limitations by adding even more work. But if that makes it work… Maybe ST reconsiders.

And I am pretty sure it’s 40s per app execution, 20s per method execution (have a method that executes the action and call it for each action to help with this limit).


Yah… That’s right.

I don’t really get their decision with the real time limit on a method/app. What would a method do for 20s when the language does not support a sleep method? The only ways a user-written method can spend 20s are by having an infinite loop (which they are probably targeting with this limit) OR… and this is where they ruined a lot of things… waiting on an external service (http request). Depending on the implementation of the HTTP request, the latter should not be a CPU hog, therefore should not do as much damage as an infinite loop would do. Or even worse, an infinite loop that keeps appending characters to a string variable, causing memory pressure too. Or using atomicState, causing I/O as well. I think they need to concentrate on limiting the CPU time instead, put a max limit on the memory - raise a memory exception when memory is not enough to have the programmer rethink his ways and put a cost on atomicState (i.e. if you use atomicState then you have much less CPU time available). I had a problem once with an external service not responding in 20s, thus stopping my SmartApp (as the HTTP request happened in a method, the 20s/method kicked in first).

For those who don’t know what “real time” and “CPU time” mean, the real time is the difference between the end time and the start time, pretty much how much time it took for the app to execute as perceived by an external witness with a watch. The CPU time, however, refers to how much time the CPU spends executing the app. Modern systems allow multiple activities happening at the same time by sharing the CPU and many other resources. This is typically called “multitasking”. The CPU will switch from one task to another, executing one at a time (one task per virtual core). Therefore, a single task may use a lot less than 100% of that CPU. The average usage is calculated over a time interval, then multiplied by the interval length to get the CPU time. For example, a SmartApp that runs for an hour (what is it doing?) at an average CPU usage of 1% will have the time the CPU spent on that SmartApp at 36 seconds (1/100 of an hour).
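
The arithmetic in that example, written out as a tiny Groovy helper (the name `cpuSeconds` is made up for illustration):

```groovy
// CPU time = real (wall-clock) time multiplied by the average CPU usage over that time
def cpuSeconds(double wallSeconds, double averageCpuPercent) {
    wallSeconds * averageCpuPercent / 100.0d
}

println cpuSeconds(3600, 1)   // one hour at 1% average usage: prints 36.0
```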


Introducing the “was” conditions. These are just like the “is” conditions, but allow an extra time comparison method (for at least, for less than) and a minute value (1-60 minutes):

List: Front Door
Attribute: contact
Comparison: was
Value: closed
Time limit: for at least
Time value: 5 minutes

Condition spells out "Front Door was closed for at least 5 minutes"
The opposite is “Front Door was closed for less than 5 minutes”

I’m thinking that the “was” conditions should not look at the current state, but at the previous state and the event list for that device/attribute pair. You can then say

IF Front Door changes to open AND Front Door was closed for less than 5 minutes THEN { tell the kids to stop playing with the door already, or else! }

This would also allow @bamarayne’s scenario as well:

IF Motion Sensor changes to active AND Motion Sensor was inactive for at least 5 minutes THEN { sound the alarm, the man with the horse is here to talk to us }
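
One possible reading of a “was” condition, looking at the previous state and the event list rather than the current state, is sketched below in plain Groovy. The event shape (`value`, `time` in epoch milliseconds, oldest first) and the name `wasFor` are assumptions for this example, and it assumes the list only holds actual state changes.

```groovy
// events: oldest first, each [value: ..., time: epoch ms]; the last entry is the event being processed.
def wasFor(List<Map> events, String value, String comparison, int minutes) {
    if (events.size() < 2) return false            // no previous state on record
    def previous = events[-2]
    def current = events[-1]
    if (previous.value != value) return false      // the previous state is not the one asked about
    long heldMs = current.time - previous.time     // how long the previous state lasted
    long limitMs = minutes * 60000L
    return comparison == "for at least" ? heldMs >= limitMs : heldMs < limitMs
}

// Door was closed at t=0 and opened again 2 minutes later
def events = [[value: "closed", time: 0L], [value: "open", time: 120000L]]
println wasFor(events, "closed", "for less than", 5)   // prints true
```

Paired with a “changes to open” trigger, this gives the “stop playing with the door” rule: the door just opened, and it had been closed for under 5 minutes.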

And then, last but not least, attempting to introduce (pending complexity checks) the “stays” and “does not stay” triggers. This prompts a scheduled run so many minutes after the event actually happened.

IF Fridge Door stays open for more than 5 minutes THEN { call wife and ask her to get out of the fridge }

The “more than” is obviously optional, for 5 minutes would do just the same. I would only trigger the event once, but I can’t guarantee exactly 5 minutes, it may be 5 minutes and 25 seconds for example, depending on ST’s load.
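
When the scheduled run finally happens, the check it needs to make might look like this sketch. The name `staysFired` and its parameters are hypothetical, timestamps are epoch milliseconds, and the "fire only once" part is not modeled.

```groovy
// Called from the (hypothetical) scheduled run, some time after the triggering event.
// lastChangeMs: when the device last changed state; nowMs: when the scheduled run executes.
def staysFired(String expected, String currentValue, long lastChangeMs, long nowMs, int minutes) {
    // Fires only if the device is still in the expected state and has not changed in the meantime
    currentValue == expected && (nowMs - lastChangeMs) >= minutes * 60000L
}

// Fridge door opened at t=0, scheduled run arrives 5 minutes and 25 seconds later
println staysFired("open", "open", 0L, 325000L, 5)   // prints true
```

Because the comparison is "at least", a late run (5 minutes and 25 seconds here) still fires, while a door that was closed or re-opened in between does not.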

I think this will be about all the conditions I want to deal with right now, extras can be added at a later time.

As for @bamarayne’s feature request, that code allows a trigger to look back before the current event and ensure no other events of the same nature happened in the last X minutes. This can be achieved so far with a trigger + a “was” condition, but the implementation in RM is simpler, does not require two conditions. The alternative I offered right now makes it more “English” friendly though - easier to read and understand what the condition is about (talking about the Condition Overview)

Any ideas, comments, suggestions?

Thank you


Actually, the “stays” and “does not stay” should both be “stays”, and the same options should be possible: “for less than” and “for at least”. The “for less than” will be executed on a new event changing the state of said device in less than so many minutes. The “for at least” needs to be triggered by a time trigger. Doable, I think.