Hi all! About 9 months ago I decided to start an experiment with my ST setup to see if I could do some data mining and learn anything interesting about our household habits. After experimenting with a few different platforms for data logging, I decided the best bet was to simply use IFTTT to store events into my Google Drive. From there, I was able to pull out the spreadsheet entries and import them into MATLAB, a scientific data analysis program. Once in MATLAB, it was literally one line of code to convert the spreadsheet data into a data matrix with all entries in an easy-to-analyze format.
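For anyone who wants to try this without MATLAB, here’s a rough Python stand-in for the import step. It’s a minimal sketch that assumes the Google Drive sheet has been exported to CSV, with each row holding a timestamp, a device name, and an event value. The column order, the "August 15, 2015 at 08:15AM" timestamp style, and the name `load_events` are my assumptions, not anything IFTTT guarantees, so check your own sheet and adjust `TIMESTAMP_FORMAT` to match.

```python
import csv
import io
from datetime import datetime

# ASSUMPTION: this timestamp style is a guess at how the logged rows look;
# change it to match whatever actually appears in your sheet.
TIMESTAMP_FORMAT = "%B %d, %Y at %I:%M%p"


def load_events(csv_text):
    """Parse exported CSV text into a time-sorted list of (datetime, device, event).

    ASSUMPTION: column order is timestamp, device, event (hypothetical layout).
    """
    events = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 3:
            continue  # skip blank or incomplete rows
        stamp, device, event = (field.strip() for field in row[:3])
        events.append((datetime.strptime(stamp, TIMESTAMP_FORMAT), device, event))
    events.sort(key=lambda e: e[0])  # put events in chronological order
    return events


# Hypothetical sample rows; the timestamp is quoted because it contains a comma.
sample = "\n".join([
    '"August 15, 2015 at 08:16AM",Front Door,closed',
    '"August 15, 2015 at 08:15AM",Front Door,open',
])

for when, device, event in load_events(sample):
    print(when.isoformat(), device, event)
```

Sorting at the end matters because the later steps (like the event-count plot) assume the events are in time order.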
For my first test case, I decided to look at the open/close events coming from our front door. I figured it might give some insight into how regular my schedule is going to and from work, or maybe shed light on other patterns of daily activity around here. Let me throw up the plots, then I’ll explain below what they are and what I’ve learned (spoiler alert: don’t expect to be surprised).
The first plot shows all 5,266 events logged over the course of 9 months (about 20 open/close events per day). I set the y-axis to be the day of the month on which each event occurred, which visually separates the individual months cleanly. The short segment at the very beginning of the plot is from August, and you’ll notice a gap of a couple of weeks in September (the next line). At that point, I was still fiddling with my data-logging options and had the IFTTT channel deactivated. Once I came back to IFTTT as my main tool for this project, the data runs unbroken all the way up to early May (the last little line on the far right). If you look closely at each month (each diagonal line is a month), you’ll notice that the lines are a little jagged. That’s because on each day the door was opened and closed a different, more or less random number of times. On days we opened the door more often, there’s a longer horizontal dash; on days we opened it less, the dash is shorter. One thing you can see is that in November and December the trend becomes less steep toward the end of the month, meaning the door was opened and closed more times per day. Since this lines up with the holiday season, I suspect these trends come from the fact that we had company over and were home more instead of at work, and as a consequence were more active in our comings and goings. In January, February, and March, the data is much more linear, with an essentially constant slope that is generally steeper than during the holiday season. Since it gets very cold here in Boston, I take this as empirical confirmation that we went out the bare minimum during those months!
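Here’s one way to rebuild the data behind that first plot in Python. It’s a sketch that assumes the x-axis is simply the running event count (which is what turns a busy day into a long horizontal dash) and that you already have a list of event timestamps; the function names are hypothetical. It also counts events per calendar day, which is the number behind the jaggedness.

```python
from collections import Counter
from datetime import datetime


def day_of_month_series(timestamps):
    """Return (event_number, day_of_month) points for the first plot.

    ASSUMPTION: x is the running event count. Each month then traces one
    rising diagonal, and a busy day is a long horizontal run at one y value.
    """
    return [(i, t.day) for i, t in enumerate(sorted(timestamps))]


def events_per_day(timestamps):
    """Count how many events were logged on each calendar date."""
    return Counter(t.date() for t in timestamps)


def mean_events_per_logged_day(timestamps):
    """Average events per day, over days that have at least one event."""
    counts = events_per_day(timestamps)
    if not counts:
        raise ValueError("no events to average")
    return sum(counts.values()) / len(counts)


# Hypothetical example: a busy January 1st and a quiet January 2nd.
stamps = [datetime(2016, 1, 1, h) for h in (8, 10, 13, 18, 20)] + [datetime(2016, 1, 2, 9)]
print(day_of_month_series(stamps))
print(mean_events_per_logged_day(stamps))  # 3.0
```

Dividing by the number of days that have events ignores gaps like the one in September; dividing by the full calendar span instead would count those missing days as zero.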
The next two plots are histograms of the times at which the door was opened and closed: the lower-left plot is by hour and the lower-right plot is by minute. The histograms have been normalized and multiplied by 100, so we can read them as probabilities in percentage points. For example, looking over the entire 9-month period, about 7% of all events happened during the 9:00 hour (9:00 to 9:59), about 7.5% during the 10:00 hour, about 5.5% during the 11:00 hour, and so on. Here, I’m using a 24-hour clock to avoid any AM/PM nonsense. Two things jump out at me from this data. First, we are definitely not coming or going between 1 and 6 am. Second, the dog doesn’t get walked as regularly as I thought! Looking at the data, the uptick around 8 and 9 is typically when I leave for work. The peak at 10 is likely my wife walking the dog and going about her business. The second burst of activity between 18:00 and 21:00 (6:00 pm and 9:00 pm) is when we’re both home from work, going for walks with the dog, taking out the garbage, etc. I expected these morning and evening peaks to be more pronounced, with less activity in the afternoons, but I guess not!
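The normalize-and-multiply-by-100 step is simple enough to sketch directly. This assumes a list of `datetime` timestamps; bin `h` covers h:00 through h:59, which is how to read the "7% at 9:00" figure above. The function name is hypothetical.

```python
from datetime import datetime


def hourly_histogram_percent(timestamps):
    """Percentage of events falling in each hour of the day (24 bins, summing to 100)."""
    if not timestamps:
        raise ValueError("need at least one event")
    counts = [0] * 24
    for t in timestamps:
        counts[t.hour] += 1  # 24-hour clock: bins 0..23
    return [100.0 * c / len(timestamps) for c in counts]


# Hypothetical example: three events in the 9:00 hour, one in the 18:00 hour.
stamps = [datetime(2016, 3, 1, 9, m) for m in (5, 20, 40)] + [datetime(2016, 3, 1, 18, 30)]
hist = hourly_histogram_percent(stamps)
print(hist[9], hist[18])  # 75.0 25.0
```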
For the third plot, in the lower right, I decided to do a “control”: instead of looking at what hour the door was opened, I looked at what minute of the hour it was opened. For this, opening the door at 6:06 and at 10:06 count the same, since the minute value is identical. My first thought was that I might see peaks around minutes 45 to 50, thinking that maybe we leave 10 to 15 minutes early when we have to be somewhere at X o’clock sharp. Then I thought a bit harder, realized that our life just isn’t that regular, and instead suspected the plot would be fairly flat. Looking at the data, we see that indeed, it’s basically flat. Sure, there are blips and bumps, but in general everything hovers between 1.5 and 2% (note that a perfectly uniform distribution would put 100% / 60 ≈ 1.67% in each minute, which is right in that range!). I could almost convince myself that activity steadily decreases over the first 35 minutes of the hour and then increases toward the end of the hour, but the data’s pretty noisy and I’m a bit skeptical. So, what’s the takeaway? Our comings and goings are almost uniformly distributed throughout the hour!
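The minute-of-hour “control” works the same way with 60 bins. To put a number on “basically flat”, the sketch below also computes a Pearson chi-square goodness-of-fit statistic against the uniform 100% / 60 expectation; that test is my addition, not part of the original analysis, and the function names are hypothetical. With 60 bins there are 59 degrees of freedom, and the 5% critical value is roughly 77.9, so a statistic well below that is consistent with a flat distribution. One caveat: opens and closes tend to come in pairs a minute or so apart, so the events aren’t fully independent and the test should be read loosely.

```python
from datetime import datetime


def minute_counts(timestamps):
    """Raw event counts for each minute of the hour (60 bins)."""
    counts = [0] * 60
    for t in timestamps:
        counts[t.minute] += 1  # 6:06 and 10:06 both land in bin 6
    return counts


def to_percent(counts):
    """Normalize counts so they sum to 100."""
    total = sum(counts)
    if total == 0:
        raise ValueError("need at least one event")
    return [100.0 * c / total for c in counts]


def chi_square_vs_uniform(counts):
    """Pearson chi-square statistic against equal expected counts in every bin."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


# Hypothetical extremes: one event in every minute vs. everything at minute 0.
flat = minute_counts([datetime(2016, 4, 1, 10, m) for m in range(60)])
spiky = minute_counts([datetime(2016, 4, 1, h % 24, 0) for h in range(60)])
print(to_percent(flat)[0])  # about 1.67, i.e. 100 / 60
print(chi_square_vs_uniform(flat), chi_square_vs_uniform(spiky))  # 0.0 3540.0
```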
Well, I was writing this up when I realized that my analysis of the hours at which the door is opened is flawed: weekdays and weekends are intrinsically different for our lifestyle! Duh! So, I went back and split the data into separate weekday and weekend datasets, and here’s the breakdown:
Now things make more sense! On weekdays (blue curve) we have the morning activity around 8 to 10 when we’re heading out for the day, and then another large peak of evening activity between 6 and 9 pm. On weekends, however, there’s a much steadier pace of in-and-out throughout the day (orange curve). In the original plot of hourly activity, lumping weekdays and weekends together mushed the data and washed out both patterns. I think my big surprise is that we’re not more active later on weekend evenings, but given that we’re parents of a young kid, I guess those days are behind us… And there’s evidence to prove it!
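The weekday/weekend split is a small change to the hourly histogram: partition the timestamps first, then normalize each group separately so both curves sum to 100%. This sketch assumes Saturday and Sunday are the weekend (Python’s `weekday()` returns 5 and 6 for those) and ignores holidays; the function names are hypothetical.

```python
from datetime import datetime


def hourly_histogram_percent(timestamps):
    """Percentage of events in each hour of the day (24 bins, summing to 100)."""
    if not timestamps:
        raise ValueError("need at least one event")
    counts = [0] * 24
    for t in timestamps:
        counts[t.hour] += 1
    return [100.0 * c / len(timestamps) for c in counts]


def split_weekday_weekend(timestamps):
    """Partition timestamps into (weekday, weekend) lists; Saturday=5, Sunday=6."""
    weekday = [t for t in timestamps if t.weekday() < 5]
    weekend = [t for t in timestamps if t.weekday() >= 5]
    return weekday, weekend


# Hypothetical events: Monday 2 May 2016, Tuesday 3 May, and Saturday 7 May.
stamps = [datetime(2016, 5, 2, 8, 10), datetime(2016, 5, 3, 18, 45), datetime(2016, 5, 7, 13, 0)]
weekday, weekend = split_weekday_weekend(stamps)
weekday_hist = hourly_histogram_percent(weekday)
weekend_hist = hourly_histogram_percent(weekend)
print(weekday_hist[8], weekday_hist[18], weekend_hist[13])  # 50.0 50.0 100.0
```

Normalizing each group on its own matters: there are five weekdays for every two weekend days, so a single combined histogram is weighted toward the weekday pattern.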
So what’s next? I’ve started logging more devices in our ST setup. While this little project was fun and cute, I think there are more serious analyses and applications for this type of data. For example, I think the next project will look for correlations between pairs of motion sensors, or between motion sensors and open/close sensors. The idea is to ask questions like “If X door is opened at such-and-such time of day, can I predict whether I’ll be going into or out of the room?”, “How active are the cats at night?”, and “How does household activity depend on who’s at home?” Needless to say, I think there’s a lot of fun to be had! Any suggestions, questions, or comments are more than welcome. This project is a work in progress!
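As one possible starting point for the sensor-pair questions, a simple measure is how often one sensor’s event is followed by another’s within a short window, for example a hallway motion sensor firing within two minutes of the front door opening. This is only a sketch: the two-minute window, the device roles, and the name `follow_up_rate` are hypothetical, and a real analysis would also want to compare against how often the second sensor fires by chance.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def follow_up_rate(trigger_times, response_times, window=timedelta(minutes=2)):
    """Fraction of trigger events followed by a response event within `window`.

    A response at exactly the trigger time counts as a follow-up.
    """
    if not trigger_times:
        raise ValueError("need at least one trigger event")
    responses = sorted(response_times)
    hits = 0
    for t in trigger_times:
        i = bisect_left(responses, t)  # index of first response at or after t
        if i < len(responses) and responses[i] - t <= window:
            hits += 1
    return hits / len(trigger_times)


# Hypothetical data: door opens (triggers) and hallway motion events (responses).
door_opens = [datetime(2016, 5, 2, 8, 0), datetime(2016, 5, 2, 18, 0)]
hall_motion = [datetime(2016, 5, 2, 8, 1), datetime(2016, 5, 2, 18, 5)]
print(follow_up_rate(door_opens, hall_motion))  # 0.5
print(follow_up_rate(door_opens, hall_motion, window=timedelta(minutes=10)))  # 1.0
```

Running the same function on triggers grouped by hour of day would give a rough answer to the “at such-and-such time of day” part of the question.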