Managed to extract Hub live log for my usage via python script

Philippe_Portes · February 14, 2017, 12:19am

Hi there,

Like many others, I searched the forum for a way to extract the Hub logs (debug/trace/info), like the ones shown in Live Logging. As I didn’t want to use external services to access my logs, I decided to take this opportunity to learn how to scrape a web page.

After trying many Python modules, I ended up using PhantomJS with Selenium. This avoids launching an external browser, so the script can run anywhere, not only on a PC.

My code works under Windows 10 64-bit with Python 2.7 and the modules imported in the source below. I am clearly not an expert in all these things, but it does the job.
Depending on where you are located, the URLs might differ from graph-na02-useast1; just use the one you see when you usually log in.
You have to put your username and password in the strings below, and everything should be OK.
[Update 2/14/2017: added a page wait for authentication, and work on the delta from the previously parsed refresh to export logs directly to a log file]
[Update 2/15/2017: simplified the code and made it write one CSV file per day]
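Since the host name and the credentials are the two things each user has to change, here is a minimal sketch of keeping them out of the script body. It is written for Python 3, unlike the Python 2.7 script in this post. The environment variable names ST_USERNAME and ST_PASSWORD and both helper functions are hypothetical, chosen only for this example; the two URL paths are the ones used in the script.

```python
import os

def build_urls(shard_host):
    """Return the login and live-log URLs for a given IDE host name."""
    base = "https://" + shard_host
    return {
        "login": base + "/login/auth",
        "logs": base + "/ide/logs",
    }

def read_credentials(env=os.environ):
    """Read the account name and password from the environment.

    ST_USERNAME and ST_PASSWORD are hypothetical variable names chosen
    for this sketch; they are not defined by SmartThings.
    """
    return env.get("ST_USERNAME", ""), env.get("ST_PASSWORD", "")

# Use the host you see in your own browser after logging in.
urls = build_urls("graph-na02-useast1.api.smartthings.com")
```

In the script, `login_l, password_l = read_credentials()` would then replace the two hard-coded strings, and `urls["login"]` and `urls["logs"]` the two literal URLs.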

import platform
from bs4 import BeautifulSoup
from selenium import webdriver
import datetime
import time
import re
import difflib

login_l = 'yourSTusername'
password_l='yourSTpassword'

# helper: join a list of regex matches into one string ("" when there is no match)
def xstr(s):
    if s is None or len(s)==0:
        return ""
    else:
        return ''.join(s)

# PhantomJS files have different extensions
# under different operating systems
if platform.system() == 'Windows':
    PHANTOMJS_PATH = './phantomjs.exe'
else:
    PHANTOMJS_PATH = './phantomjs'

# Initialize difflib for later usage
d= difflib.Differ()

# here we'll use the headless pseudo browser PhantomJS,
# but it can be replaced with browser = webdriver.Firefox(),
# which is good for debugging.
browser = webdriver.PhantomJS(PHANTOMJS_PATH)
browser.get('https://graph-na02-useast1.api.smartthings.com/login/auth')
username = browser.find_element_by_name('j_username')
password = browser.find_element_by_name('j_password')
username.send_keys(login_l)
password.send_keys(password_l)

browser.find_element_by_name('_action_Log in').click()
#time.sleep(10) # to be replaced by a page-load detection to ensure the login was successfully done.
print "Authenticating :",

# the username only shows up in the page once the login went through
while login_l not in browser.page_source:
    print ".",
    time.sleep(1) # poll once per second instead of spinning
print ""
print "Authenticated as ", login_l
browser.get('https://graph-na02-useast1.api.smartthings.com/ide/logs')

# variables needed for log and page parsing management
first_log_trace=''
log_trace=''
old_page=''

#CSV file name
csv_file_name=''
last_file_name=''

print "Parsing...",

while True:
    time.sleep(10) # sleep to avoid the socket getting messed up while in WAIT state

    # Check if the page content changed (=new logs)
    while browser.page_source==old_page:
        time.sleep(1) # poll once per second instead of spinning

    # new page content, we'll have to write things in the log file
    new_log=1

    # Prepare the file name based on the date.
    timestr = time.strftime("%Y%m%d")

    # the first logs in this file might include the last minutes of the previous day's logs
    logfile= open(timestr+".csv","a+")

    # If new file name, add csv header.
    if last_file_name!=timestr+'.csv':
        # write csv header
        logfile.write('"Log_Class";"Device_Name";"Local_Log_Time";"Log_Message"\n')
        last_file_name=timestr+".csv"

    # Prepare to parse page
    new_page=browser.page_source

    soup = BeautifulSoup(new_page, "html.parser")

    # keep track of the page content to avoid reloading same page content
    old_page=new_page

    # Map the href used by ST logs to the device name, so the exported logs show the name in clear
    devicename_dic={}
    # the devices already traced are added as filter options at the top of the window;
    # checking all elements of <a class=filter> gives the correspondence between the later href URLs and ST device names
    div_tools = soup.findAll('div', {'class':'tools'})
    # loop over the device names
    for child in div_tools:
        device_href_name=child.findAll('li')
        for href in device_href_name:
            device_href=re.search('.*href="(.*)".*', str(href.a))
            device_name=re.search('.*href=.*>(.*)</a>', str(href.a))
            if device_name != None:
                devicename_dic[str(device_href.group(1))]=str(device_name.group(1))
    # search for the div sections with style attributes containing the logs
    div_class_body = soup.findAll('div', {'style':'display: block;'})

    for child in div_class_body:
        log_class=xstr(re.findall('<div class="(.*?)".*', str(child)))
        # extract the url from the console log and match it to the device name we stored.
        device_href=xstr(re.findall('.*href="(.*)".*', str(child)))
        # extract the time. Will be in format (H)H:(M)M:(S)S AM/PM TZ
        log_time=xstr(re.findall('<span class="time">(.*?)</span>', str(child)))
        # extract the log message text.
        log_msg=xstr((re.findall('.*msg">(.*?)</span>', str(child), re.MULTILINE)))
        # build the string. Since ST displays the newest first, each line is written to the file in new->old order within its group
        # example:
        # [c
        #  b
        #  a] ==> first set of logs
        # [f
        #  e] ==> second set of logs
        # This is not really important, as the file can be reordered by the program using the csv file. However, smartthings doesn't add the day
        log_trace = '"'+log_class+ '";'+'"'+ devicename_dic.get(device_href,"")+ '";'+'"'+log_time+'";'+'"'+log_msg+'"\n'

        # events being in reverse date order, stop writing events that were already written last time.
        if (log_trace==first_log_trace ):
            new_log=0
        # first time parsing? then set the marker for next time in order to avoid writing past events
        if first_log_trace=='':
            first_log_trace=log_trace
            new_log=1
        # got new logs? print them and write them to the file
        if new_log==1:
            print log_trace
            logfile.write(log_trace)

    logfile.close()
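A note on the "stop writing events already written" step: in the script above the marker first_log_trace is set only once, on the first pass, so later passes can write again every row that is newer than that first marker. Here is a minimal sketch of one way to move the marker forward on each pass, using the same a/b/c and e/f example as in the code comments. It is written for Python 3, and new_rows is a hypothetical helper, not part of the script.

```python
def new_rows(page_rows, last_newest):
    """Return the rows of page_rows that were not exported yet.

    page_rows   -- rows as shown on the page, newest first
    last_newest -- the newest row exported on the previous pass, or None
    The result is in chronological order (oldest first).
    """
    fresh = []
    for row in page_rows:
        if row == last_newest:
            break          # everything from here on was already written
        fresh.append(row)
    fresh.reverse()        # page order is new -> old, files read better old -> new
    return fresh

# Two successive snapshots of the page, newest entry first
first_pass = ["c", "b", "a"]
second_pass = ["f", "e", "c", "b", "a"]

marker = None
written = []
for snapshot in (first_pass, second_pass):
    rows = new_rows(snapshot, marker)
    written.extend(rows)
    if snapshot:
        marker = snapshot[0]   # remember the newest row for the next pass
```

Two limits of this approach: if the marker row has scrolled off the page, every row is treated as new, and two rows with identical text and time cannot be told apart.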

Example of logs as csv file I got:
"Log_Class";"Device_Name";"Local_Log_Time";"Log_Message"
"log debug";"Arrival Sensor guest";"3:55:14 PM PST";"Creating battery event for voltage=2.2V: Arrival Sensor guest battery is 70%"
"log debug";"Arrival Sensor guest";"3:55:14 PM PST";"Creating presence event: Arrival Sensor guest presence is present"
"log info";"";"3:55:06 PM PST";"Waiting on events…"
"log info";"";"3:55:06 PM PST";"For past logs for individual things go to the My Devices section, find the device and click on the Events link on the device information page."
"log info";"";"3:55:06 PM PST";"This console provides live logging of your SmartThings."
"log debug";"Arrival Sensor guest";"3:55:34 PM PST";"Creating battery event for voltage=2.2V: Arrival Sensor guest battery is 70%"
"log debug";"Arrival Sensor guest";"3:55:34 PM PST";"Creating presence event: Arrival Sensor guest presence is present"
"log debug";"Arrival Sensor guest";"3:55:54 PM PST";"Creating battery event for voltage=2.2V: Arrival Sensor guest battery is 70%"
"log debug";"Arrival Sensor guest";"3:55:54 PM PST";"Creating presence event: Arrival Sensor guest presence is present"
"log debug";"Arrival Sensor guest";"3:55:34 PM PST";"Creating battery event for voltage=2.2V: Arrival Sensor guest battery is 70%"
"log debug";"Arrival Sensor guest";"3:55:34 PM PST";"Creating presence event: Arrival Sensor guest presence is present"
"log debug";"Arrival Sensor guest";"3:56:00 PM PST";"Sensor checked in 5.707 seconds ago"
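As the sample shows, rows are not in strict time order, some are repeated (the 3:55:34 pair appears twice), and the time has no date. Here is a minimal sketch of reading such a file back, attaching the date taken from the file name, dropping exact duplicates and sorting. It is written for Python 3; load_rows and ordered_unique are hypothetical helpers, the sample rows are shortened, and the date 20170214 is an assumption for illustration. The time zone suffix is simply dropped, and rows from just before midnight that ended up in the next day's file are not handled.

```python
import csv
import io
from datetime import datetime

def load_rows(text, day):
    """Parse the semicolon-separated export and attach a full timestamp.

    day is the date of the file, e.g. "20170214" taken from its name.
    The time zone suffix (e.g. "PST") is dropped, not converted.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=";", quotechar='"')
    rows = []
    for rec in reader:
        clock = rec["Local_Log_Time"].rsplit(" ", 1)[0]      # "3:55:14 PM"
        stamp = datetime.strptime(day + " " + clock, "%Y%m%d %I:%M:%S %p")
        rows.append((stamp, rec["Log_Class"], rec["Device_Name"], rec["Log_Message"]))
    return rows

def ordered_unique(rows):
    """Drop exact duplicates, then sort by timestamp (stable for ties)."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return sorted(unique, key=lambda r: r[0])

sample = '''"Log_Class";"Device_Name";"Local_Log_Time";"Log_Message"
"log debug";"Arrival Sensor guest";"3:55:34 PM PST";"Creating presence event: Arrival Sensor guest presence is present"
"log info";"";"3:55:06 PM PST";"Waiting on events"
"log debug";"Arrival Sensor guest";"3:55:34 PM PST";"Creating presence event: Arrival Sensor guest presence is present"
'''

rows = ordered_unique(load_rows(sample, "20170214"))
```

Dropping exact duplicates would also drop two genuine events with the same device, second and message, so this is only reasonable when such repeats are not meaningful to you.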

Hope it helps…

Jihao_Liu · August 15, 2018, 5:02pm

Hi,
I read your code. Recently, I have also been trying to log in to the SmartThings online IDE automatically. However, I found that SmartThings can detect my automated login, and I am then asked to fill in a dynamic verification code, so the login fails.

Do you know about this?

Philippe_Portes · August 15, 2018, 11:50pm

Could you post some screenshots of the sequence you are facing?
I will check whether my code still works after they migrated us to Samsung accounts.

Jihao_Liu · August 16, 2018, 12:06am

Here is my login code. I tried to log in to the online IDE using my SmartThings account.

#!/usr/bin/python
# -*- coding: utf-8 -*-

from selenium import webdriver
import time
import os
from selenium.webdriver.common.by import By
from selenium.webdriver import ActionChains
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

try:
    import cPickle as pickle #python 2
except ImportError as e:
    import pickle #python 3

#fp = webdriver.FirefoxProfile(r'C:\Users\Xiaolei Wang\AppData\Roaming\Mozilla\Firefox\Profiles\o27pwq6y.default')
browser = webdriver.Firefox()
browser.maximize_window()
#browser.get("http://www.baidu.com")
browser.get("https://account.smartthings.com/login/samsungaccount")

#cookies=pickle.load(open("cookies.pkl","rb"))
#for cookie in cookies:
#    browser.add_cookie(cookie)
#browser.refresh()

browser.find_element_by_id('username').send_keys("nudt_liujihao@gmail.com")

passwordStr = '***password'
password = WebDriverWait(browser, 10).until(
    EC.presence_of_element_located((By.ID,"password")))
password.send_keys(passwordStr)

# wait until the login button can be clicked
# (a bare WebDriverWait(browser, 10) without .until() does not wait for anything)
button = WebDriverWait(browser, 10).until(
    EC.element_to_be_clickable((By.ID,"login-user-btn")))
button.click()

Here is the page I encountered using the above code.

If I log in manually, I can log in normally and this page never appears.
I don’t know why.

Thanks a lot.

Philippe_Portes · August 16, 2018, 2:50am

Here you go: this one is adapted to the Samsung login page. The fields changed compared to the previous implementation.

import platform
from bs4 import BeautifulSoup
from selenium import webdriver
import datetime
import time
import re
import difflib

login_l = 'yourSTusername'
password_l='yourSTpassword'

#function to extract needed string
def xstr(s):
    if (s==None) or len(s)==0:
        return ""
    else:  
        return ''.join(s)

# PhantomJS files have different extensions
# under different operating systems
if platform.system() == 'Windows':
    PHANTOMJS_PATH = './phantomjs.exe'
else:
    PHANTOMJS_PATH = './phantomjs'
# Initialize difflib for later usage
d= difflib.Differ()

# here we'll use the headless pseudo browser PhantomJS,
# but it can be replaced with browser = webdriver.Firefox(),
# which is good for debugging.
browser = webdriver.PhantomJS(PHANTOMJS_PATH)
browser.get('https://account.smartthings.com/login/samsungaccount')
username = browser.find_element_by_id('username')
password = browser.find_element_by_id('password')
username.send_keys(login_l)
password.send_keys(password_l)

browser.find_element_by_id('login-user-btn').click()
#time.sleep(10) # to be replaced by a page-load detection to ensure the login was successfully done.
print "Authenticating :",

# the username only shows up in the page once the login went through
while login_l not in browser.page_source:
    print ".",
    time.sleep(1) # poll once per second instead of spinning
print ""
print "Authenticated as ", login_l

Jihao_Liu · August 16, 2018, 2:55am

Yes, I tried it and authenticated successfully.
Nice, you are too powerful!!!

Thanks a lot.

Jihao_Liu · August 16, 2018, 11:03pm

Hi,
Sorry about this. Yesterday, I only ran your code and found that I could be authenticated successfully.
However, today I replaced the PhantomJS pseudo browser with browser = webdriver.Firefox(). The login still fails and I get the same page, as follows:

tgauchat · August 16, 2018, 11:47pm

I doubt that any script can bypass the website’s “robot detection” algorithms which trigger the CAPTCHA Security Code Request.

Unless that Security Code Request only happens after x number of failed logins…?

Jihao_Liu · August 16, 2018, 11:59pm

OK, I think so too.
I just want to log in automatically and test SmartApps using Selenium, as several previous papers did.

Maybe I cannot do this anymore.

Philippe_Portes · August 17, 2018, 12:00am

That page appears upon a wrong password/login. I think the lib that Jihao is using is mixing up IDs and names in the page, so the login and password don’t get filled in before the click. Just my 2 cents.

Jihao_Liu · August 17, 2018, 12:08am

So you can log in normally using your code without getting this page. Is that right?

Philippe_Portes · August 17, 2018, 12:16am

Yes, it worked yesterday. As you can see, my script prints “Authenticated as xxxx” once it finds the username on the page. If authentication fails, you end up on the page with empty fields and the CAPTCHA.
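Because a failed login never puts the username on the page, the `while login_l not in browser.page_source` loop would then run forever. Here is a minimal sketch of the same check with a timeout, written for Python 3. wait_until is a hypothetical helper, and the `pages` iterator only stands in for browser.page_source so the example runs without a browser.

```python
import time

def wait_until(predicate, timeout=30.0, poll=0.5):
    """Call predicate() until it returns True or timeout seconds pass.

    Returns True on success and False on timeout, so the caller can
    report a failed login instead of looping forever.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)

# Stand-in for browser.page_source: the "page" shows the user name on the third poll.
pages = iter(["login form", "login form", "Welcome yourSTusername"])
state = {"source": ""}

def logged_in():
    state["source"] = next(pages, state["source"])
    return "yourSTusername" in state["source"]

ok = wait_until(logged_in, timeout=5.0, poll=0.01)
never = wait_until(lambda: False, timeout=0.05, poll=0.01)
```

With the real browser this would be called as `wait_until(lambda: login_l in browser.page_source)`. Finding the username in the page source is only a heuristic for a successful login.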

Why are you obliged to use Firefox? PhantomJS really does the job of getting the dynamic field values, which is why I use it.

Jihao_Liu · August 17, 2018, 12:25am

Yes, if I use PhantomJS, it works normally. I found that you use a condition to determine whether authentication succeeded, namely that the username appears on the page.
I downloaded the page source and it does indeed contain the username. However, this page source is different from the page source obtained after a manual login.

The reason why I need to use Firefox is that I need to automatically install SmartApps, set preferences or configure them, and then trigger device events.

Maybe, I need to think about my goal again.

Anyway, thanks a lot.

Philippe_Portes · August 17, 2018, 2:22am

I see what you mean. Indeed, since the change to Samsung accounts, the IDE web pages have changed and I see that PhantomJS doesn’t get any further after the authentication.

If you are handy with programming, it would be worth checking how ST SmartApps work and forwarding your events over HTTP to whatever you want to communicate with, or using IFTTT. Parsing the page for event management seems over-designed.
Additionally, ST provided a web interface several months ago. I have never used it, but this might also be a solution. I think they have that at https://smartthings.developer.samsung.com/develop/workspace/
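To give an idea of the receiving end of the "forward your events over HTTP" option, here is a minimal sketch of a small Python 3 server that accepts POSTed events and appends them to a CSV file. Everything about the payload is an assumption: the JSON keys level, device, time and message, the file name events.csv and the port are hypothetical, and the SmartApp side that would send the events is not shown.

```python
import csv
import io
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

FIELDS = ("level", "device", "time", "message")   # assumed payload keys

def event_to_csv_line(body):
    """Turn one JSON-encoded event (bytes) into a semicolon-separated CSV line."""
    event = json.loads(body.decode("utf-8"))
    out = io.StringIO()
    writer = csv.writer(out, delimiter=";", quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow([str(event.get(name, "")) for name in FIELDS])
    return out.getvalue()

class EventHandler(BaseHTTPRequestHandler):
    log_path = "events.csv"   # hypothetical output file

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            line = event_to_csv_line(self.rfile.read(length))
        except (ValueError, AttributeError):
            self.send_response(400)   # body was not a JSON object
            self.end_headers()
            return
        with open(self.log_path, "a") as logfile:
            logfile.write(line)
        self.send_response(204)
        self.end_headers()

def serve(port=8080):
    """Block and handle POSTed events until interrupted."""
    HTTPServer(("", port), EventHandler).serve_forever()

line = event_to_csv_line(b'{"level": "debug", "device": "Arrival Sensor guest", '
                         b'"time": "3:55:14 PM PST", "message": "presence is present"}')
```

Calling `serve()` starts the listener. Using csv.writer with QUOTE_ALL keeps the semicolon-separated layout of the scraped files while also escaping any double quotes inside a message.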

Philippe_Portes · August 17, 2018, 2:25am

Well, you are kind of half right: it can pass the authentication stage, but indeed, I just found that I received a few emails from Samsung Account saying my account was used to log in, as below:
Device: BOT
etc…

ha

Jihao_Liu · August 17, 2018, 2:26am

You are so nice. Actually, I’m new to SmartThings and, compared to Android, there is not much documentation and few tools are available.

I will look at the web interface you sent me and try to find a solution.

Again, thank you so much
