Hacking Restricted Webkit bug finder

dojafoja

life elevated
OP
Member
Joined
Jan 2, 2014
Messages
696
Trophies
1
XP
2,609
Country
New version here: http://www.mediafire.com/download/c1mvzc0fsoi55cf/wbf_v0.4.rar
This is only the regular Python script; I will build a Windows executable and update the OP tomorrow after work.

This is probably the last update I will be doing to this.

To anyone still using this who is using the file hosting, this is an important update.

Changes:

The old file hosting method only allowed dependencies within the same directory to be found; any that were outside would fail. Also, I was not properly returning from the server thread until the main thread terminated, causing a buildup of threads until you exited the program. This fixes that. I also completely changed the way files are hosted and reworked it to serve from the root of LayoutTests and create a single index.html that is rewritten each time with a JavaScript redirect to the proper file. This has worked in all my tests at finding dependent files.
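Roughly, the rewrite works like this (just a sketch; write_redirect is a made-up helper name, not the tool's actual code):
Code:
# Sketch of the single-index.html approach: rewrite one index.html at the
# LayoutTests root with a JavaScript redirect to whichever test was requested.
import os

def write_redirect(layouttests_root, test_relpath):
    # test_relpath is relative to LayoutTests, e.g. 'fast/js/some-test.html'
    index_path = os.path.join(layouttests_root, 'index.html')
    with open(index_path, 'w') as f:
        f.write('<html><head><script>window.location="%s";'
                '</script></head></html>' % test_relpath)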

The log parser was modified to capture URLs better. By doing so, I found 25 new restricted bugs; these are included in the new database provided. Also, scanning for restricted bugs now happens in a separate thread so the UI doesn't become unresponsive. I also added some console output while scanning: every 50 attempts it prints the number of URLs left to scan and how many restricted bugs have been found so far.
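The scan thread and progress output look roughly like this (a sketch only; is_restricted and load_urls_from_database are stand-ins, not the real function names):
Code:
# Run the restricted-bug scan off the UI thread and report every 50 attempts.
import threading

def scan_worker(urllist):
    found = 0
    for n, url in enumerate(urllist, 1):
        if is_restricted(url):  # hypothetical per-URL check
            found += 1
        if n % 50 == 0:
            print "%d urls left to scan, %d restricted bugs found" % (len(urllist) - n, found)

urls = load_urls_from_database()  # hypothetical loader
threading.Thread(target=scan_worker, args=(urls,)).start()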

You no longer need to manually strip an svn log; it automatically stops parsing when it reaches 10/15/2012.

I have attached a txt file below that contains a list of the 25 new bugs that were found. I haven't looked into any of them; I only did a database comparison to find which ones were new. I forgot to include this in the rar file.
 

Attachments

  • new_bugs.txt
    200 bytes · Views: 300

Onion_Knight

Well-Known Member
Member
Joined
Feb 6, 2014
Messages
878
Trophies
0
Age
45
XP
997
Country
New version here: http://www.mediafire.com/download/c1mvzc0fsoi55cf/wbf_v0.4.rar
This is only the regular Python script; I will build a Windows executable and update the OP tomorrow after work.

This is probably the last update I will be doing to this.

To anyone still using this who is using the file hosting, this is an important update.

Changes:

The old file hosting method only allowed dependencies within the same directory to be found; any that were outside would fail. Also, I was not properly returning from the server thread until the main thread terminated, causing a buildup of threads until you exited the program. This fixes that. I also completely changed the way files are hosted and reworked it to serve from the root of LayoutTests and create a single index.html that is rewritten each time with a JavaScript redirect to the proper file. This has worked in all my tests at finding dependent files.

The log parser was modified to capture URLs better. By doing so, I found 25 new restricted bugs; these are included in the new database provided. Also, scanning for restricted bugs now happens in a separate thread so the UI doesn't become unresponsive. I also added some console output while scanning: every 50 attempts it prints the number of URLs left to scan and how many restricted bugs have been found so far.

You no longer need to manually strip an svn log; it automatically stops parsing when it reaches 10/15/2012.

I have attached a txt file below that contains a list of the 25 new bugs that were found. I haven't looked into any of them; I only did a database comparison to find which ones were new. I forgot to include this in the rar file.


Are you using the changelog to parse the data? I've downloaded WebKit with SVN and just want to make sure I'm using the right logfile.
 

dojafoja

life elevated
OP
Member
Joined
Jan 2, 2014
Messages
696
Trophies
1
XP
2,609
Country
Are you using the changelog to parse the data? I've downloaded WebKit with SVN and just want to make sure I'm using the right logfile.
Yes, it's the changelog. I use Linux, but I obtained it by installing Subversion, navigating to your WebKit directory in a terminal, and running:
Code:
svn log > log.txt

Edit: it's the svn commit log. Not sure if that's different from an official changelog.
 

Onion_Knight

Well-Known Member
Member
Joined
Feb 6, 2014
Messages
878
Trophies
0
Age
45
XP
997
Country
Yes, it's the changelog. I use Linux, but I obtained it by installing Subversion, navigating to your WebKit directory in a terminal, and running:
Code:
svn log > log.txt

Edit: it's the svn commit log. Not sure if that's different from an official changelog.


Thanks, I figured it out. I changed your script a bit in the log parsing; it now grabs multiple URLs in the same revision. I also added some timers in your HTML parser to do some time approximation for downloads. Basically, it runs a timer for every 50 downloads and then uses that to estimate the time remaining. I'm looking for ways to speed it up, but it seems we're stuck with speed being based mostly on the download rate.
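The estimate works roughly like this (a sketch only; download stands in for the actual fetch, and urllist is the list of URLs to grab):
Code:
# Time each batch of 50 downloads and extrapolate over what's left.
import time

batch_start = time.time()
for count, url in enumerate(urllist, 1):
    download(url)  # hypothetical fetch of one test file
    if count % 50 == 0:
        per_url = (time.time() - batch_start) / 50.0
        remaining = (len(urllist) - count) * per_url
        print "~%d minutes remaining" % (remaining / 60)
        batch_start = time.time()  # restart the timer for the next batch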
 

Attachments

  • new_test_parser.zip
    5.5 KB · Views: 88
  • Like
Reactions: dojafoja

dojafoja

life elevated
OP
Member
Joined
Jan 2, 2014
Messages
696
Trophies
1
XP
2,609
Country
Thanks, I figured it out. I changed your script a bit in the log parsing; it now grabs multiple URLs in the same revision. I also added some timers in your HTML parser to do some time approximation for downloads. Basically, it runs a timer for every 50 downloads and then uses that to estimate the time remaining. I'm looking for ways to speed it up, but it seems we're stuck with speed being based mostly on the download rate.
So I looked it over and I like what you did. I honestly have never used regular expressions and don't really understand them, but your code was easy to follow, so thank you. I learned everything I know by studying people's source and googling stuff. I took your version and integrated most of the changes from my newest version. As far as speeding up the scanning goes, I had an idea once but never implemented it: basically, write a threading daemon and divide all the URLs to scan among multiple threads, running like 5-10 at once. What do you think?
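Something like this is what I have in mind (just a sketch; check_restricted stands in for the actual per-URL scan):
Code:
# Split the URL list into chunks and scan each chunk on its own daemon thread.
import threading

def scan_chunk(chunk):
    for url in chunk:
        check_restricted(url)  # hypothetical per-URL restricted check

def scan_all(urllist, nthreads=10):
    size = len(urllist) // nthreads + 1
    threads = []
    for i in range(nthreads):
        t = threading.Thread(target=scan_chunk,
                             args=(urllist[i * size:(i + 1) * size],))
        t.daemon = True  # don't keep the program alive if the UI exits
        t.start()
        threads.append(t)
    for t in threads:
        t.join()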

EDIT: Also, since I don't really understand re, could you have your parser stop parsing anything prior to 10/16/2012, similar to what I did in my newest v0.4 that I posted, but using re?
 

Attachments

  • new_test_parser2.zip
    5.7 KB · Views: 83

dojafoja

life elevated
OP
Member
Joined
Jan 2, 2014
Messages
696
Trophies
1
XP
2,609
Country
Found a slight error in your code here:
Code:
for url in urllist:
    url = urllib2.urlopen(url)
    html = url.read()
    soup = BeautifulSoup(html)
    title_tag = soup.findAll("title")
    for i in title_tag:
        x = str(i)
        if 'Access Denied' in x:
            urls_found += 1
            print "Restriced URL Found"
            self.denied_urls.append(url)

You were appending the urllib2.urlopen(url) response object to self.denied_urls instead of the url string itself.

A quick fix would be to rename the instance in the for loop to something like url2 (and then, of course, call url2.read()).
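In other words, something like this (a quick sketch of the fix, reusing the names from your snippet):
Code:
for url in urllist:
    url2 = urllib2.urlopen(url)  # keep the response in its own name
    html = url2.read()
    soup = BeautifulSoup(html)
    for i in soup.findAll("title"):
        if 'Access Denied' in str(i):
            urls_found += 1
            print "Restricted URL Found"
            self.denied_urls.append(url)  # the url string, not the response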
 
  • Like
Reactions: Damieh79

Onion_Knight

Well-Known Member
Member
Joined
Feb 6, 2014
Messages
878
Trophies
0
Age
45
XP
997
Country
Found a slight error in your code here:
Code:
for url in urllist:
    url = urllib2.urlopen(url)
    html = url.read()
    soup = BeautifulSoup(html)
    title_tag = soup.findAll("title")
    for i in title_tag:
        x = str(i)
        if 'Access Denied' in x:
            urls_found += 1
            print "Restriced URL Found"
            self.denied_urls.append(url)

You were appending the urllib2.urlopen(url) response object to self.denied_urls instead of the url string itself.

A quick fix would be to rename the instance in the for loop to something like url2 (and then, of course, call url2.read()).

Thanks,
That explains the error when it dropped out of the loop. It takes roughly 10 hours to run through the whole list. I was thinking about multi-threading and throwing 10 connections at it at a time. Thoughts?

EDIT: Saw the post above this just now and realized we're on the same page. Multithreading is probably the way to get the most efficiency out of it.
 

Onion_Knight

Well-Known Member
Member
Joined
Feb 6, 2014
Messages
878
Trophies
0
Age
45
XP
997
Country
So I looked it over and I like what you did. I honestly have never used regular expressions and don't really understand them, but your code was easy to follow, so thank you. I learned everything I know by studying people's source and googling stuff. I took your version and integrated most of the changes from my newest version. As far as speeding up the scanning goes, I had an idea once but never implemented it: basically, write a threading daemon and divide all the URLs to scan among multiple threads, running like 5-10 at once. What do you think?

EDIT: Also, since I don't really understand re, could you have your parser stop parsing anything prior to 10/16/2012, similar to what I did in my newest v0.4 that I posted, but using re?


Python regular expressions aren't really conventional anyway; the syntax itself works, but the structure is different from your typical Perl-like regex.

I'll add in some basic definitions about what they are looking for and definitely have it stop prior to that date.
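The date matching will be along these lines (a simplified example, not the final code):
Code:
# Match an svn log date stamp like 2012-10-16 and flag when the cutoff is hit.
import re

DATE_RE = re.compile(r'\d{4}-\d{2}-\d{2}')
STOP_DATE = '2012-10-16'

def past_stop_date(line):
    # ISO-formatted dates compare correctly as plain strings
    m = DATE_RE.search(line)
    return bool(m) and m.group(0) <= STOP_DATE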
 

Onion_Knight

Well-Known Member
Member
Joined
Feb 6, 2014
Messages
878
Trophies
0
Age
45
XP
997
Country
So I looked it over and I like what you did. I honestly have never used regular expressions and don't really understand them, but your code was easy to follow, so thank you. I learned everything I know by studying people's source and googling stuff. I took your version and integrated most of the changes from my newest version. As far as speeding up the scanning goes, I had an idea once but never implemented it: basically, write a threading daemon and divide all the URLs to scan among multiple threads, running like 5-10 at once. What do you think?

EDIT: Also, since I don't really understand re, could you have your parser stop parsing anything prior to 10/16/2012, similar to what I did in my newest v0.4 that I posted, but using re?


Added re code to match on dates. If you wanted to add a configurable text box to the GUI, you could allow user-specified end dates; just check to make sure they meet the "YYYY-MM-DD" format. It also now supports multi-threading and spawns 10 threads on each pass; any more and it started throwing SSL errors. It reduced the 10 hours down to 2 for 32,000+ bugs, which is a significant improvement.
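For the GUI-side check, something as simple as this would do (made-up helper name):
Code:
import re

def valid_end_date(text):
    # accept only strict YYYY-MM-DD input from the text box
    return re.match(r'^\d{4}-\d{2}-\d{2}$', text) is not None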
 

Attachments

  • new_test_parser2.zip
    6.4 KB · Views: 87

dojafoja

life elevated
OP
Member
Joined
Jan 2, 2014
Messages
696
Trophies
1
XP
2,609
Country
Added re code to match on dates. If you wanted to add a configurable text box to the GUI, you could allow user-specified end dates; just check to make sure they meet the "YYYY-MM-DD" format. It also now supports multi-threading and spawns 10 threads on each pass; any more and it started throwing SSL errors. It reduced the 10 hours down to 2 for 32,000+ bugs, which is a significant improvement.

Man, that was fast! Thanks for the detailed explanation in the code of what's going on with re. After reading your comments I could follow it, but it still seems a bit wild, lol. You cranked that out and brought the scan time down to 1/5 of what it was; that's great!! I will definitely allow a user-supplied end date on the GUI end. I haven't tested anything yet, but the code looks awesome. Thanks again so much. I don't have a ton of time right now because of work and an Android development project I'm doing using Kivy for Python, which is pretty cool, by the way, for quickly cranking out Android apps. There's even an APK builder called Buildozer; pretty cool.
 

Onion_Knight

Well-Known Member
Member
Joined
Feb 6, 2014
Messages
878
Trophies
0
Age
45
XP
997
Country
Man, that was fast! Thanks for the detailed explanation in the code of what's going on with re. After reading your comments I could follow it, but it still seems a bit wild, lol. You cranked that out and brought the scan time down to 1/5 of what it was; that's great!! I will definitely allow a user-supplied end date on the GUI end. I haven't tested anything yet, but the code looks awesome. Thanks again so much. I don't have a ton of time right now because of work and an Android development project I'm doing using Kivy for Python, which is pretty cool, by the way, for quickly cranking out Android apps. There's even an APK builder called Buildozer; pretty cool.


Thanks,

I'm still working through some of the code and cleaning my stuff up. I'll try to get a final product out tonight so you have as much time to tweak it as possible. I'll add more comments throughout to clarify what I'm doing. Basically, my goal is to get this down to around an hour and a half for a full scan.
 
  • Like
Reactions: dojafoja

dojafoja

life elevated
OP
Member
Joined
Jan 2, 2014
Messages
696
Trophies
1
XP
2,609
Country
Thanks,

I'm still working through some of the code and cleaning my stuff up. I'll try to get a final product out tonight so you have as much time to tweak it as possible. I'll add more comments throughout to clarify what I'm doing. Basically, my goal is to get this down to around an hour and a half for a full scan.
You are a bada**; do whatever you want, a contribution like that is huge!
 

dojafoja

life elevated
OP
Member
Joined
Jan 2, 2014
Messages
696
Trophies
1
XP
2,609
Country
I hate to keep doing this :P I found this little change here that causes a database to not get created if one doesn't already exist. I thought I would make you aware in case you haven't picked up on it already. Everything else seems to rock so far!

Code:
if os.path.isfile('commits.db'):  # There cannot be an existing file named 'commits.db' if you plan to parse a new log.
    #message.showerror('error','A file named commits.db already exists, please rename or move your old database file.')
    #raise Exception
    os.remove('commits.db')
#else:
    bugs = ''
    rvn = ''
    url = []
    stop_date = "2012-10-16"
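A quick sketch of one way to fix it (same names as above): initialize unconditionally, and only remove a stale commits.db when one exists.
Code:
import os

if os.path.isfile('commits.db'):
    os.remove('commits.db')  # clear any stale database before a fresh parse

# always initialize, whether or not an old commits.db was present
bugs = ''
rvn = ''
url = []
stop_date = "2012-10-16"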
 

Onion_Knight

Well-Known Member
Member
Joined
Feb 6, 2014
Messages
878
Trophies
0
Age
45
XP
997
Country
I hate to keep doing this :P I found this little change here that causes a database to not get created if one doesn't already exist. I thought I would make you aware in case you haven't picked up on it already. Everything else seems to rock so far!

Code:
if os.path.isfile('commits.db'):  # There cannot be an existing file named 'commits.db' if you plan to parse a new log.
    #message.showerror('error','A file named commits.db already exists, please rename or move your old database file.')
    #raise Exception
    os.remove('commits.db')
#else:
    bugs = ''
    rvn = ''
    url = []
    stop_date = "2012-10-16"

Yeah, I had it commented out when I was tweaking it. Right now, I am having trouble with:

Code:
db.commit()
db.close()
root.html_thread = False
message.showinfo(title="Complete", message="Scanning for restricted bugs is complete")

both root.html_thread and message.showinfo are hanging.
 
  • Like
Reactions: Damieh79

Onion_Knight

Well-Known Member
Member
Joined
Feb 6, 2014
Messages
878
Trophies
0
Age
45
XP
997
Country
Yeah, I had it commented out when I was tweaking it. Right now, I am having trouble with:

Code:
db.commit()
db.close()
root.html_thread = False
message.showinfo(title="Complete", message="Scanning for restricted bugs is complete")

both root.html_thread and message.showinfo are hanging.

dojafoja
I've got it throttling pretty high now, but it hits those statements and hangs. If they're commented out and I go to the 2nd tab, I get errors. I can't figure out what those statements are doing that causes a hang.

Attached is the current build. I've got test settings on it to limit the checks so I can verify everything works. Take a look and adjust your settings to find your best setup. Maybe another pair of eyes on it will help.
 

Attachments

  • new_test_parser2.zip
    6.6 KB · Views: 79

dojafoja

life elevated
OP
Member
Joined
Jan 2, 2014
Messages
696
Trophies
1
XP
2,609
Country
dojafoja
I've got it throttling pretty high now, but it hits those statements and hangs. If they're commented out and I go to the 2nd tab, I get errors. I can't figure out what those statements are doing that causes a hang.

Attached is the current build. I've got test settings on it to limit the checks so I can verify everything works. Take a look and adjust your settings to find your best setup. Maybe another pair of eyes on it will help.
Maybe in the morning I will have time to really go over it; I think my wife has had enough of me being on the laptop this week :P.

Basically, all that root.html_thread = False is doing is resetting a value that is checked when the user clicks the scan button. It was to prevent multiple scans from starting if the user clicked the scan button multiple times without letting the previous scan complete.

About the tkinter messagebox: I had to completely remove that part in my v0.4 version because Tkinter is not thread-safe. Once I started putting things in separate threads, it would always hang when an external thread tried to generate a tk messagebox; only the thread in which Tk was instantiated can call these. I tried everything I could think of. I had the external threads generate a virtual event, then bound the tk instance to the virtual event and had the messagebox called from my Root class, and even this would hang. I tried having the external thread put the messagebox call into a queue using Python's Queue module and then having the main thread periodically check the queue and call the messagebox, but that would hang too. IDK?

I did a dirty little hack in my file hosting thread to get a messagebox when the index.html was successfully created: in the main thread, when the button was clicked, I ran a while loop waiting for the external thread to change a particular value, at which point I broke the loop and called the messagebox. It's a dirty hack, but it sort of works.
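For reference, the way the Queue approach is usually made to work is to poll the queue from the Tk event loop with after() instead of a blocking loop; a rough sketch (made-up names, Python 2 like the rest of this):
Code:
import threading
import Queue
import Tkinter as tk
import tkMessageBox as message

msg_queue = Queue.Queue()

def worker():
    # ... do the scan in the background, never touching Tk directly ...
    msg_queue.put("Scanning for restricted bugs is complete")

def poll_queue():
    try:
        text = msg_queue.get_nowait()
        message.showinfo(title="Complete", message=text)
    except Queue.Empty:
        pass
    root.after(100, poll_queue)  # re-check the queue every 100 ms

root = tk.Tk()
threading.Thread(target=worker).start()
root.after(100, poll_queue)  # polling runs on the Tk thread, so no hang
root.mainloop()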
 

Onion_Knight

Well-Known Member
Member
Joined
Feb 6, 2014
Messages
878
Trophies
0
Age
45
XP
997
Country
Maybe in the morning I will have time to really go over it; I think my wife has had enough of me being on the laptop this week :P.

Basically, all that root.html_thread = False is doing is resetting a value that is checked when the user clicks the scan button. It was to prevent multiple scans from starting if the user clicked the scan button multiple times without letting the previous scan complete.

About the tkinter messagebox: I had to completely remove that part in my v0.4 version because Tkinter is not thread-safe. Once I started putting things in separate threads, it would always hang when an external thread tried to generate a tk messagebox; only the thread in which Tk was instantiated can call these. I tried everything I could think of. I had the external threads generate a virtual event, then bound the tk instance to the virtual event and had the messagebox called from my Root class, and even this would hang. I tried having the external thread put the messagebox call into a queue using Python's Queue module and then having the main thread periodically check the queue and call the messagebox, but that would hang too. IDK?

I did a dirty little hack in my file hosting thread to get a messagebox when the index.html was successfully created: in the main thread, when the button was clicked, I ran a while loop waiting for the external thread to change a particular value, at which point I broke the loop and called the messagebox. It's a dirty hack, but it sort of works.

I solved the first one and commented out the messagebox. Now I'm on to the next thing: log and HTML parsing are good, but my changes modified the database table, so I just have to re-write the querying.

My wife just rolls her eyes, but doesn't say anything. She knows that I'll be dreaming code all night anyway.
 
  • Like
Reactions: dojafoja

dojafoja

life elevated
OP
Member
Joined
Jan 2, 2014
Messages
696
Trophies
1
XP
2,609
Country
Does root.html_thread = False really hang by itself, without the messagebox, on your machine? Also, using print like you did generated the messagebox on my machine.
 
