Downloading the Internet

Zetta_x, The Insane Statistician (OP)
Member | Joined Mar 4, 2010 | Messages: 1,844 | Trophies: 0 | Age: 34 | XP: 574 | United States
Let's say, theoretically, we had a big enough hard drive to store everything on the internet.

Let's also say we had a fast enough internet connection that this could be done in a day. Would it be possible to download the entire internet and browse every webpage offline?
 

BloodyFlame, Well-Known Member
Member | Joined Aug 6, 2010 | Messages: 361 | Trophies: 0 | Location: California | XP: 205 | United States
Zetta_x said:
Let's say, theoretically, we had a big enough hard drive to store everything on the internet.

Let's also say we had a fast enough internet connection that this could be done in a day. Would it be possible to download the entire internet and browse every webpage offline?
Terabytes of pr0nz.
:)
 

mrSmiles, Dundunduuuun
Member | Joined Oct 27, 2002 | Messages: 1,322 | Trophies: 0 | Age: 35 | XP: 397 | Canada
Zetta_x said:
Let's say, theoretically, we had a big enough hard drive to store everything on the internet.

Let's also say we had a fast enough internet connection that this could be done in a day. Would it be possible to download the entire internet and browse every webpage offline?

If we theoretically had a big enough hard drive and a fast enough internet connection, then theoretically we could download and browse everything on the net.

The question answers itself.
 

Zetta_x, The Insane Statistician (OP)
Member | Joined Mar 4, 2010 | Messages: 1,844 | Trophies: 0 | Age: 34 | XP: 574 | United States
OK, how about this, since you guys clearly agree it's possible to do:

How hard would it be, and how efficiently could it run, to create a web browser or Firefox extension that checks certain site directories every x minutes and downloads any updates?

What I want to create is an extension that lets a user define a certain subdirectory of a website (or the website itself), uses an active internet connection, and continually downloads any new material, with three modes:

1 - Disabled: behaves like a normal web browser does today.
2 - Hybrid Mode: when your internet connection drops, it loads the last stored copy of the page.
3 - Offline Mode: it loads the last stored copy regardless of the internet connection.
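
Something like Hybrid Mode can be prototyped outside the browser first. Below is a minimal Python sketch, not a real extension: it assumes the target is a plain static page, uses only the standard library, and the cache folder name, example URL, and 5-minute interval are placeholder choices.

Code:
import hashlib, time, urllib.request
from pathlib import Path

CACHE = Path("page_cache")          # where stored copies live
CACHE.mkdir(exist_ok=True)

def cache_path(url):
    # One file per URL, named by a hash of the URL.
    return CACHE / (hashlib.sha256(url.encode()).hexdigest() + ".html")

def refresh(url):
    # Download the current version and overwrite the stored copy.
    data = urllib.request.urlopen(url, timeout=10).read()
    cache_path(url).write_bytes(data)

def load(url):
    # Hybrid Mode: try the live page first, fall back to the last stored copy.
    try:
        refresh(url)
    except OSError:
        pass  # connection dropped; keep whatever copy we already have
    path = cache_path(url)
    return path.read_bytes() if path.exists() else None

if __name__ == "__main__":
    url = "https://example.com/"     # placeholder page to watch
    while True:
        page = load(url)
        print("have", 0 if page is None else len(page), "bytes for", url)
        time.sleep(300)              # re-check every 5 minutes

Offline Mode would just skip the refresh() call and read the cache directly, and Disabled would skip the cache entirely.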
 

monkat, I'd like to see you TRY to ban me. (Should I try?)
Banned | Joined May 21, 2009 | Messages: 2,242 | Trophies: 0 | Age: 32 | Location: Virginia | Website: www.monkat.net | XP: 105 | United States
Without access to every server's protected information?

Near infinitely difficult.

If everything were laid out just like your hard drive, then it wouldn't be too hard, assuming the computer was large and powerful enough to handle billions of internet requests in a given amount of time.

I don't see why you're asking, though; it's not going to happen. Ever.
 

Zetta_x, The Insane Statistician (OP)
Member | Joined Mar 4, 2010 | Messages: 1,844 | Trophies: 0 | Age: 34 | XP: 574 | United States
Read my last post; it's fairly evident.

monkat said:
I don't see why you're asking, though; it's not going to happen. Ever.

Just like GameCube USB loading?

This kind of attitude is exactly what holds things back. In a world with millions of people, in order to achieve something no one else has, you have to be different and think outside the box.

I wouldn't see your attitude as a negative thing, though, because it's comments like yours that give people the motivation to rub it in your face once it is done.

for separation of post

How many people have used a spoiler tag to separate an edit? Clearly, the answer so far is yes, it is possible, just not probable, to download the entire internet and run it offline. But why would I ask, since obviously it's not going to happen... oh yeah, and to add extra emphasis... ever.

The reason I asked is that I have a laptop I constantly use for travel. Imagine downloading even a small portion of a wiki and using it as a reference when I'm somewhere I can't get online. What if I developed a search engine that goes through a site and downloads any pages related to microbiology? When those pages are found, I could use an active internet connection to fetch the information and then use it at school in an offline environment.

Or at home, where my internet connection drops often. When it drops, I often have to wait 5-10 minutes for it to come back, sometimes even restarting the router. What if I made an extension that let me keep viewing a copy of the webpage that is at most 5 minutes old? I could continue to do research during this downtime...

If downloading the whole internet is possible, then it is also possible to download any subset of the internet, maybe one page or four, as long as you have the resources. And if everyone who has replied is right, then it should be possible.
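
That "grab every page on a site about one topic" idea can be sketched as a small crawler. A rough Python example using only the standard library; the start URL, keyword, and page limit are placeholders, and a real version would also need robots.txt handling and politeness delays.

Code:
import re, urllib.parse, urllib.request
from html.parser import HTMLParser
from pathlib import Path

class LinkParser(HTMLParser):
    # Collects href values from <a> tags.
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, keyword, limit=50, out_dir="offline_pages"):
    Path(out_dir).mkdir(exist_ok=True)
    site = urllib.parse.urlparse(start_url).netloc
    seen, queue, saved = set(), [start_url], 0
    while queue and saved < limit:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "replace")
        except OSError:
            continue
        # Keep the page only if it mentions the topic we care about.
        if re.search(keyword, html, re.IGNORECASE):
            (Path(out_dir) / f"page_{saved:04d}.html").write_text(html, encoding="utf-8")
            saved += 1
        # Queue outgoing links, but stay on the same site.
        parser = LinkParser()
        parser.feed(html)
        for link in parser.links:
            absolute = urllib.parse.urljoin(url, link)
            if urllib.parse.urlparse(absolute).netloc == site:
                queue.append(absolute)

# crawl("https://example.org/wiki/", "microbiology")   # placeholder site and topic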
 

redact
Member | Joined Dec 2, 2007 | Messages: 3,161 | Trophies: 0 | Location: - | XP: 674 | Mauritania
I don't see how this would be possible unless the entire internet consisted of static pages.

Dynamic pages (such as the one you are currently viewing) make your dream of owning the internet impossible, not just implausible.
 

Rydian, Resident Furvert™
Member | Joined Feb 4, 2010 | Messages: 27,880 | Trophies: 0 | Age: 36 | Location: Cave Entrance, Watching Cyan Write Letters | Website: rydian.net | XP: 9,111 | United States
http://en.wikipedia.org/wiki/Wget
This can download recursively, meaning it'll follow links and download those pages as well, and it rewrites the links to point at the local copies when you view them and shit.

So yes, you could use wget to download the entire internet, time and space permitting.

However, none of it would be interactive offline (outside of what AJAX/JS/Flash can do locally).

 

mysticwaterfall, Streamforce Supreme Commander
Member | Joined Aug 11, 2008 | Messages: 1,874 | Trophies: 0 | Location: Right behind you | XP: 668 | United States
The only way such a system would ever work is if you set it up on a site-by-site basis and only had it do a small number of websites. A number of programs actually did this back in the day: they would cache the links on a website so you could read them later, offline. The main point of it then was that there was no broadband and a lot of people were paying by the hour for internet. And they could only handle small numbers of pages at a time.

Of course, the difference is that back then websites were a lot simpler and there was little dynamic content. Even if it were practical to cache more than a few websites at a time now, they would be out of date almost instantly.

EDIT: Reading your edit, there certainly are offline dumps of Wikipedia and the like, but you would still only be able to do it for specific websites at a time, and even then you could only spider the links there. Having it amass everything on something like "Microbiology" constantly in the background while you work would be insanely impractical. There's a reason Google has massive server farms that do nothing but spider the web all day.
 

Zetta_x, The Insane Statistician (OP)
Member | Joined Mar 4, 2010 | Messages: 1,844 | Trophies: 0 | Age: 34 | XP: 574 | United States
Dynamic pages, and stuff like Java and Flash, would be a limitation, I know; maybe there are a few workarounds, but the majority of this stuff, I agree, couldn't be obtained and run offline. Still, some Flash programs only serve as a medium to load content off a server. Take Flash Flash Revolution as an example: when the site went down last year, it was possible to download all 2 gigs of the engine, songs, and charts and play it offline.

However, there are some things, like GameFAQs pages, wiki pages, and various other applications, where I can see it benefiting me, especially in the places where my connection is flaky, so dropped internet connections wouldn't be a burden anymore.

---

It would be possible to create an AHK script to copy entire dynamic webpages and store them in a file. But how would the AHK script know what to look for? This is where ideas could come into play.
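
One option, swapping browser automation in for AHK purely as an illustration, is to drive the browser itself and save the rendered DOM after the page's scripts have run. A rough Python sketch, assuming the third-party Selenium package and a local geckodriver install; the URL and output filename are placeholders.

Code:
from pathlib import Path
from selenium import webdriver   # assumption: selenium is installed, plus geckodriver for Firefox

def save_rendered_page(url, out_file="snapshot.html"):
    # Drive a real browser so page scripts actually run, then store the rendered markup.
    driver = webdriver.Firefox()
    try:
        driver.get(url)
        Path(out_file).write_text(driver.page_source, encoding="utf-8")
    finally:
        driver.quit()

# save_rendered_page("https://example.com/some-dynamic-page")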

For example, I created an AutoHotkey script, along with a Thunderbird setup, so I can send a text message to a certain email address in a very specific format. The AutoHotkey script then runs some macros to enter the address into Google Maps and sends the directions back to my cell.

mysticwaterfall said:
EDIT: Reading your edit, there certainly are offline dumps of Wikipedia and the like, but you would still only be able to do it for specific websites at a time, and even then you could only spider the links there. Having it amass everything on something like "Microbiology" constantly in the background while you work would be insanely impractical. There's a reason Google has massive server farms that do nothing but spider the web all day.

Impractical today in some cases, sure, but it still has practical applications. Microbiology was mainly used as an example rather than as a specific requirement.

It would work like a giant RSS feed: you list which websites to feed off of, and it checks them and compares them against the stored copies.
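
The comparing step can be as simple as hashing each page and checking whether the hash changed since the last pass. A minimal sketch using only the Python standard library; the state-file name and example URLs are placeholders.

Code:
import hashlib, json, urllib.request
from pathlib import Path

STATE = Path("feed_state.json")   # remembers the last-seen hash per URL

def check_for_updates(urls):
    state = json.loads(STATE.read_text()) if STATE.exists() else {}
    changed = []
    for url in urls:
        try:
            body = urllib.request.urlopen(url, timeout=10).read()
        except OSError:
            continue                     # skip unreachable sites this round
        digest = hashlib.sha256(body).hexdigest()
        if state.get(url) != digest:     # new page or the content changed
            changed.append(url)
            state[url] = digest
    STATE.write_text(json.dumps(state))
    return changed

# print(check_for_updates(["https://example.com/", "https://example.org/news"]))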


Thanks, Rydian, for that link; that is pretty much what I was looking for. Something similar to that, but combined with a nice GUI, active transfer details, and an integration with a browser.
 

Rydian, Resident Furvert™
Member | Joined Feb 4, 2010 | Messages: 27,880 | Trophies: 0 | Age: 36 | Location: Cave Entrance, Watching Cyan Write Letters | Website: rydian.net | XP: 9,111 | United States
DownThemAll is an addon for Firefox that can be used to save linked/embedded content on a page and crap, but it's not nearly as good as wget.

There's no tool to do what you want from within a browser, because it doesn't make any sense and nobody's ever had to do it for a job or whatever.
:P


Check out wget's arguments and you might be surprised by what can be done with it. You can also write scripts that call it.
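
For instance, a small script could handle the "re-cache this subdirectory every x minutes" idea by calling wget on a schedule. A rough Python sketch, assuming wget is installed and on the PATH; the URL, depth, and interval are placeholder choices.

Code:
import subprocess, time

def mirror(url, dest="site_mirror", interval_minutes=60):
    # Re-run wget on a schedule; --timestamping skips files that haven't changed.
    cmd = [
        "wget",
        "--recursive", "--level=3",   # follow links up to 3 hops deep
        "--no-parent",                # stay inside the given subdirectory
        "--convert-links",            # rewrite links to point at the local copies
        "--page-requisites",          # also grab the images/CSS/JS each page needs
        "--adjust-extension",         # save pages with .html extensions
        "--timestamping",             # only re-download files that changed
        "--wait=1",                   # be polite: pause a second between requests
        "--directory-prefix", dest,
        url,
    ]
    while True:
        subprocess.run(cmd, check=False)
        time.sleep(interval_minutes * 60)

# mirror("https://example.com/docs/", interval_minutes=60)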
 

mameks, in memoriam of gravitas
Member | Joined Jun 18, 2009 | Messages: 2,300 | Trophies: 0 | Age: 28 | Location: Charlotte's maze | XP: 545 | United Kingdom
This hard drive would be f'king massive. As in huge building/small city massive... we had a presentation given to us at school about it.
 

playallday, Group: GBAtemp Ghost
Member | Joined May 23, 2008 | Messages: 3,767 | Trophies: 1 | Location: [@N@[)@ | XP: 494 | Canada
pikachu945 said:
the internet itself could be more than 700 yottabytes, I think it was 500 in 2009
Wiki: As of 2010, no storage system has achieved one zettabyte of information. The combined space of all computer hard drives in the world does not amount to even one yottabyte, but was estimated at approximately 160 exabytes in 2006. As of 2009, the entire Internet was estimated to contain close to 500 exabytes.
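
For a sense of scale against the "done in a day" premise, here is a quick back-of-envelope check in Python using that 500-exabyte figure and an assumed (purely illustrative) 1 Gbit/s connection.

Code:
# Rough time to pull 500 exabytes over an assumed 1 Gbit/s link.
internet_bytes = 500 * 10**18            # ~500 exabytes (the 2009 estimate quoted above)
link_bits_per_second = 1 * 10**9         # assumed: a 1 gigabit per second connection
seconds = internet_bytes * 8 / link_bits_per_second
years = seconds / (60 * 60 * 24 * 365)
print(f"about {years:,.0f} years")       # roughly 127,000 years

Even at that speed it works out to over a hundred thousand years of continuous downloading.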
 

pikachu945, Well-Known Member
Member | Joined Sep 13, 2009 | Messages: 691 | Trophies: 0 | XP: 427 | Canada
Arctic said:
pikachu945 said:
the internet itself could be more than 700 yottabytes, I think it was 500 in 2009
Wiki: As of 2010, no storage system has achieved one zettabyte of information. The combined space of all computer hard drives in the world does not amount to even one yottabyte, but was estimated at approximately 160 exabytes in 2006. As of 2009, the entire Internet was estimated to contain close to 500 exabytes.

lol it was a joke and I got told
:(
 
