Did he just say under his breath that Fox News couldn't be a weather channel because they can't report a storm coming? Or was that just me?
Yeah I'll definitely take a look then. Unfortunately my program won't let me open the file and check until the recording is done as the file is locked, so I'll have to check later today once everything's over.
I try to archive as much as I can as it feels like everything disappears rapidly nowadays. Don't know how tech savvy most people are here, but I use the following tools if anyone's interested.
JDownloader - https://jdownloader.org/ (A neat program for bulk-downloading files. You can paste in a URL and do a deep scan, and it'll pick up every download on that page. It can also grab YouTube videos, and videos from quite a few other sites.)
ArchiveBox - https://archivebox.io/ (A little more involved to set up. I have this running on a virtual machine. It gives you a nice local web interface where you can paste in webpage URLs, and it'll save them in multiple formats including .html, .pdf, a screenshot image, etc. It can also do deep scanning, i.e. downloading multiple URLs at once. It has a built-in means of viewing saved files, but its search can be slow if you have thousands of websites saved. It can also be automated/scripted if you're more savvy, e.g. automatically downloading a specific URL every day.)
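A rough sketch of what that automation can look like, assuming ArchiveBox's `init`/`add` subcommands and a cron entry; the paths and URL here are placeholders, and exact flags may vary by ArchiveBox version:

```shell
# One-time setup: create an ArchiveBox data directory (path is a placeholder)
mkdir -p ~/archivebox-data && cd ~/archivebox-data
archivebox init

# Save a single page in all configured formats (html, pdf, screenshot, ...)
archivebox add 'https://example.com/some-page'

# Crontab entry (crontab -e) to re-snapshot the same URL every day at 3am
0 3 * * * cd ~/archivebox-data && archivebox add 'https://example.com/some-page'
```

The same pattern works for any URL you want snapshotted on a schedule; swap the cron timing and URL to taste.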
sist2 - https://github.com/simon987/sist2 (Not an archiving tool per se, but a good search index for local files. It pairs really well with ArchiveBox above. I have thousands of URLs saved and can do full-text searching through them all in a matter of seconds. You can set up multiple indexes, e.g. web archives, media, books/PDFs, etc., and choose which ones to search for your term. It takes my system about 15 seconds to scan through 15+ TB of documents, full text, so it'll catch a sentence buried in a book PDF, for example.)
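For the curious, the basic sist2 workflow is roughly scan → index → web UI. This is a sketch assuming sist2's `scan`, `index`, and `web` subcommands; the directory names are placeholders and exact flags differ between sist2 versions, so check the project README:

```shell
# Walk a directory tree and write a search index for it
# (~/web-archives is a placeholder for wherever your archives live)
sist2 scan ~/web-archives -o ./web-archives-idx

# Push the index into the search backend so it's queryable
sist2 index ./web-archives-idx

# Serve the web UI for full-text search across one or more indexes
sist2 web ./web-archives-idx
```

Running `scan` separately per collection (web archives, media, books) is what gives you the pick-which-indexes-to-search behavior described above.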
youtube-dl - https://github.com/ytdl-org/youtube-dl (Fairly well known and not too difficult to set up. It will download YouTube videos, channels, playlists, etc., along with videos from some, but not all, other sites. Very efficient, and it can be configured to grab extra data such as subtitles, download audio only, and much more. Also very scriptable if you're familiar with the command line/basic scripting; it takes maybe 20 minutes to set up a daily backup of a YouTube channel you care about, even with minimal scripting knowledge.)
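As an example of that daily channel backup, here's a sketch using a cron entry plus youtube-dl's `--download-archive` flag, which records already-fetched video IDs so reruns only grab new uploads. The channel URL and output paths are placeholders:

```shell
# Crontab entry (crontab -e): back up a channel every day at 4am.
# --download-archive skips anything already listed in archive.txt,
# -o sets a per-uploader/per-date filename template.
0 4 * * * youtube-dl \
    --download-archive ~/yt/archive.txt \
    -o '~/yt/%(uploader)s/%(upload_date)s - %(title)s.%(ext)s' \
    --write-sub --write-thumbnail \
    'https://www.youtube.com/channel/CHANNEL_ID'
```

Swap in `--extract-audio` if you only want audio, or `--write-auto-sub` to also save auto-generated subtitles.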
Thought I'd also drop this website here: the-eye.eu. The site has all sorts of free data (they do take donations). Some of it is hard to find elsewhere, and you can either download things bit by bit or all at once using the "wget" commands built in at the bottom of each page. They don't throttle (other than only letting you download about three files at a time), and they have a 10 Gbit pipe, so you can consistently pull hundreds of gigabytes per day. They host books, obscure media, website archives, and more.
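The site's own per-page wget commands may differ, but a typical recursive grab of a directory listing looks something like this (the URL path is a placeholder):

```shell
# -r  recurse into subdirectories
# -np don't ascend to parent directories
# -nc skip files you've already downloaded
# -c  resume partially downloaded files after an interruption
wget -r -np -nc -c 'https://the-eye.eu/public/some-collection/'
```

With `-nc` and `-c` you can kill and restart the download whenever you like and it'll pick up where it left off, which matters when you're pulling hundreds of gigabytes.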