It's 2:30 AM, so I may forget some details in these steps. I've spent all night researching and testing free tools that could accomplish this objective. Feel free to substitute whatever tools you prefer.
The instructions below are for Windows and Edge. You can use Chrome too.
These are the steps for downloading all the JFK documents. This is the easy part.
-
Create two Windows folders to store the PDFs. Name the first one "JFK Files". Within that folder, create another named "OCR".
-
Install the Chrono Download Manager extension. The Chrome Web Store listing below is for Chrome, and Edge can install extensions from the Chrome Web Store as well:
Chrome: https://chromewebstore.google.com/detail/chrono-download-manager/mciiogijehkdemklbdcbfkefimifhecn
-
After the extension is installed, change your browser's download location to the "JFK Files" folder you just created. Chrono Download Manager conveniently pops up a window with a link to change this along with another setting.
-
Open the browser and pin the Chrono Sniffer to the toolbar. To do this in Edge, first click the Extensions button to the right of the URL bar. You'll see the Chrono Download Manager extension with three little dots next to it. Click the dots, then click "Pin to toolbar." You'll then see the Chrono icon next to the URL bar. This is the Chrono Sniffer.
-
Go to the National Archives page for the JFK files. Where the dropdown says "Show 10 entries," change the dropdown to "Show All Entries." You want all 1,123 PDF links to be visible. Don't worry, the Archives website is FAST.
https://www.archives.gov/research/jfk/release-2025
-
Click the Chrono Sniffer icon. Click the Document tab. You will see a lot of links listed in the sniffer, so below that list, click the PDF filter. Only the PDF links will be highlighted and have a check mark, meaning only those files will be downloaded. Click the Download All button. This will start the download of all 1,123 files. It took me less than 10 minutes to complete all the downloads over a fiber connection.
You can view the download progress in the Chrono dashboard if you want, but you can also view it in your JFK Files folder.
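If you'd rather script the download than click through a browser extension, here is a minimal Python sketch of the same idea: collect every link ending in .pdf from the page's HTML, then save each file into the "JFK Files" folder. It assumes the PDF links are actually present in the HTML the server returns, which I have not verified. The names `extract_pdf_links` and `download_all` are made up for this example and are not part of Chrono or any other tool mentioned here.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen


class _LinkCollector(HTMLParser):
    """Collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_pdf_links(html, base_url):
    """Return absolute URLs of all links ending in .pdf, without duplicates."""
    collector = _LinkCollector()
    collector.feed(html)
    seen, links = set(), []
    for href in collector.hrefs:
        url = urljoin(base_url, href)
        if urlparse(url).path.lower().endswith(".pdf") and url not in seen:
            seen.add(url)
            links.append(url)
    return links


def download_all(page_url, dest_dir):
    """Download every PDF linked from page_url into dest_dir (needs network)."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Assumption: a browser-like User-Agent avoids being rejected by the server.
    headers = {"User-Agent": "Mozilla/5.0"}
    with urlopen(Request(page_url, headers=headers)) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    for url in extract_pdf_links(html, page_url):
        target = dest / Path(urlparse(url).path).name
        if target.exists():
            continue  # already downloaded, so reruns only fetch what is missing
        with urlopen(Request(url, headers=headers)) as resp:
            target.write_bytes(resp.read())
```

Under those assumptions, `download_all("https://www.archives.gov/research/jfk/release-2025", "JFK Files")` would fetch the whole set, and running it again would skip anything already saved.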
After all the PDFs are downloaded, you'll want to enhance the files with OCR (Optical Character Recognition) so their text is searchable. If you don't have Adobe Acrobat, there's a free tool that can do this in bulk.
-
Go to the PDF24 Creator site and download the latest Windows version (11.23.0 as I write this). Note that this free application offers capabilities that other developers charge a lot of money for, like batch processing of PDFs.
https://tools.pdf24.org/en/creator
-
When you first launch the tool, you will need to create an account with your email and a password (Booooo!), then finish registering with the code sent to your email.
-
After registration, when you see the large menu of PDF options, click PDF OCR.
-
You'll see a page with many lines like a ledger. On the right, change the Output directory to the "OCR" folder you created within the "JFK Files" folder.
-
On the left, you can now click the Add Files button. Add as many PDFs as you want. For now, I'm doing 10 at a time, and tomorrow I'll test 50 at a time. I'm not sure if the software has a limit. Converting all 1,123 files should take a few hours.
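When you're feeding files in 10 or 50 at a time, it's easy to lose track of which ones are already done. Here is a small Python sketch that lists the PDFs still waiting for OCR, grouped into batches. It assumes the OCR tool writes each output file under the same file name as its input, which I have not confirmed for PDF24. The function name `pending_batches` is made up for this example.

```python
from pathlib import Path


def pending_batches(source_dir, ocr_dir, batch_size=10):
    """Yield lists of PDFs in source_dir that have no same-named file in ocr_dir."""
    # Assumption: the OCR output keeps the input's file name.
    done = {p.name.lower() for p in Path(ocr_dir).glob("*.pdf")}
    todo = sorted(
        p for p in Path(source_dir).glob("*.pdf") if p.name.lower() not in done
    )
    # Slice the remaining files into fixed-size groups (the last may be smaller).
    for start in range(0, len(todo), batch_size):
        yield todo[start:start + batch_size]
```

For example, `next(pending_batches("JFK Files", "JFK Files/OCR", 50))` would give the next 50 files to add in the Add Files dialog.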
Here's the beautiful part: Once you've enhanced all the PDFs with OCR, you can search all of them through Windows Explorer. To do this, click into the OCR folder and enter your search term in the Windows Explorer search field. For example, if I enter "Oswald" in the search field, Windows will list every PDF that contains that word along with some preview text. So your OCR folder is now a database of declassified JFK files.
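If you want the same kind of search outside Windows Explorer, here is a rough Python sketch that reports which pages of each OCR'd PDF contain a term. It relies on the third-party pypdf package (`pip install pypdf`), which is my choice and not something used elsewhere in this post. The function names `find_matches` and `search_folder` are made up for this example, and the match is a plain case-insensitive substring test.

```python
from pathlib import Path


def find_matches(page_texts, term):
    """Return 1-based page numbers whose text contains term (case-insensitive)."""
    needle = term.lower()
    return [i for i, text in enumerate(page_texts, start=1) if needle in text.lower()]


def search_folder(ocr_dir, term):
    """Map each PDF name in ocr_dir to the pages where term appears."""
    from pypdf import PdfReader  # third-party: pip install pypdf

    results = {}
    for pdf in sorted(Path(ocr_dir).glob("*.pdf")):
        # extract_text() can come back empty for pages without a text layer.
        texts = [page.extract_text() or "" for page in PdfReader(pdf).pages]
        pages = find_matches(texts, term)
        if pages:
            results[pdf.name] = pages
    return results
```

Something like `search_folder("JFK Files/OCR", "Oswald")` would then return a dictionary of file names and page numbers. Results depend entirely on how well the OCR step recognized the text.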
Alternative for Mac and Linux users:
Here is a script to download all the files using curl (it should work on Linux and Mac as long as curl is installed):
https://pastebin.com/raw/jtNkkNWz
Bless you for this script.
TBH, its creation must have been a lot of tedium...
Not really. I used a macro in Emacs. I have a motto for coding: "If it can't be done with Emacs macros, you better be getting paid for doing it!"
As long as you're parsing it through the HTML, I agree that it wouldn't be too bad. It's just the act of stripping away all the crud to get only the file names, then "for each line in the file, add the curl phrase." Just tedium...
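For anyone who doesn't live in Emacs, the "for each line, add the curl phrase" step could look roughly like this in Python. It's only a sketch: the function name `build_curl_commands` is made up, and the base URL is a parameter because I'm not going to guess the exact path the Archives uses for the PDFs. The curl options used are `-L` (follow redirects), `--fail` (don't save HTTP error pages), and `-O` (keep the remote file name).

```python
import shlex


def build_curl_commands(names, base_url):
    """Turn a list of file names into one curl command per file."""
    base = base_url.rstrip("/") + "/"
    commands = []
    for raw in names:
        name = raw.strip()
        if not name:
            continue  # ignore blank lines
        # shlex.quote keeps the URL safe if it contains shell metacharacters.
        commands.append("curl -L --fail -O " + shlex.quote(base + name))
    return commands
```

Feed it the lines of a text file of names, for example `build_curl_commands(open("names.txt"), base_url)`, and write the result out as a shell script.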