For those frogs who just couldn't stay on top of it (or who have a life outside the pond)!
Sorry about the re-post. I jumped the gun with my earlier post, but I've now finished scraping every single sticky from 2020.08 (GA.W start month) to 2023.03!
It might not be perfect, but I can always fix it.
Here is a question: did you actually give GPT the entire repository for GAW, or did it link sniff like the old porn batch downloaders used to in the mid 2000s?
I am worried that GPT may have been able to mine data from every post. I know you said earlier you did it in Python, but were you able to scrape links only, without it actually scraping data?
Copy and paste. The only real code pasted into ChatGPT was mostly its own code, with some personal weaving. It just makes the code, and I run and debug it in python3. I run the scraper with that code.
Ok, thank God. They have no data. I was afraid you did an old-style scrape via GPT and it had our data.
Good work on your code. This resource will be invaluable for new anons, and I am sure we will have thousands of them in the coming days.
Don't worry, it was just my computer going to all the posts, scraping every user/time/title, and dropping them into a CSV.
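For anyone curious what "scrape user/time/title into a csv" can look like, here is a rough sketch using only the Python standard library. It is not the actual scraper from this thread. The markup it expects (a `div` with class `post` carrying `data-author` and `data-time` attributes, and an `a` with class `title`) is made up for illustration, and the names `PostParser`, `extract_posts`, and `rows_to_csv_text` are hypothetical. It works on HTML text you already have locally, so nothing gets sent anywhere.

```python
import csv
import io
from html.parser import HTMLParser

FIELDS = ["user", "time", "title"]


class PostParser(HTMLParser):
    """Collects user/time/title rows from an ASSUMED markup, e.g.
    <div class="post" data-author="..." data-time="..."><a class="title">...</a></div>
    The real site's HTML will differ; adjust the tag and attribute checks."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._current = None    # row being built for the post we are inside
        self._in_title = False  # True while inside the title link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div" and "post" in classes:
            self._current = {
                "user": attrs.get("data-author", ""),
                "time": attrs.get("data-time", ""),
                "title": "",
            }
        elif tag == "a" and "title" in classes and self._current is not None:
            self._in_title = True

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign.
        if self._in_title:
            self._current["title"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._in_title:
            self._in_title = False
            self._current["title"] = self._current["title"].strip()
            self.rows.append(self._current)
            self._current = None


def extract_posts(html):
    """Return a list of {"user", "time", "title"} dicts found in the HTML text."""
    parser = PostParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv_text(rows):
    """Serialize the rows as CSV text with a header line."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

To save to disk, write the returned text to a file opened with `newline=""`. Only the three fields per post are kept; post bodies are never read.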