I posted this comment in another important thread (https://greatawakening.win/p/17tKx1ETrh/urgent-please-act-quickly-on-thi/c/) but felt compelled to call this out in an effort to educate people on the broader battle here.
The linked post is about the new iPhone / OpenAI integration, but a similar issue is going on with Adobe. And you're going to see this battle happening with nearly every piece of software you own & use. Here's my comment from that thread:
======================================
This is a very important issue, but I would encourage everyone to be on guard for this situation far more broadly.
The root of issues like the one above is that the #1 area of technology development right now for every major software corp is, of course, AI.
AI NEEDS TO BE TRAINED and for that, you need data. Real data. And that means as much of YOUR data as possible.
Now consider that these companies are in a race against one another to build the best AI for their particular niche in the marketplace. That competitive pressure leads to a lot of shortcuts and "looking the other way" when it comes to privacy and consumer rights.
This same battle is going on right now with Adobe, whose new terms & conditions allow them access to all of your art (and more).
This is one of the new battlefields and I would encourage people in the strongest possible manner to get up to speed on this issue and how to fight it. It doesn't matter if you're non-technical; we need your voice.
We've been warned about AI for years, but now the rubber is hitting the road. We're hitting one of the first battlefields with this privacy fight.
You're missing the forest for the trees. Of course there's a minority of circumstances where synthetic data can be, or is, better, especially for very specific training goals or where real-world data is absent.
But this isn't what major consumer and business software companies are doing with these platforms, and that's where the privacy issues come into play.
I work with many Fortune 100 companies and the US government. Many companies need to do object recognition on their conveyor lines, and it needs to generalize to many factories, not just one. Even when there is real-world data, the models oftentimes need to be generalized, and that's very, very hard to do with real-world data. I'm talking about domain randomization, because you don't want to overfit a model. Oftentimes the client can provide some small amount of real-world data. I then create a digital twin of what they need to recognize, and because it's virtual we can produce and annotate 100,000 images with perfect segmentation masks and fully randomized domains (lighting, orientation, noise, etc.), so the training data is extremely high quality and the chance of overfitting is minimal. The same goes for border security and tracking migrants from very far away: a simulated environment was created that looked similar to the land they were targeting, and because it was virtual we could randomize weather, lighting, trees, landscape, water, etc., allowing for a much better and more flexible model.
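For anyone who hasn't run into domain randomization before, here's a minimal sketch of the idea in Python. Everything in it is an assumption made up for illustration: the parameter names, the ranges, and the weather list are hypothetical, not what any real pipeline uses. In an actual setup, each sampled set of parameters would be handed to a renderer or simulator that produces the image and its segmentation mask.

```python
import random
from dataclasses import dataclass
from typing import List

# Hypothetical scene parameters; a real digital twin would expose many more.
@dataclass
class SceneParams:
    light_intensity: float   # relative brightness, 1.0 = nominal
    light_angle_deg: float   # direction of the main light source
    object_yaw_deg: float    # orientation of the object to recognize
    noise_sigma: float       # strength of simulated sensor noise
    weather: str             # only meaningful for outdoor scenes

# Illustrative categories, not taken from any specific simulator.
WEATHER = ["clear", "overcast", "rain", "fog"]

def sample_scene(rng: random.Random) -> SceneParams:
    """Draw one randomized scene configuration."""
    return SceneParams(
        light_intensity=rng.uniform(0.3, 1.7),
        light_angle_deg=rng.uniform(0.0, 360.0),
        object_yaw_deg=rng.uniform(0.0, 360.0),
        noise_sigma=rng.uniform(0.0, 0.05),
        weather=rng.choice(WEATHER),
    )

def sample_dataset(n: int, seed: int = 0) -> List[SceneParams]:
    """Draw n independent scene configurations, reproducible via the seed."""
    rng = random.Random(seed)
    return [sample_scene(rng) for _ in range(n)]

if __name__ == "__main__":
    # Print a few sampled configurations to show the spread.
    for params in sample_dataset(3):
        print(params)
```

The point is that no two synthetic images share the same lighting, orientation, or noise, so the model can't latch onto any one of those conditions and has to learn the object itself.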