The Privacy / AI Training battle has begun

posted 234 days ago by JackieDaytona74 +28 / -0

I posted this comment in another important thread (https://greatawakening.win/p/17tKx1ETrh/urgent-please-act-quickly-on-thi/c/) but felt compelled to call this out in an effort to educate people on the broader battle here.

The linked post concerns the new iPhone / OpenAI integration, but a similar issue is going on with Adobe. And you're going to see this battle happening with nearly every piece of software you own & use. Here's my comment from that thread:

======================================

This is a very important issue, but I would encourage everyone to be on guard for this situation in a far broader fashion.

The root of issues like the one above is that the #1 area of technology development right now for every major software corp is, of course, AI.

AI NEEDS TO BE TRAINED and for that, you need data. Real data. And that means as much of YOUR data as possible.

Now consider that these companies are in a race against one another to build the best AI for their particular niche in the marketplace. That competitive pressure leads to a lot of shortcuts and "looking the other way" when it comes to privacy and consumer rights.

This same battle is going on right now with Adobe, whose new terms & conditions allow them access to all of your art (and more).

This is one of the new battlefields and I would encourage people in the strongest possible manner to get up to speed on this issue and how to fight it. Doesn't matter if you're non-technical, we need your voice.

We've been warned about AI for years, but now the rubber is hitting the road. We're hitting one of the first battlefields with this privacy fight.

Comments (11)

– MagnaCartaAgain 3 points 234 days ago +3 / -0

I never liked that Adobe went subscription based, and stopped you buying physical copies.

– JackieDaytona74 [S] 1 point 234 days ago +1 / -0

Indeed. All the major software companies want customers (businesses or individuals) to adopt a subscription / OpEx model. Unfortunately, many businesses in their foolish rush to The Cloud have adopted an OpEx model for their accounting methods.

As a consequence, nobody owns anything any longer.

But the AI training has opened a far more disturbing element to this subscription software model that everyone seems to be willing to embrace.

– Mountaingale 2 points 234 days ago +2 / -0

I hope the new Trump administration addresses this.

– NOT_ADMIN 0 points 234 days ago +1 / -1

You don't need real data. Synthetic data works, and many times it's better.

– JackieDaytona74 [S] 0 points 234 days ago +1 / -1

Completely incorrect.

– NOT_ADMIN 0 points 234 days ago +1 / -1

I am a machine learning data generation developer…

– JackieDaytona74 [S] 0 points 234 days ago +1 / -1

And I work for a multi-billion dollar software company as a technical director.

There's no replacement for real data to train AI. For ML, that's a different story. But there's a reason that all of these companies are willing to take privacy risks to train their AI on real-world data.

I don't know who in the hell you're working for, but they're apparently imbeciles. You can't generate data that can train AI as effectively as real data created by human beings.

– NOT_ADMIN 1 point 234 days ago +1 / -0

Yes, there is. In fact, on many occasions, synthetic data is better. I work for an agency which creates AI for Fortune 100 companies and the US government. Maybe in 2016 synthetic data wasn't used...

– JackieDaytona74 [S] 1 point 233 days ago +1 / -0

You're missing the forest for the trees. Of course there's a minority of circumstances where synthetic data can be, or is, better, especially for very specific training goals or where real-world data is absent.

But this isn't what major consumer and business software companies are doing with these platforms and that's where the privacy issues come into play.

– NOT_ADMIN 1 point 233 days ago +1 / -0

I work with many Fortune 100 companies and the US government. Many companies need to do object recognition on their conveyor lines, and the model needs to generalize to many factories, not just one. Even when there is real-world data, the models often need to generalize, and that's very, very hard to do with real-world data. I'm talking about domain randomization, because you don't want to overfit a model.

Often the client can provide a small amount of real-world data. I then create a digital twin of what they need to recognize, and because it's virtual we can produce and annotate 100,000 images with perfect segmentation masks and fully randomized domains (lighting, orientation, noise, etc.), so the training data is extremely high quality and the chance of overfitting is minimal.

Even for border security, where the goal is tracking migrants from very far away, a simulated environment was created that looked similar to the land they were targeting. Because it was virtual, we could randomize weather, lighting, trees, landscape, water, etc., allowing for a much better and more flexible model.
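
For readers unfamiliar with the term, here is a minimal Python sketch of the domain randomization idea described above: each synthetic image gets its own randomly drawn scene settings, so a model trained on the renders never sees one fixed lighting or orientation. The names (`SceneParams`, `sample_scene`, `generate_dataset`) and the numeric ranges are hypothetical, chosen only for illustration; the actual renderer and annotation step are not shown, and nothing here is the commenter's real pipeline.

```python
import random
from dataclasses import dataclass


@dataclass
class SceneParams:
    """Hypothetical per-image settings for a synthetic render."""
    light_intensity: float  # relative brightness, 1.0 = nominal
    rotation_deg: float     # object orientation about the vertical axis
    noise_sigma: float      # std-dev of sensor noise added to the render


def sample_scene(rng: random.Random) -> SceneParams:
    # Draw every factor independently from a wide range so that no single
    # lighting/orientation/noise setting dominates the training set.
    # The ranges below are illustrative assumptions, not tuned values.
    return SceneParams(
        light_intensity=rng.uniform(0.3, 1.7),
        rotation_deg=rng.uniform(0.0, 360.0),
        noise_sigma=rng.uniform(0.0, 0.05),
    )


def generate_dataset(n: int, seed: int = 0) -> list[SceneParams]:
    # A seeded generator makes the randomized dataset reproducible.
    rng = random.Random(seed)
    return [sample_scene(rng) for _ in range(n)]


# One parameter set per synthetic image; a renderer (not shown) would
# turn each into an image plus its segmentation mask.
params = generate_dataset(1000)
```

In a real setup each `SceneParams` would be handed to a rendering engine, and because the scene is virtual the labels come for free from the scene description rather than from manual annotation.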