Okay, so for the record: I hope this gives us joy, but I want to make sure we don't get caught up in OCR buzz-acronyms.
OCR is just scanning a document and running an algorithm that finds the regions of text on the page and tags them, so the text is re-encoded and searchable. It's available on almost any scanner or printer that anyone can buy. My only point is that OCR isn't some super secret tech term. It just means that all ballots can be searched by keyword, which would come in handy for the auditors.
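To make the "searchable by keyword" part concrete, here is a minimal sketch of what happens once the text has been pulled out of the scans. The scan IDs and the page text are made up, and `build_index` is just a name I picked; it is not what any particular scanner or audit tool does.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of page IDs whose text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(page_id)
    return index


# Made-up OCR output keyed by hypothetical scan IDs.
pages = {
    "scan-001": "Official Ballot Alpha County",
    "scan-002": "Official Ballot Beta County",
}
index = build_index(pages)
print(sorted(index["alpha"]))   # pages that mention "alpha"
print(sorted(index["county"]))  # pages that mention "county"
```

Nothing exotic: once the pixels are text, searching is an ordinary lookup.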
No doubt, for sure! It's still great news to document. I just wanted to be explicit for those who may see this and think it's some James Bond technique. It's important to state that this is good news, but also to say it for what it is. It's a standard technique that is very useful for mass data gathering.
I don't think so. OCR, optical character recognition, is about turning a pixel region into text. My guess is that the ballot reader was reading the filled-in bubbles, which I'm pretty sure would not use OCR. If the ballot is misaligned, then the spots where the scanner looks for bubbles will most likely be off, and so the bubbles won't be read properly. Really nothing to do with OCR.
I would say yes, your understanding is wrong. My guess is that they are not using OCR for finding the alignment marks or the bubbles. The alignment marks, I assume, are used to figure out the correct region and location for the bubbles. OCR, on the other hand, is used to turn pixel images that contain text into text. For instance, a scanned text-based document.
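Here is a rough sketch of what I mean by reading bubbles at expected locations. This is my guess at the general idea, not how any real ballot reader works: the page is a tiny grid of 0/1 pixels, and the function names, the bubble size and the 0.5 threshold are all made up for illustration.

```python
# Hypothetical sketch: the page is a grid of pixels (1 = dark, 0 = light).
def fill_ratio(image, top, left, size):
    """Fraction of dark pixels in a size x size square at (top, left)."""
    dark = 0
    for r in range(top, top + size):
        for c in range(left, left + size):
            # Pixels outside the page count as light.
            if 0 <= r < len(image) and 0 <= c < len(image[0]):
                dark += image[r][c]
    return dark / (size * size)


def read_bubbles(image, expected_positions, offset=(0, 0), size=3, threshold=0.5):
    """Return True/False per expected bubble position.

    offset is the (row, col) shift of the scan relative to the template,
    which is what alignment marks would let a reader measure.
    """
    dr, dc = offset
    return [
        fill_ratio(image, top + dr, left + dc, size) >= threshold
        for top, left in expected_positions
    ]


def make_page(filled_at, height=12, width=12, size=3):
    """Build a blank page and darken a square bubble at each (top, left)."""
    page = [[0] * width for _ in range(height)]
    for top, left in filled_at:
        for r in range(top, top + size):
            for c in range(left, left + size):
                page[r][c] = 1
    return page


# Template says bubbles live at these positions; the voter filled the first one.
template = [(1, 1), (1, 7)]
# The scan is shifted 3 rows down and 2 columns right relative to the template.
shifted_scan = make_page([(4, 3)])

print(read_bubbles(shifted_scan, template))                 # shift ignored
print(read_bubbles(shifted_scan, template, offset=(3, 2)))  # shift corrected
```

With the shift ignored, the filled bubble is missed entirely; with the offset applied, it is found. No character recognition happens anywhere in this, it is only counting dark pixels in known spots.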
So if you scanned a ballot and sent the scanned image to an OCR engine, it would return the text that it found on the ballot image. All non-text would just be skipped over. Even some of the text might not be recognized, depending on the quality of the image (for instance, if it has been faxed multiple times) and how much it is skewed.
You'll notice that text-based CAPTCHAs degrade the characters in the hope that a human can still figure them out but an OCR engine cannot.
My guess is that the text on the ballot is just there to aid the person filling out the ballot. The ballot reader will most likely just be trying to find the two columns of bubbles and determine which ones are filled in. Let's say the left column is Democrat and the right column is Republican. The first row is the presidential race, the second row is the first Senate race, etc.
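Sticking with that made-up two-column layout, turning the filled bubbles into votes is just a table lookup. The contest list, the party order and the `tally_ballot` name are assumptions for illustration; real ballots are laid out differently.

```python
# Hypothetical layout from the example above: rows are contests,
# columns are parties. Real ballots are laid out differently.
CONTESTS = ["President", "Senate"]
PARTIES = ["Democrat", "Republican"]


def tally_ballot(filled):
    """filled[row][col] is True when that bubble was read as filled.

    Returns {contest: party}, or None for a contest with no mark or
    more than one mark (undervote / overvote).
    """
    result = {}
    for contest, row in zip(CONTESTS, filled):
        marked = [party for party, is_filled in zip(PARTIES, row) if is_filled]
        result[contest] = marked[0] if len(marked) == 1 else None
    return result


ballot = [
    [True, False],  # President: left bubble filled
    [True, True],   # Senate: both filled, so it is an overvote
]
print(tally_ballot(ballot))
```

The reader never needs to know what the printed names say, only which row and column each bubble sits in.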
I think that the person who wrote this OP text, whoever they may be, is talking about the work that Jovan Pulitzer is doing at the AZ audit. He has the tech to forensically examine these hard copy ballots.
Now that I've read this over again and watched the video, I'm thinking that maybe the OCR is being used to determine which county/town the ballot is for. They could of course do this by hand, but why not try to automate it? Plus, they've already scanned all the ballots, so they have the images and there's no need to go through all the ballots by hand again.
The guy in the video talked about there being 500+ different ballots for a state in some cases, since each county or town needs its own ballot for its own local races. So I guess the thinking goes: if someone took a single ballot, duplicated it, say, 10,000 times and submitted the copies, it would be more easily detectable, because that town/county's turnout among eligible voters could end up much higher than 100%.
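The turnout check itself is simple arithmetic once each scanned ballot is tagged with its town/county. A minimal sketch, with made-up town names and numbers and a function name I invented:

```python
from collections import Counter


def suspicious_jurisdictions(ballot_jurisdictions, eligible_voters):
    """Return {jurisdiction: turnout} where turnout exceeds 100%.

    ballot_jurisdictions: one jurisdiction name per scanned ballot.
    eligible_voters: {jurisdiction: number of eligible voters}.
    """
    counts = Counter(ballot_jurisdictions)
    flagged = {}
    for name, cast in counts.items():
        eligible = eligible_voters.get(name, 0)
        # A jurisdiction missing from the rolls is flagged as infinite turnout.
        turnout = cast / eligible if eligible else float("inf")
        if turnout > 1.0:
            flagged[name] = turnout
    return flagged


# Made-up numbers for illustration only.
ballots = ["Town A"] * 800 + ["Town B"] * 10_600
eligible = {"Town A": 1_000, "Town B": 5_000}
print(suspicious_jurisdictions(ballots, eligible))
```

Here Town B has 10,600 ballots against 5,000 eligible voters, so it is the only one flagged; Town A at 80% is not.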
In this case you could use OCR to read either the names of the people in the races, if that's how they determine which of the 500+ ballots it is, or I'm guessing each of those 500+ ballots has a unique number which the OCR should be able to read.
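If there is a printed number, pulling it out of the OCR output would look something like the sketch below. The "Ballot Style" label and the sample text are my assumptions; I don't know what is actually printed on these ballots.

```python
import re

# Assumed label format, e.g. "Ballot Style: 0417"; a real ballot may differ.
STYLE_PATTERN = re.compile(r"ballot\s+style\s*[:#]?\s*(\d+)", re.IGNORECASE)


def ballot_style_from_text(ocr_text):
    """Return the ballot style number found in OCR output, or None."""
    match = STYLE_PATTERN.search(ocr_text)
    return match.group(1) if match else None


# Made-up OCR output for one scanned ballot.
sample = "OFFICIAL BALLOT\nGeneral Election\nBallot Style: 0417\nPrecinct 12"
print(ballot_style_from_text(sample))
print(ballot_style_from_text("no label here"))
```

The number is kept as a string so leading zeros survive. If the OCR engine misreads a digit, this would silently return the wrong style, so in practice you'd want to check the result against the list of known styles.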
In reality, OCR is not about scanning a PDF. I mean, you could scan a PDF, but OCR is typically about feeding an OCR engine an image (a pixel image) and having it return the text that it was able to decipher from the image.
The examination of the scans could be good, but I'm waiting for the canvass.