I appreciate your effort to work your way through the unbelievably obtuse and lengthy presentation Solomon provides in an attempt to distill it down to a salient understanding. Please be aware I have done the same and largely agree with your description above regarding what he did. I have been analyzing the original data in depth since the election and, again, I was the one who actually provided the Philly data to Solomon.
With that said, you have not yet addressed my main criticism involving the fundamental tacit assumption that underlies Solomon's primary claim of proving fraud. We may now finally be at the point where you can appreciate the full force of this, so let me give it one more try.
My references to a uniform probability distribution are to what you call a natural dataset. Solomon makes a uniform probabilistic assumption on this set when he refers to hitting thin lines on a dartboard, similar to the wheel. This assumption is flawed in the sense that it does not adequately reflect how precinct-level ballot ratios are stochastically generated in time, even in non-fraudulent cases.
Here's an example: Consider bus riders on a weekday in a busy city in pre-covid days, and suppose we have complete rider and bus records for a full day. To relate this to Solomon's setup, riders are precincts and bus seats are specific ballot ratios. We can ask: what are the chances that rider A gets off at a specific stop at a certain time and rider B gets on and takes the exact same seat rider A was just in? This represents, by analogy, a ratio transfer. Under Solomon's assumption, all such probabilities would be extremely small given the number of riders and available bus seats for that day. But hold on: many riders have regular work schedules and are also creatures of habit, so in many cases the transfer probabilities end up being much larger. The observed distribution of seat occupation is surely far from equal across the seats. To accurately compute ratio transfer probabilities, one would need to thoroughly assess the riding patterns over numerous days to determine a reasonable reference distribution for what is natural in that city. In this case, as in the voting scenario, that distribution would differ substantially from a uniform one.
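To make the gap concrete, here is a minimal simulation sketch under purely hypothetical assumptions (made-up seat counts and a made-up habit pattern, drawn from neither Solomon's data nor any real transit data). It only illustrates the general point: the chance of a "transfer" depends heavily on whether the reference distribution is uniform or concentrated.

```python
# Illustrative sketch only -- hypothetical numbers, not Solomon's method and not real
# transit or election data. It compares the chance that consecutive riders land on the
# same seat when seats are chosen uniformly versus when choices cluster on habitual seats.
import numpy as np

rng = np.random.default_rng(0)
n_seats = 1000    # stands in for the number of distinct ballot ratios
n_riders = 5000   # stands in for the number of precinct updates

# Uniform assumption: every seat equally likely for every rider.
uniform_p = np.full(n_seats, 1.0 / n_seats)

# Habit-driven assumption: 90% of choices concentrate on 50 "favorite" seats.
habit_p = np.full(n_seats, 0.1 / (n_seats - 50))
habit_p[:50] = 0.9 / 50

def transfer_rate(p):
    """Fraction of consecutive riders who take the seat the previous rider vacated."""
    seats = rng.choice(n_seats, size=n_riders, p=p)
    return float(np.mean(seats[1:] == seats[:-1]))

print("uniform transfer rate:     ", transfer_rate(uniform_p))  # about 1/1000 = 0.001
print("habit-driven transfer rate:", transfer_rate(habit_p))    # about 0.9**2/50 = 0.016
```

Under these made-up numbers the "transfer" happens roughly sixteen times more often than the uniform assumption would predict, which is exactly the kind of distortion that matters when small probabilities get multiplied together.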
Carrying this example further, suppose a group of thugs boards a few of the buses throughout the day and forces riders to sit in certain seats, literally seizing and releasing them as Solomon describes. From the data alone, how would we determine which buses the thugs boarded and which people they accosted? Again, we would first need to know what the natural distribution of seat occupation looks like for a typical weekday in that city and then look for anomalies. This would be very difficult to determine precisely from a single day's data due to confounding of various factors.
As mentioned previously, comparing to Iowa 2016 is weak at best. We would ideally need precinct-level time-series data for counties similar to Fulton and Philadelphia in 2020. For example, comparing Fulton to Cobb, DeKalb, and Gwinnett in Georgia is reasonable, but that is still only three counties, and they were not counted collectively in State Farm Arena as Fulton was, so the dynamics are different. It is much more straightforward and convincing to do graphical and statistical comparisons like those in Chapters 2 and 6 of https://www.scribd.com/document/495988109/MITRE-Election-Report-Critique .
To summarize, Solomon's primary conclusions ride on a highly obfuscated probabilistic assumption that appears to be a poor approximation to the realities of the 2020 election. I believe his techniques are capable of finding and suggesting some potentially fraudulent instances, but this is a very long way from 100% mathematical proof. I'd encourage you to think more deeply about this and carefully reread my replies in this thread. Researching more general topics like https://en.wikipedia.org/wiki/Statistical_proof and https://en.wikipedia.org/wiki/Probabilistic_logic may also help.
One final technical correction: The 3-decimal rounding errors are much larger than 1 or 2 votes. The Edison/NYT json files report a cumulative grand total of votes and then 3-decimal fractions for each candidate. Once the grand totals exceed 1000, rounding errors appear and grow increasingly worse as precinct sizes increase and as time goes on. Such errors are substantial in the larger precincts in both Fulton and Philadelphia counties and also involve third-party candidates. The deltas that Solomon analyzes are far from exact, adding another questionable aspect to his entire analysis.
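For a rough sense of scale, here is a back-of-the-envelope sketch (the totals are made-up round numbers, not actual Edison/NYT values): a share rounded to three decimals can be off by up to 0.0005, so the candidate count it implies is only known to within about 0.0005 times the grand total.

```python
# Back-of-the-envelope sketch with made-up totals (not actual Edison/NYT values):
# a candidate share reported to 3 decimals pins the underlying count down only to
# within +/- 0.0005 * grand_total, so reconstructed counts and deltas blur as totals grow.
def count_uncertainty(grand_total: int) -> float:
    """Half-width of the vote-count range implied by a share rounded to 3 decimals."""
    return 0.0005 * grand_total

for total in (1_000, 10_000, 100_000, 300_000):
    print(f"grand total {total:>7}: implied count uncertain by +/- {count_uncertainty(total):.1f} votes")
# grand total    1000: +/- 0.5 votes   (ambiguity begins here)
# grand total  300000: +/- 150.0 votes (far more than 1 or 2 votes)
```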