Am I missing something? If they are slow-walking the data, why wouldn't they sample the data from each site? That is, rather than releasing ALL the data from some sites and NONE of the data from other sites, they would release SOME of the data from ALL of the sites. That doesn't mean there isn't more data to release later. They were only required to release 80,000 docs a month. It wasn't stated what algorithm they had to use in deciding what to release (was it?).
Releasing it this way (subset of data for each site) makes it easier to obfuscate the big picture.
The only logical explanation is that this is the best option available to them at the moment, which would have to mean the data not yet released is somehow worse.