In order for Facebook IP ranges to be blackholed by a 'configuration error', the following conditions would have to be met/breached.
Background
- A major network like the one Facebook has will peer with other network providers at multiple locations in multiple DCs.
- Large networks like this will have out-of-band (OOB) access to border routers (think a modem on a mobile phone link).
- Large companies with critical assets have very strict change control procedures.
Change control is a key area to understand with regard to this outage. Most people may not be aware, but it is often quite an involved process, incorporating a number of checks and failsafes.
Consider a 'normal' change for a large company.
- Someone in the business decides that a change needs to be made, for whatever reason, so they raise a change request.
- This initial change request will often be high level, and a business approver will have to sign it off as required and worth the risk.
- Technical engineers will then create a more detailed change request, which will include such things as:
  - Details of what is going to be changed and why, and on what equipment.
  - Identification of whether the change poses a risk to critical services.
  - A detailed change script of the actual changes to be performed, including a backout plan should the change fail.
  - Key services identified beforehand, with a test plan run prior to the change (to make sure it's all OK) and the same tests re-run after the change has been made (see the sketch after this list).
- Another engineer will peer review the detailed change request and provide technical approval, or push back on things that might be wrong or unclear. They also provide assurance that the technical changes being made will actually achieve the stated business goals.
- Once all this is done, the change will go to a final approval team who have a 'big picture' view and can juggle changes between various data centers and ensure there are no overlaps between different change requests that could cause unexpected issues.
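To give a flavour of what that pre/post test plan looks like in practice, here is a minimal sketch in Python - the hostnames and ports are placeholders I've made up, not Facebook's actual services.

```python
# Minimal pre/post change check sketch. The services listed are
# hypothetical placeholders - a real test plan would cover the key
# services identified in the change request.
import socket
import sys

# (host, port) pairs representing the key services to verify.
KEY_SERVICES = [
    ("www.example.com", 443),  # public web front end (placeholder)
    ("api.example.com", 443),  # API endpoint (placeholder)
    ("ns1.example.com", 53),   # DNS server reachability over TCP/53 (placeholder)
]

def check(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def run_tests(label: str) -> bool:
    """Run the whole test plan and print a pass/fail line per service."""
    all_ok = True
    for host, port in KEY_SERVICES:
        ok = check(host, port)
        print(f"[{label}] {host}:{port} -> {'OK' if ok else 'FAIL'}")
        all_ok = all_ok and ok
    return all_ok

if __name__ == "__main__":
    # Run once before the change ("pre") and once after ("post");
    # a non-zero exit code is the signal to invoke the backout plan.
    label = sys.argv[1] if len(sys.argv) > 1 else "pre"
    sys.exit(0 if run_tests(label) else 1)
```

The point of running the same checks before and after is that a failing post-check is an unambiguous trigger for the documented backout plan, rather than a judgement call made at 3am.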
Once all this is approved, the change will be scheduled for an out-of-hours window, depending on which timezone the relevant DC is in.
It is highly unlikely that changes of this nature would be made at all of Facebook's datacenters at the same time:
https://www.datacenters.com/facebook-data-center-locations
And certainly not within business hours.
Even assuming all this was done, the person conducting the change would be using the out-of-band connectivity to perform it. This is done so that if they make a mistake, or there is an undocumented bug in the IOS code (it happens), they are not kicked off the device and can still remediate the problem.
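As a rough illustration of that pattern (not Facebook's tooling - the addresses, credentials and commands below are made-up placeholders, and it assumes the third-party netmiko library), a change pushed over the OOB path with an automatic backout might look something like this:

```python
# Sketch of applying a routing change over an out-of-band (OOB) path
# and backing it out if the in-band network stops responding.
# Addresses, credentials and commands are placeholders.
import socket
from netmiko import ConnectHandler

OOB_MGMT = {
    "device_type": "cisco_ios",
    "host": "10.255.0.1",    # OOB management address (placeholder)
    "username": "neteng",    # placeholder credentials
    "password": "changeme",
}

# Placeholder change and its matching backout plan.
CHANGE = ["router bgp 64500", "neighbor 192.0.2.1 remote-as 64501"]
BACKOUT = ["router bgp 64500", "no neighbor 192.0.2.1 remote-as 64501"]

def in_band_ok() -> bool:
    """Crude post-change check: can we still reach a key public service in-band?"""
    try:
        with socket.create_connection(("www.example.com", 443), timeout=3):
            return True
    except OSError:
        return False

# The session rides the OOB path, not the production network being changed.
conn = ConnectHandler(**OOB_MGMT)
try:
    conn.send_config_set(CHANGE)
    if not in_band_ok():
        # The change broke in-band connectivity, but the OOB session
        # survives, so the backout plan can still be applied.
        conn.send_config_set(BACKOUT)
finally:
    conn.disconnect()
```

The key point is that the session to the router does not depend on the network being changed, so a bad change does not lock the engineer out.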
The very fact that engineers could not get into the building to fix the problem once it started means that there was no-one actually making the change.
The above procedure is typical for a large enterprise with public facing critical assets - Facebook's policies are likely to be even tighter.
TL;DR - In order for this 'mis-configuration' to be a thing, all of the checks would have to have missed the potential issue, and the change would have to have been implemented simultaneously in all of Facebook's datacenters where they peer with other internet providers - all by some kind of automated system with no human oversight, during normal business hours.
It simply isn't feasible.
UPDATE: Here is more information on the 'official' version:
https://www.theregister.com/2021/10/06/facebook_outage_explained_in_detail/
I appreciate this analysis, thank you!
You're welcome :)
What do you think did happen?
There are many theories, but I suppose my instincts tell me it was a white-hat operation.
A shot across the bows to make them aware that the Cabal aren't the only ones with power over them.
This whistleblower thing is obviously a setup by FB to be able to get away with more censorship. The white hats would have gotten wind of it and given them a warning.
The fact they are going ahead with the charade means they are ignoring the warning. There are things you can do to prevent this kind of attack on your network if you are prepared for less flexibility, but I expect the wh's to anticipate that response and have another card to play.
Expect the next outage to be considerably more impactful.
During this warning, everyone else would get an idea of what services they are running that are crucial to their operation that relies on FB's network to some extent. I think they will all be given a short amount of time to remove that risk so that when FB gets taken down for an extended period it won't break all their systems.
I don't have inside knowledge, but I do have a lot of experience troubleshooting and root cause analysis.
There are other scenarios that would fit the bill, one of which is that FB's change system is automated and they have idiots working for them, but I don't put much stock in that one.
I agree that this was a white hat operation. The BGP issue affected DNS, which is why 'we can't open the security door locks'... really?? Who runs internal DNS exposed to the world? What company doesn't run their own internal DNS and infrastructure? Also, based on reports I've heard, not every site had doors that locked people out; it seems that was here in the US. BGP isn't something that is adjusted manually all the time; once it is set, you tend to leave it alone. It would be like changing your phone number: you change it and no one can get hold of you.
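If you want to see the DNS dependency for yourself, here's a quick sketch (assuming the third-party dnspython library; facebook.com is just the example domain) that looks up a domain's authoritative nameservers and queries them directly - when the BGP prefixes covering those nameservers are withdrawn, the queries simply time out, and anything relying on that DNS goes down with it.

```python
# Sketch: check whether a domain's authoritative nameservers answer at all.
# Uses the third-party 'dnspython' library; the domain is just an example.
import dns.exception
import dns.message
import dns.query
import dns.resolver

DOMAIN = "facebook.com"

# 1. Ask the local resolver for the domain's authoritative nameservers.
#    (During a full outage even this step can fail, because the NS
#    records themselves become unresolvable.)
ns_names = [str(rr.target) for rr in dns.resolver.resolve(DOMAIN, "NS")]

for ns_name in ns_names:
    # 2. Resolve each nameserver's IPv4 address.
    try:
        ns_ip = list(dns.resolver.resolve(ns_name, "A"))[0].address
    except dns.exception.DNSException:
        print(f"{ns_name}: could not resolve nameserver address")
        continue

    # 3. Query the nameserver directly. If the BGP routes covering this
    #    address have been withdrawn, the packets have nowhere to go and
    #    the query times out.
    query = dns.message.make_query(DOMAIN, "A")
    try:
        dns.query.udp(query, ns_ip, timeout=3)
        print(f"{ns_name} ({ns_ip}): answering")
    except dns.exception.Timeout:
        print(f"{ns_name} ({ns_ip}): no answer (likely unreachable)")
```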
@ u/JonathanE Your initial post was spot on. I will add one thing: piss on change request groups made up of non-technical people; talk about holding up fixes, upgrades, and replacements... argh.
I still wonder if the "lightning strike" of the GF memorial in Ohio was a message to the BLM/Antifa peon leadership to cease and desist with their planned chaos... and if things like the recent NYC glowie raid were retaliation for the FB shutdown???
Space Force jammed their network.
Agreed, my gut tells me the same. Hell of an idea; the WH could simply delete FB, Twitter, Google, etc... stocks would plummet overnight, billions lost.