In order for Facebook IP ranges to be blackholed by a 'configuration error' the following conditions would have to be met/breached.
Background
- A major network like the one Facebook has will peer with other network providers at multiple locations in multiple DCs
- Large networks like this will have out-of-band (OOB) access to border routers (think a modem over a mobile phone link)
- Large companies with critical assets have very strict change control procedures
Change control is a key area to understand with regard to this outage. Most people might not be aware, but it is often quite an involved process, incorporating a number of checks and failsafes.
Consider a 'normal' change for a large company.
- Someone in the business decides that a change needs to be made for whatever reason, so they raise a change request.
- This initial change request will often be high level, and a business approver will have to sign it off as required and worth the risk.
- Technical engineers will then create a more detailed change request, which will include such things as:
  - Details of what is going to be changed, why, and on what equipment
  - Whether the change poses a risk to critical services
  - A detailed change script of the actual changes to be performed, including a backout plan should the change fail
  - Key services identified beforehand, with a test plan run before the change (to make sure everything is healthy) and re-run after the change has been made (a minimal sketch of such pre/post checks follows this list)
- Another engineer will peer review the detailed change request and provide technical approval, or push back on anything that is wrong or unclear. They also provide assurance that the technical changes being made will actually achieve the stated business goals.
- Once all this is done, the change goes to a final approval team who have a 'big picture' view and can juggle changes between the various data centers, ensuring there are no overlaps between different change requests that could cause unexpected issues.
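To make that pre/post test plan step concrete, here is a minimal Python sketch (hostnames and port choices are hypothetical) of the kind of key-service check that would be run before the change window opens and again after the change script completes:

```python
# Minimal sketch of a pre/post-change test plan (hypothetical endpoints).
# The same checks run before the change window opens and again after the
# change script completes; any regression triggers the backout plan.
import socket

# Hypothetical list of key services identified during change planning.
KEY_SERVICES = [
    ("edge-router-1.example.net", 22),   # management SSH
    ("dns-resolver.example.net", 53),    # recursive DNS
    ("www.example.net", 443),            # public web front end
]

def check_services(services, timeout=3):
    """Return a dict mapping (host, port) -> True/False reachability."""
    results = {}
    for host, port in services:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[(host, port)] = True
        except OSError:
            results[(host, port)] = False
    return results

if __name__ == "__main__":
    baseline = check_services(KEY_SERVICES)   # run before the change
    # ... change is implemented here ...
    after = check_services(KEY_SERVICES)      # re-run after the change
    regressions = [svc for svc, ok in baseline.items() if ok and not after[svc]]
    if regressions:
        print("Regression detected - invoke backout plan:", regressions)
```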
Once all this is approved, the change will be scheduled for an out-of-hours window, depending on which timezone the relevant DC is in.
It is highly unlikely that changes of this nature would be made at all of Facebook's datacenters at the same time.
https://www.datacenters.com/facebook-data-center-locations
And certainly not within business hours.
Even assuming all this was done, the person conducting the change would be using the out-of-band connectivity to perform it. This is done so that if they make a mistake, or there is an undocumented bug in the IOS code (it happens), they are not kicked off the device and can still remediate the problem.
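For illustration only, here is a rough Python sketch (hypothetical hostnames and credentials, using the paramiko SSH library) of working through an out-of-band console server - the point being that this session rides a separate link and survives even if the change kills the router's in-band connectivity:

```python
# Sketch only: reaching a border router through an out-of-band console server
# (hypothetical hostnames/credentials). Requires: pip install paramiko
import paramiko

OOB_CONSOLE = "oob-console.dc1.example.net"   # reachable via a separate link

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(OOB_CONSOLE, username="neteng", password="********")

# From the console server the change is applied and verified; if the router
# drops its in-band sessions, this OOB session stays up for remediation.
stdin, stdout, stderr = client.exec_command("show ip bgp summary")
print(stdout.read().decode())
client.close()
```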
The very fact that engineers could not get into the building to fix the problem once it started means that there was no-one actually making the change.
The above procedure is typical for a large enterprise with public facing critical assets - Facebook's policies are likely to be even tighter.
TL;DR - In order for this 'mis-configuration' to be a thing, all of the checks would have to have missed the potential issue, and the change would have to be implemented simultaneously in all of Facebook's datacenters where they peer with other internet providers - all by some kind of automated system where no human had any oversight, during normal business hours.
It simply isn't feasible.
UPDATE: Here is more information on the 'official' version:
https://www.theregister.com/2021/10/06/facebook_outage_explained_in_detail/
Tech is not my thing but thank you so much for this analysis
Which is precisely why I decided to write it :) Glad it helps.
They flat out unplugged.
No DNS A Records existed, no IPs routed - they flat out went dark.
"Should the lights go out, know that patriots are now in control"
I'm not convinced. I half think Facebook planned this whole Whistle-Glower thing and the outage to maximize buzz.
"We would love to make changes - you saw us go down! we need more federal monies etc to make improvements etc etc"
Do you think zuck planned on losing 7 billion dollars?
I mean for him what's the real cost?
Last we knew he was worth $122 billion officially so he lost ~5% of what he's worth that we know about.
He prob has mad cash stashed and any of us could have lived off a fraction of that the rest of our lives.
I really doubt he gives 2 shits about $7 billion.
https://www.benzinga.com/sec/insider-trades/fb/mark-zuckerberg
He's dumped Half a billion in FB over the past month, though. Seems like maybe he knew this was coming. He's basically maxing out what he can sell every day.
Look at the stock price. It started at $377 and dropped to $342 over a month. That's roughly a 10% drop. Losing 10% to these cats is HUGE.
He's willingly sold 2.8 Billion in one day btw.
same here. thanks. care to speculate WHO did it?
Not really, I just don't have enough info.
I've worked on live system migrations for a major retailer on the run-up to Christmas - I know how to make things not break.
There is always room for the unexpected in system changes - but such failures tend to be limited to one locale; even when there is a cascade failure (which I have experienced), it was fixed in less time and with fewer resources than Facebook has at its disposal.
Inside job or military grade hack are my two main theories on how a fuckup this big could happen with a simultaneous lockout.
Not a fuckup.
The cover up is a fuckup :)
I appreciate this analysis, thank you!
You're welcome :)
What do you think did happen?
There are many theories, but I suppose my instincts tell me it was a white-hat operation.
A shot across the bows to make them aware that the Cabal aren't the only ones with power over them.
This whistleblower thing is obviously a setup by FB to be able to get away with more censorship. The white hats would have gotten wind of it and given them a warning.
The fact they are going ahead with the charade means they are ignoring the warning. There are things you can do to prevent this kind of attack on your network if you are prepared for less flexibility, but I expect the wh's to anticipate that response and have another card to play.
Expect the next outage to be considerably more impacting.
During this warning, everyone else would get an idea of what services they are running that are crucial to their operation that relies on FB's network to some extent. I think they will all be given a short amount of time to remove that risk so that when FB gets taken down for an extended period it won't break all their systems.
I don't have inside knowledge, but I do have a lot of experience troubleshooting and root cause analysis.
There are other scenarios that would fit the bill, one of which is that FB's change system is automated and they have idiots working for them, but I don't put much stock in that one.
I agree that this was a white hat operation. The BGP affected DNS which is why we can't open the security door locks... really?? Who runs internal DNS exposed to the world?? What company doesn't run their own internal DNS and infrastructure? Also, based on reports I've heard, not every place had doors that locked people out; it seems as though that was here in the US. BGP isn't something that is adjusted manually all the time; once it is set, you tend to leave it alone. It would be like changing your phone number: you change that and no one can get ahold of you.
@ u/JonathanE Your initial post was spot on. I will add one thing, piss on change request groups made of non-technical people, talk about holding up fixes, upgrades, and replacements....argh.
"piss on change request groups made of non-technical people"
I hear you Fren :)
Note that all machines withdrew at the same time, not changed. Big difference. And not an honest mistake.
Just like using decimal values for votes, this excuse stinks to high heaven.
I still wonder if the "lightning strike" on the GF memorial in Ohio was a message to the BLM/Antifa peon leadership to cease and desist with their planned chaos... and if things like the recent NYC glowie raid were retaliation for the FB shutdown???
Space Force jammed their network.
Raspberry.... You know I hate raspberry!
Agreed, my gut tells me the same. Hell of an idea: the WH could simply delete FB, Twitter, Google, etc.; stocks would plummet overnight, billions lost.
- Outage
- Being hacked
- Whistleblower going from 60 Minutes to Congress in 36 hours
- Calls for stricter guidelines/control over the internet
They're all related. This is their next plan. Control of information. Control of the internet.
<Klaus Schwab has entered the chat>
"Hey, you guys ready for the Cyber Polygon Pandemic?"
They'll do anything to stay in power. Anything. Panic.
The "Locked out" thing is also a giveaway ..... unless they have some hyper strict tech based entry that cant be opened by on site security. If they had security with access (There really has to be for fires etc), the employees locked out event would have been 10 minutes delay and not made the news.
That's what made me think of asset seizure.
Yup. They were definitely locked out.
Just not by their own side (as they would have you believe).
🍿
Right? That's what the bank does when they take your business ... they come along and lock the door! The employees can't get in!
We're only seeing 10% of what is really going on. 🍿
Thanks to the mods for my first sticky, and for putting the proper flair on it :)
Congrats on your first!
Not only all that, on the same day and about the same time, T-mobile, Verizon, and AT&T were having cell service outages in many major cities.
https://metro.co.uk/2021/10/04/verizon-att-t-mobile-outages-as-facebook-instagram-whatsapp-down-15363402/?ico=more_text_links
I looked at those at the time; the numbers were several times smaller than the FB reports, so I figured it was clueless idiots thinking that 'Facebook' was the internet and complaining to their provider accordingly.
"The website is down"
https://hooktube.com/watch?v=W8_Kfjo3VjU
Tech anon here:
This is an excellent breakdown. Just a couple points.
OOB connections are still run to DCs via POTS (telephone, dial-up modem) purely as a contingency. The engineer would not have to be physically on-site to make a change via OOB, so the FB statement that they couldn't get in to fix the issue is sus.
The notion that FB would not have an OOB (or similar failsafe) network (likely multiple) operating at each DC is absurd. They surely do. What happened was outside of FB internal infra, or from within, and out of their control.
Business hours are different across the globe. FB traffic may actually be lowest mid-day Monday. That said, OP is right. Something like this would have been patched in phases to hit their low-traffic times in each timezone, and to mitigate potential unknown risks.
If FB's internal processes were completely broken down, an outage still would not cause this effect.
It reeks of an attack by a sophisticated, (likely state) actor.
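As a toy illustration of that phased approach, here is a small Python sketch (the site list and the 02:00 "low-traffic" window are assumptions for the example, not FB's real schedule) that computes a per-DC change window in each site's local time:

```python
# Toy illustration of phasing a change per data center so each phase lands in
# that site's local low-traffic window. Sites and the 02:00 window are
# assumptions for the example only.
from datetime import datetime
from zoneinfo import ZoneInfo

DC_SITES = {                      # hypothetical subset of peering locations
    "Prineville, OR": "America/Los_Angeles",
    "Lulea, Sweden": "Europe/Stockholm",
    "Singapore": "Asia/Singapore",
}

for site, tz in DC_SITES.items():
    # 02:00 local time on the chosen maintenance date for this site
    local_start = datetime(2021, 10, 5, 2, 0, tzinfo=ZoneInfo(tz))
    utc_start = local_start.astimezone(ZoneInfo("UTC"))
    print(f"{site}: window opens {local_start:%Y-%m-%d %H:%M %Z} "
          f"({utc_start:%Y-%m-%d %H:%M} UTC)")
```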
If this was a mistake, the routers would have been announcing their new address.
Since this was an attack, the routers totally withdrew.
Big difference. The internet is designed to self-heal. The simplicity of this attack screams of an ingenious penetration ... and the "locking them out of their own buildings" is just ironic icing on the cake.
Nice write-up. Thanks.
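To illustrate the announce-versus-withdraw distinction made above, here is a conceptual Python sketch (simplified structures, not a real BGP parser; the prefix and upstream AS are illustrative):

```python
# Conceptual illustration: the difference between a BGP UPDATE that
# re-announces a prefix with new attributes and one that only withdraws it.
# Re-announcement lets the internet "self-heal" onto a new path; a pure
# withdrawal removes the prefix from routing tables entirely.
update_reannounce = {
    "withdrawn_routes": [],
    "nlri": ["129.134.30.0/23"],                      # prefix announced again...
    "path_attributes": {"AS_PATH": [64500, 32934]},   # ...via a (new) path
}

update_withdraw_only = {
    "withdrawn_routes": ["129.134.30.0/23"],   # prefix pulled from the table
    "nlri": [],                                # nothing announced in its place
    "path_attributes": {},
}

def effect(update):
    if update["nlri"]:
        return "prefix still reachable (possibly via a different path)"
    if update["withdrawn_routes"]:
        return "prefix unreachable until someone announces it again"
    return "no routing change"

for name, upd in [("re-announce", update_reannounce),
                  ("withdraw-only", update_withdraw_only)]:
    print(name, "->", effect(upd))
```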
Lets be realistic. The ones who clearly have the authority, ability, and power to do this includes someone who's name starts with an S, and ends with pace Force.
Cloudflare broke down what they observed happen in this blog post they made while Facebook was still offline.
Interesting to note that IP blocks for Facebook's DNS servers which were withdrawn were routed to US data centers. Apparently Facebook hosts their DNS servers on AWS instead of in their own data centers, so the routes would have been pointing to Amazon (unconfirmed).
Small point that makes all the difference: routes were not withdrawn. The entire AS disappeared.
The first is a normal consequence of a route or router failure. The second is never normal.
Are you sure the entire AS disappeared? That Cloudflare blog post shows that other IP blocks on AS32934 were still routed at the time.
I also did see a timelapse of many of their BGP routes being withdrawn over time. It's possible that Cloudflare saw those blocks mentioned in the post as still routed at the time and they too went down... But if the entire AS disappeared, why didn't all the routes go down at the same time?
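For anyone who wants to poke at this themselves, here is a rough Python sketch - assuming RIPEstat's public "announced-prefixes" endpoint and its usual response shape - that lists which prefixes the route collectors saw originated by AS32934 around the outage window:

```python
# Sketch: query RIPEstat (assumed endpoint/response shape) for prefixes
# originated by AS32934 during a given time window on 2021-10-04.
import json
import urllib.request

URL = ("https://stat.ripe.net/data/announced-prefixes/data.json"
       "?resource=AS32934"
       "&starttime=2021-10-04T15:00:00&endtime=2021-10-04T16:00:00")

with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)

prefixes = [p["prefix"] for p in data["data"]["prefixes"]]
print(f"{len(prefixes)} prefixes seen for AS32934 in that window")
```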
Whatever they tell us I don't believe it.
Follow TheRegister link and that will explain how that part occurred.
The real key, as you mentioned, for implementing a critical change is the "rollback", or backout plan as you called it.
In car terms, if you are going to upgrade your turbo, you do not throw away your old working turbo until you have checked the new one works; even then you keep it around a while just in case you need to roll back the change. It is impossible they would make a change without having a backup config of the old known-working version to roll back to should the upgrade fail.
So, it was either deliberate by them or deliberate by others, but cannot be accidental by them.
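A minimal Python sketch of that "keep the old turbo" idea - get_config, apply_config and post_change_checks are hypothetical stand-ins for whatever tooling an operator actually uses:

```python
# Sketch: snapshot the known-good config before the change and restore it if
# the post-change checks fail. Helper functions are hypothetical stand-ins.
import datetime
import pathlib

def get_config() -> str:
    return "hostname border-rtr-1\n! ...known-good configuration..."

def apply_config(config: str) -> None:
    print("applying config ({} bytes)".format(len(config)))

def post_change_checks() -> bool:
    return False  # pretend the change broke something

backup_dir = pathlib.Path("config-backups")
backup_dir.mkdir(exist_ok=True)

# 1. Snapshot the known-good config with a timestamp before touching anything.
stamp = datetime.datetime.now().strftime("%Y%m%dT%H%M%S")
backup_file = backup_dir / f"border-rtr-1.{stamp}.cfg"
backup_file.write_text(get_config())

# 2. Apply the change, then re-run the key-service checks.
apply_config("hostname border-rtr-1\n! ...new configuration...")

# 3. If the checks fail, roll back to the snapshot instead of debugging live.
if not post_change_checks():
    print("checks failed - rolling back")
    apply_config(backup_file.read_text())
```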
You've put quite a bit more thought into it than I have, but I'd concluded as well that it's fishy for FB to be down for as long as it was.
My knowledge is nowhere near the level of their setup, but I know a little somethin' somethin'. I also know if I'd been on the job and that happened, I'd be in my current position, unemployed lol.
Absolutely. No fucking way this was caused by a 'mis-configuration'. Changes at this level are as detailed as can be and checked multiple times by many eyes.
It is going to go this route for whom to blame
High effort & informative. Thanks.
Thanks for reading, glad it helped
Awesome post for those of us not in the wheelhouse. Thanks.
Q has told us these social media accounts they are not immune. Qdrop#571. I believe it was WHs letting them know the Storm has the means of bringing it all down around them. Remember what 45 did only a couple of days before? He took his Twitter ban to the Federal Court. They have all been warned and now let’s see how they respond.
They were losing a million dollars every few minutes from being down. There's no way their BGP configuration was air-gapped to the degree that a resource couldn't get physical access to the network within at most an hour.
Their response on social media was also very atypical of a tech company, and there was almost no presence from employees (anonymous or not).
Something happened on Monday. What, we will never know.
So sorta like… These are not the droids you are looking for.
I know when I set up my small rinky-dink website (which is no longer active) it was hit-or-miss being able to pull it up until the DNS information populated on various DNS servers. So, to have a very large website not be reachable at all was definitely not due to a simple error.
Thanks for the explanation, even though it made my eyes glaze over a bit.
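As an aside on the DNS-propagation point above, here is a small Python sketch (assumes the dnspython package; the domain is just a placeholder) of checking whether a record has reached several public resolvers:

```python
# Sketch: ask several public resolvers directly to see whether a name has
# propagated yet. Requires: pip install dnspython
import dns.resolver

RESOLVERS = {"Google": "8.8.8.8", "Cloudflare": "1.1.1.1", "Quad9": "9.9.9.9"}
NAME = "example.com"   # placeholder domain

for label, server in RESOLVERS.items():
    resolver = dns.resolver.Resolver(configure=False)
    resolver.nameservers = [server]
    try:
        answers = resolver.resolve(NAME, "A")
        print(label, [a.address for a in answers])
    except Exception as exc:
        print(label, "no answer:", exc)
```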
Great writeup.
Except for - "And certainly not within business hours."
What does that mean? Isn't it always business hours for Facebook somewhere in the world?
It refers to where the services are, or where the main users of the services are.
Business hours?
Yes - change control - but there are several possibilities why it is not foolproof. The first one I can think of is that it was an emergency, in which case there are protocols in place to bypass the usual testing and review process and implement with emergency authorization. The two things I find most curious are the time of day this change was done (not off-hours) and the amount of time taken to roll back the changes. Now, if there was some "emergency" such as deterring hacking, perhaps rollback was not an option. Also scheduling was not an option.
Oh, it's very possible. It just requires extreme incompetence. What happens when you fire your intelligent staff who refuse the vaccine and hire based on skin color?
They won't follow the procedures left for them if they even know they exist.
So I work in tech, and I think you're forgetting just how cobbled together all of this stuff is. People don't realize there is a huge collapse around the corner until it happens and costs them a shit ton of money. Look at the Texas grid when it got cold. Look at the NYC blackouts in the 70s. Look at the 2008 housing crisis. All huge fuckups that would've been avoided if someone paid closer attention.
I don't think the 2008 housing crisis was a fuckup. It was a deliberate act.
If by deliberate you mean a bunch of banks trying to rip people off, then yes.
Yes.
There's a big difference between a utility company with an IT system and a company that is focussed entirely on IT.
The power networks have been underfunded for years. A place like Facebook has a dozen DCs and hugely bigger IT budgets.
Is that why Google never goes down, lol. They're all the same. It's all duct tape and winging it at these kind of places.
I get what you're saying here, but you are forgetting about affirmative action.
They used IOT to authenticate their building badges. The network engineer that put in the wrong statements took down their authentication. We all should be focused on the whistleblower who is trying to bring more rules and regulations to facebook and the internet as a whole. I believe the outage was the result of a diversity hire.
Trump's Facebook was reinstated after the blackout, and apparently any new post warnings are not showing up now :p