Welcome to Disaster.Stream: Bringing Hard-Won Lessons Learned from Disaster Recovery Responders
Oct. 28, 2022

S1E1 Stock Market DDOS Attack

US Stock Market DDOS Distributed Denial of Service Attack

Responder Stories: Bill Alderson, Duke Tunstall 

Consequential Organizations and People Introductions

Vinton Cerf, Father of the Internet, VP, Google

Charlie Lewis: Expert Associate Partner, McKinsey & Company

Jeffrey Caso, Associate Partner and Cyber Expert, McKinsey & Company

Paul Barrett, Enterprise CTO, NetScout

Debbie Gordon, Founder & CEO, Cloud Range Cyber

Bryan Lares, VP of Product, ExtraHop 

www.disaster.stream 

Transcript

S1E1 Disaster Stream Stock Market DDOS Attack

[00:00:00] Hello, and thank you for joining me for episode one of season one of this new podcast, Disaster Stream. We're going to cover the US stock market denial of service. Now, that's of interest to most people, regardless of whether you're technical or not, because when that [00:00:30] happens, it affects every part of an organization.

And this particular problem affected not just one organization but many, because the denial of service stopped the services of a major stock market exchange. A big problem, and it lasted for a long time. Woody Allen said 90% of anything is being there.

In the times of Covid we couldn't [00:01:00] always be there, but we could be there virtually. Another famous guy, Chuck Swindoll, said 90% of anything is attitude. The attitude should be: can do. Yes, we can solve the problem. And here we go.

So let's take a look at what we have for you today. In each episode, I will introduce you to people who speak at my conferences and who are in my round tables. Today I'm [00:01:30] going to kinda load up a little heavy, cuz it's our first episode and I want you to get a feeling for the type of organizations and consequential people that we will call upon for various types of help and advice. So up first will be McKinsey and Company, followed by NetScout, ExtraHop, and Cloud Range, which is a brand new capability to run actual simulation training [00:02:00] exercises for disaster recovery and incident recovery. Toward the end of the broadcast, we're gonna have Vinton Cerf, who is the father of the internet and a recipient of the Presidential Medal of Freedom.

He's the guy who pretty much put together TCP/IP and promulgated it with all the programmers, and he talked at my recent conference about TCP/IP being 50 years old and some of the consequences of that. So you're not gonna wanna miss these little vignettes.

They're one or two minutes each. It doesn't take long as you [00:02:30] get introduced to some of these great organizations and people. A little bit more about the broadcast: part of this is not just for me to tell my stories, but I want to tell your story. I wanna give your team and yourself some recognition for the great work that you're doing out there. So please recommend security incident and disaster recovery responders. Anytime data is being threatened or impacted, [00:03:00] we wanna hear about that. You can send that email to bill@disaster.stream. You can listen on Apple, Spotify, Google, TuneIn, Amazon, Podbean, and more.

And we have this content available to you not just in audio format but also in video, and we're going across many different distributions. We welcome new guests and industry participation. I'll [00:03:30] run out of my stories after about 20 or 25 big issues that I've responded to, and I definitely wanna start integrating the things that you guys have done out there to save the day. I call it pulling the baby out of the lion's mouth. It's been a pretty fun job to be able to do this my entire 40-year career. And now I'm trying to help the whole industry understand what it means to be a disaster [00:04:00] responder, tell your story, and give you a good shot in the arm. All right. Now this is a little bit about me. It's my infographic bio. I've responded to a lot of various things, and I wrote a really nice paper, about 50 pages, on the SolarWinds breach that people still say is the best one out there.

Happy to get that to you if you're interested. Lots of conferences: NetWorld+Interop, where I had hundreds and thousands of people at my conferences. [00:04:30] I wrote a column in Network Computing magazine; some of you may remember me from that. I certified 3,500 network forensic professionals, and I'm involved as a board member in ISSA.

I'm considered a Vietnam-era veteran. I worked at Lockheed and built secure networks with crypto gear back in 1980, and I had to start looking at data scopes and packets in 1980 to figure out how all of that stuff worked.

Pretty cool. Then I worked for [00:05:00] the creator of the Sniffer at the startup of Network General Corporation, which is now known as NetScout through acquisitions. That's a little bit about me, so you have an idea of what I do, and I look forward to getting to know you a little bit better over time.

I wrote a white paper recently in preparation for this podcast, as I build case studies out of each one of my high-stakes, high [00:05:30] visibility, lessons-learned type environments. I'm always wanting to pull out the lessons learned, and I'll talk to you more and more about that, because it's much better to learn from somebody else's lesson learned than to have to learn the lesson yourself.

How do we deal with disaster? What are the phases of disaster? I'll talk about that over time. Journaling makes sure you remember what happened incrementally, so that you can then pull those lessons learned out and, [00:06:00] like I said, build best practices. And it's all about tiger teams. I've been privileged to come in and lead various tiger teams, such as at the Pentagon 9/11 recovery, where we had to come in and diagnose very critical problems.

Do triage; find the big network diagrams, packet flow diagrams, application flow diagrams, and the metrics; and then troubleshoot each one of those things. Troubleshooting is like peeling an onion, and there's the [00:06:30] diagram there, and I'll talk to you more about that in the future.

I just want to set the stage for you, where we talk about problem analysis, disaster recovery, and responding to these problems. You record these things, you gather the lessons learned, and then you build out best practices so that you can have crisis avoidance in your organization, or disaster avoidance.

It's the ultimate in credibility not to have a problem on your [00:07:00] watch. We'll try and help you learn those things that were hard won. People at the Pentagon died in order for these lessons learned to be brought forward, so we should respect them and use them, so that we're not repeating the same problems that we found at 9/11 when we went to recover their communications systems.

The fingerprint of every organization is as unique as an individual's fingerprint, whether you started with a distributed [00:07:30] architecture or a centralized architecture: centralized like a bank, distributed like retail. Your network has a fingerprint. You have 50 or a hundred different vendors, and every one of those mixes is different.

So every organization has a unique fingerprint of their mission-critical enterprise, and we talk about how to deal with that. That means that every enterprise has to be completely managed, very focused, [00:08:00] and quite different between enterprises. One size doesn't fit all. So you have to really customize your response, your tools, your systems, and your planning to meet your particular fingerprint.

When we talk about best practices, these are best practices that have been tried and true, refined, and if you put them to work, if you impute them into your organization, you'll have intrinsic data [00:08:30] recovery. You will have intrinsic disaster recovery for the most part, sometimes so that you don't have to have the disaster.

That's the great thing. You can obviate disaster many times by imputing and applying best practices. If you have a large organization, you might need somebody like McKinsey and Company or Deloitte, Booz Allen, or GDIT to help you implement those systems. But we are here to help you identify those, [00:09:00] focus on those so that you can build them into best practices, so you're not repeating the same problems and you're putting forth the best way forward for your organization to respond to disaster.

With that, we're gonna talk a little with our friends at McKinsey, who spoke at one of my round tables recently. And here you go. We'll be back in just a minute. Here's an introduction to McKinsey and Company.

They're gonna talk about [00:09:30] passwordless and take some questions from some people at the round table. Back in a minute.

[00:10:00] [00:10:30] [00:11:00] [00:11:30]

Okay, we're back. Now, the [00:12:00] DDOS attack at the US stock markets. Let's go through it one by one and just take a look and see what we've got here. First of all, the cyber attack ties up the US stock markets. It affected Wall Street, the brokers, the dealers, the customers. There are all sorts of implications when something of this nature goes down.

And it wasn't completely down. It was a bit intermittent, because the [00:12:30] denial of service kept hitting it, and it would go on and off, a little here, a little there. Some of the time you could get in; most of the time you couldn't. And that's the nature of a denial of service attack: it denies the legitimate traffic, the legitimate services that the organization or the networks and systems are providing.

This is what it looked like. You've got hackers out there, and they are sending in denial of service [00:13:00] SYN attacks to try and break your systems through brute force, hitting them over and over, asking for a connection. So here it is. It's a good picture, isn't it?

There's the Wall Street bull, all tied up just like a bull in the arena there, trying to get away. And here the denial of service attack has the stock markets tied up. I want you to hear real quick from Paul Barrett. He's the CTO over at [00:13:30] NetScout, and NetScout has these great tools that can be distributed around the world in order to capture packets, so that you can bring them back and diagnose problems remotely, virtually, anywhere around the world.

So let's hear from Paul and then we'll be back.

Okay, we're back. Now, firewalls were melting down because of the DDOS SYN attack load. The [00:14:00] firewalls were getting so many requests that you couldn't even log into the things, for number one, because they were so busy responding to requests. Every once in a while a request would get through, but for the most part it was denying the service.

The legitimate users had to get in to look at quotes, to look at buy and sell orders, and that sort of thing. So it was a pretty big problem. Now, the firewalls had a lot of rules [00:14:30] on them, and they were highly granular rules, because there's good reason to have quite granular, very effective firewall rules. But because there's a lot of 'em, when these bulk attacks started hitting, this global incoming attack really broke down the system.

And it was highly impactful. So you had all these people around the world coming through and hitting and breaking your firewall so that it could not take care of legitimate [00:15:00] requests from the market. Now, that affected broker-dealers, like I said, customers, and the public, and it was not a good thing.

Now, I was on the West Coast at the time this started, and they called me up and talked to me a little bit, and then they said, Bill, please come in. We can't figure it out. It's been several days. We've got law enforcement, we've got every vendor that we have [00:15:30] in our portfolio, they're all here. They're all supporting, but we can't figure out how to stop this thing.

I popped on an airplane and went back to the East Coast to jump in and analyze this problem. What I found, very interestingly, was that it was indeed a SYN attack, and the requests were coming in. Here's the thing: I know TCP, and consequently I know that when you send a request to connect up to a server [00:16:00] or a system or an application, you send a SYN request.

It's a synchronize, and so you're trying to get a connection with this system so that you can then use the communications path in a reliable manner. And the first thing that it does is it comes up with this random number. The random number is one to 4 billion, and it essentially is a sequence number, a starting sequence number.

The purpose for this [00:16:30] is partially security, security by obscurity. If every time you created a session it started at zero, then 100, then 500, then somebody could very easily slip in and take over your session, cuz they could anticipate what was going to go on. And so we use random sequence numbers to begin a session.
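To make the point about predictable starting sequence numbers concrete, here is a minimal Python sketch. It is an illustration only, not how any real TCP stack picks its initial sequence number, and the function names (`random_isn`, `predictable_isns`, `guess_next`) are made up for this example. It contrasts an unpredictable 32-bit starting value with a scheme that starts at zero and steps by a fixed amount, which an observer can extrapolate.

```python
import secrets

SEQ_SPACE = 2**32  # TCP sequence numbers are 32-bit, roughly 4.29 billion values


def random_isn() -> int:
    """Pick an unpredictable 32-bit starting sequence number (illustrative only)."""
    return secrets.randbelow(SEQ_SPACE)


def predictable_isns(count: int, step: int = 100) -> list[int]:
    """A deliberately weak scheme: start at zero and add a fixed step per session."""
    return [(i * step) % SEQ_SPACE for i in range(count)]


def guess_next(observed: list[int]) -> int:
    """An observer who sees a constant step can extrapolate the next value."""
    step = (observed[-1] - observed[-2]) % SEQ_SPACE
    return (observed[-1] + step) % SEQ_SPACE


if __name__ == "__main__":
    seen = predictable_isns(3)
    print("weak scheme so far:", seen, "attacker's guess:", guess_next(seen))
    print("random starting value:", random_isn())
```

With the weak scheme the guess is always right; with the random one the observer has about a one in 4.29 billion chance per try.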

That random sequence number was not random in this particular case. They used the same sequence number over and over again. [00:17:00] Now, the organizations that I was working with were really good. They had great coders, and they went out and investigated, and they downloaded all of the source code of the various types of tools that script kiddies would use to generate these types of attacks.

And in the process I said, hey, they're using the same sequence number over and over again, which is an indicator that they're not that smart. So consequently, they found the [00:17:30] actual source code of the software that the hackers were using to generate this denial of service attack. Very cool. And as a result, we took that tool in and we could see the various behaviors and that sort of thing.
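The observation that the attack reused one sequence number is the kind of thing a simple tally over captured SYN packets can surface. The sketch below is a hypothetical illustration in Python: it assumes the sequence numbers have already been extracted from a packet capture into a list, and the function name, the 0.5 threshold, and the sample value `0x1234ABCD` are all invented for the example (the episode does not give the real number).

```python
from collections import Counter


def dominant_sequence_number(syn_seqs, threshold=0.5):
    """Return (seq, share) if one sequence number exceeds `threshold` of all SYNs.

    With properly random starting sequence numbers, no single value should
    account for a noticeable share of the traffic, so a dominant value is a
    strong hint that a tool is generating the SYNs.
    """
    if not syn_seqs:
        return None
    seq, hits = Counter(syn_seqs).most_common(1)[0]
    share = hits / len(syn_seqs)
    return (seq, share) if share > threshold else None


if __name__ == "__main__":
    # Hypothetical capture: eight attack SYNs with one repeated value, two normal SYNs.
    sample = [0x1234ABCD] * 8 + [1, 2]
    print(dominant_sequence_number(sample))
```

A repeated value like this is also a convenient search key when comparing captured traffic against the behavior of known attack tools.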

One of the things that it did was random source IP addresses, so we couldn't tell who it was coming from. It was basically indicating, through all the randomization of the source addresses, that if you were on the internet, [00:18:00] it could have been anyone. And in fact, everyone was getting accused of being the source of this particular problem because of the IP addresses.

Now, we don't use that kind of routing anymore, and we've fixed that problem for the most part on the internet, because we use reverse path forwarding algorithms, which means that you can't just put any IP address on a packet and send it. The BGP routers on the [00:18:30] internet will not forward a packet that is not appropriately from the network that you are on.

So if a router won't forward a packet to that address through you, it will not allow you to send a packet from it. So you cannot use spurious IP addresses in many cases. Now, inside an organization you can. People can do that if the IP addresses that they're randomizing are your internal addresses. So say a university or a [00:19:00] large company or organization has a very large IP address range.

They could successfully do denial of service attacks if they limit the randomization to that IP address range, because the router that supports you would then allow that to be sent out randomized, because the reverse path forwarding would know that it was appropriately from that IP address range. Okay. So the [00:19:30] issue, though, was that the firewalls that were being used were highly granular, and they could not filter on a single TCP sequence number.
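The reverse-path idea described above can be sketched as a simple source-address check. This is a heavily simplified Python illustration, not a router implementation: a real router consults its routing table per interface, while this sketch just tests whether a packet's claimed source falls inside the prefixes known to live behind the link it arrived on. The function name and the prefixes (taken from the reserved documentation ranges) are assumptions for the example.

```python
import ipaddress


def passes_source_check(src_ip: str, allowed_prefixes: list[str]) -> bool:
    """Accept a packet only if its source address belongs to a prefix that is
    legitimately reachable through the interface it came in on."""
    addr = ipaddress.ip_address(src_ip)
    return any(addr in ipaddress.ip_network(prefix) for prefix in allowed_prefixes)


if __name__ == "__main__":
    customer_prefixes = ["198.51.100.0/24"]  # hypothetical customer network

    # A spoofed source from somewhere else on the internet is dropped.
    print(passes_source_check("203.0.113.7", customer_prefixes))

    # A randomized source that stays inside the customer's own range still
    # passes, which is the loophole mentioned for large address ranges.
    print(passes_source_check("198.51.100.25", customer_prefixes))
```

The second case shows why the check narrows down where spoofed traffic can come from without eliminating spoofing inside a large organization's own range.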

And even if they could, every time one of those requests comes in, it interrupts a CPU and causes a whole bunch of consternation. So even if you could filter out that one sequence number, it wouldn't make much of a difference, because it would still interrupt the CPU and consume [00:20:00] bandwidth and processes and that sort of thing, so it would still have the same effect.

So we were scratching our heads and trying to come up with a better way of resolving this problem. And of course we did, and I'm gonna talk to you a little bit about that. Now I want to talk generally about a disastrous problem and what it takes to resolve a disastrous problem. First of all, you're probably familiar with this thing [00:20:30] called a problem square.

So you've got a team, you've got an environment, you've got a problem, you have symptoms, and that is what we know. Those are the symptoms. Those are the problems. And the status quo is that today, without new information, we cannot solve that problem. If you're familiar with Stephen Covey's Seven Habits of Highly Effective People, he talks about paradigm shift in there.

He was probably the one that really brought [00:21:00] the term paradigm shift into ubiquitous use in the world, because he talked about it and told stories about it, and it's really great. I may tell one of his stories sometime so that it helps you understand this.

Essentially, a paradigm shift occurs when you have new information, and that new information has a payoff, because you can solve a problem that you could not [00:21:30] solve yesterday because you didn't have the new information. You had all the symptoms, you knew all that, but there was a key piece of information you did not have.

So it's necessary to get that new information: to find it and pursue new information, new findings, new visibility, new knowledge, new best practices, and root cause analysis, to discover new things about the problem that you didn't otherwise know, just like we went through [00:22:00] at the stock market to resolve this.

So the new input changes it from a square to a cube. A square has four sides. A cube has six sides. So the two new items are the new information, which is new input, and then the payoff you get from that. So: new input, and you get a payoff, [00:22:30] because you found some new information, and now you're going to be able to solve yesterday's problems because you have information that allows you to solve them.

Now, the sad thing is that every time I go in and solve a problem, it never fails: everybody says, that was simple. Sure, it was simple after you got the answer. It was not simple before you got the answer. And it's, oh, I should've known that, type of thing. No, it's sometimes very hard.

It's hard to [00:23:00] find, and you have to do a lot of work to find that, but that gives you the payoff. All right, now. The concept that we came up with was a multi-tier bulk access firewall. So instead of just having one set of firewalls that everything came into, we were going to have two sets of firewalls. The first set was to stop the bulk attack.

The second set was to [00:23:30] process the granular rules. And so you're gonna see a little bit of how we go about doing that. And before we go through and talk about that, I want to introduce you to Debbie Gordon of Cloud Range. Now, Cloud Range has a simulation system to take people who are really smart people and put them into an environment where they can collaborate and solve problems together as a team.

Now, one of the things that I will mention in the [00:24:00] future is that when I arrived at the Pentagon, there were people who were missing because they were killed by the aircraft hitting the building. So they were down a lot of personnel that they normally had. And this team, parts of it, had exercised for disaster recovery.

They're a military organization. But think about it. If your organization got hit, would your team be able to deal with the fact that maybe some people were [00:24:30] affected by a natural disaster in their area and had to take care of their families, not the company or the organization? So you might be down several people.

Exercising with those people and simulating disaster recovery is very powerful. And in this instance, she's talking about a cyber attack, but it can be collaborative training for any of those things to bring your team together. You can educate and get lots of training for [00:25:00] the individual.

But if the individual doesn't know how to collaborate, communicate, use tools, and banter back and forth to solve a problem as a group... yeah, that's what this does. So take a listen to Debbie for a minute.

[00:25:30] [00:26:00]

We're back. The modified firewall architecture: take a look. [00:26:30] You've got a bulk attack firewall, and then you've got the granular firewalls, the secondary granular firewalls. So putting the filter in for that sequence number and stopping that particular sequence number from going through put the burden on the first firewall, the bulk attack firewall, eliminating the burden from the granular firewalls that were second. [00:27:00]

So imagine a primary and a secondary set of firewalls. The first one was to take the bulk attacks. The second one is all your normal firewalls with the highly granular rules. All the bulk attack firewall had to do was take out the bulk attack, leaving only the good traffic to continue through, not melting down the firewalls, not melting down the networks, not melting down the servers involved in that situation.

Now, [00:27:30] this happened to be with Cisco, and Cisco volunteered a new bulk firewall that they had brought with them in case they needed it for an emergency. But it wasn't that simple, because, remember how I told you, there was no filtering capability in the user interface to kill one sequence number.

They said, we didn't think anybody would ever wanna do that. And I said, don't you have a pattern match offset that you can set? At the time they didn't. But [00:28:00] because, like Humpty Dumpty, all the king's horses and all the king's men were there, we had some priority access to things, and so we got to the actual development engineers at the Cisco firewall group, and they said, hey, Bill, we'll write a hack version of the code that will filter out, forever, that one sequence number. Now, being that one sequence number was only one in 4 billion, if somebody used that legitimately, it would be denied.

[00:28:30] But it's no big deal. Three seconds later, it would retry with a different random sequence number and it would work. So it's not a big deal to lose one out of 4 billion initial sequence numbers to solve this particular problem. And that's exactly what we did. We put that bulk firewall in. It had the hack version of the code that filtered out that one sequence number.

The firewall held up and was able to filter out and [00:29:00] block that one, and then the good traffic was able to continue through to the more granular firewalls. So it worked great. Now, the packet analyzer helped us identify the sequence number that was being used. The knowledge of theory and understanding of protocols, and actually seeing the packets, allowed us to understand the particular problem.
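The two-tier arrangement described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the vendor code from the story: the `Syn` record, the function names, the port allow-list standing in for the granular rule set, and the `BLOCKED_SEQ` value are all hypothetical. The point it illustrates is the division of labor: the first tier does one cheap comparison per packet, and only the survivors reach the expensive rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Syn:
    src_ip: str
    dst_port: int
    seq: int


BLOCKED_SEQ = 0x1234ABCD  # hypothetical; the episode does not give the real value


def bulk_tier(packets, blocked_seq=BLOCKED_SEQ):
    """Tier 1: a single cheap comparison per packet drops the attack's SYNs."""
    return [p for p in packets if p.seq != blocked_seq]


def granular_tier(packets, allowed_ports):
    """Tier 2: stand-in for the detailed rule set (here, a port allow-list)."""
    return [p for p in packets if p.dst_port in allowed_ports]


def two_tier(packets, allowed_ports, blocked_seq=BLOCKED_SEQ):
    """Run both tiers; return accepted packets and the count dropped in tier 1."""
    survivors = bulk_tier(packets, blocked_seq)
    return granular_tier(survivors, allowed_ports), len(packets) - len(survivors)


if __name__ == "__main__":
    flood = [Syn("192.0.2.1", 443, BLOCKED_SEQ)] * 3
    normal = [Syn("192.0.2.50", 443, 7), Syn("192.0.2.51", 23, 9)]
    accepted, dropped = two_tier(flood + normal, allowed_ports={443})
    print("dropped by bulk tier:", dropped, "accepted:", accepted)
```

On the collateral cost: with 32-bit starting sequence numbers there are 2**32 (about 4.29 billion) possible values, so a legitimate client choosing its value at random would hit the blocked one roughly once in 4.29 billion attempts.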

Now, this is a zero-day type problem, and if you are a large [00:29:30] enterprise and you don't have the ability to do packet analysis, I'm sorry, but you're not gonna be able to respond very effectively to zero-day problems. Somebody has got to look at it from this highly granular view in order to help you find the solution to that problem.

So this is the way it looks. If you can imagine, you've got dominoes. Those dominoes are moving forward to the bulk firewall and, boom, they hit the bulk [00:30:00] firewall. And that domino effect stops at that first firewall. And then there's another set of dominoes, which is the secondary firewalls, and those are not impacted, so the dominoes couldn't get through to cause the continued crescendo into the firewalls and the servers and applications on the other side.

So the solution worked really great. Everyone was happy, and they promulgated that [00:30:30] change through law enforcement and other people to the other exchanges, and they were able to do that as well.

What we're talking about here is business continuity, right? We're planning this. We're getting lessons learned. We're figuring out how to do recovery. We have all sorts of metrics about recovery. How long is it gonna take to get partial recovery or full recovery? How are we gonna build our systems to be more resistant?

What kind of procedures and best [00:31:00] practices do we need to put into place? So yes, that's what this podcast is gonna be about, and we'll talk about that with various people in the future and that sort of thing. Now I want to introduce you to a very cool thing that Bryan Lares of ExtraHop talks about: defending the win with network intelligence.

It's basically the midgame, from the time that your enterprise gets hit, and that's the initial hit, [00:31:30] until you find it. So that's the dwell time. If somebody does get in, you wanna have the systems and capability to act on that very rapidly.

So Bryan's gonna talk a little bit about that.

Okay, we're back. Now, cybersecurity, of course, covers a lot of different things: applications, networks, encryption, the fear of [00:32:00] encryption being broken here in the future, which we're concerned about, and various end-user education. I'm not a big one on that.

It is important to educate your employees, certainly, but not to put the burden fully on an employee because your cybersecurity systems are not resilient and are not finding and stopping things with technology. So there's a balance. [00:32:30] And there's a move afoot to fire employees who click on ransomware and that sort of thing.

Frankly, I think that it's the technology group's problem ultimately, and I certainly wouldn't advocate firing employees who are otherwise great at what you hired them for, because they are not cybersecurity personnel. So we just need to build better cybersecurity to meet those needs. I think you'll agree with me.

So those zero [00:33:00] day problems require immediate new solutions. You need knowledgeable technologists who are trained, skilled, good communicators, and good writers, so that we can talk about and communicate these things. Because zero-day problems require an immediate solution that's brand new, that's never been used before.

That's very important. So, the best practice amplification. I'm gonna talk about [00:33:30] that cuz that's what this is about. What did we learn about the stock exchange? They were down for several days before I got there because they didn't have their own network forensic people at the time.

After this, I came in and helped train them in some network forensics so that they had the skills, so that they wouldn't be down for multiple days and could identify some of these zero-day problems. Anyway, that's the purpose. So [00:34:00] the organization takes a best practice that's very powerful and proven.

And then we let the organization amplify it, and that's why I talked about this: if you want to put in place the best practices and lessons learned that we talk about on this program, you might need someone like McKinsey, Bain, or Boston Consulting Group, Deloitte, GDIT, or Capgemini to help you implement those far and wide. [00:34:30]

Okay, so that's just what I'm talking about. Now, smaller organizations, you can probably do that with your own team. You might need some help, but in order to make any changes, you're gonna need some help. That's why those big accounting firms or consulting firms are out there. Do you know that there are hundreds of thousands of those very smart people working for those organizations who basically take a problem and implement a solution very [00:35:00] rapidly?

It's not inexpensive, but you get what you pay for. Okay, the first problem that I talked about from my white paper, my case studies, is the US stock market denial of service attack. And so we'll be talking about some of my 25 or 30 major lessons learned. I like to tell people, hey, you know what lesson learned would have prevented [00:35:30] Facebook from being down for a day last October 4th?

Facebook went down for a day, and I'll talk about this sometime. But what lesson learned, had they implemented it and imputed it into their environment, would've prevented them from being down for a day? I have the answer to that, and I'm more than happy to share it at some future time. This is why we're doing this broadcast: in order to help you and [00:36:00] organizations implement lessons learned that could save a lot.

Oh, let's see. I think Facebook lost about 5% of its value, 25 to 50 billion, with a B, dollars, in the one four-hour period that they were down. So it's gonna pay dividends to implement the very best of best practices. So I encourage you to come back and be with us on these broadcasts. [00:36:30]

I have a message from Vinton Cerf. Vint is the father of the internet. He spoke at my conference on TCP/IP and security a couple of months ago, and I want to take his message and introduce you to Vint, if you don't already know him as the father of the internet and a recipient of the Presidential Medal of Freedom. A great guy. He's a VP at Google, and you won't [00:37:00] forget hearing a little bit from Vint. So here is Vint. We'll be back after he talks to us for a moment.

Okay, we're back. Thank you so much for joining us. Recommend your folks, anyone with a disaster recovery or security incident, any type of data disaster recovery efforts. Send me an email at bill@disaster.stream, and then encourage people to learn from these [00:40:00] best practices.

They can save you, potentially. Someday I'm gonna talk to you about that Facebook solution that would've saved $25 to $50 billion for Facebook had they implemented the best practice. And maybe we'll have some folks on who did implement that capability and can testify that it's highly valuable.

I want to talk to you just a little bit about what we're gonna talk about in one of our [00:40:30] next sessions. We'll talk about the Pentagon 9/11, where I flew in with five of my team and we brought the Pentagon back to communicating.

So we're gonna talk about that in a future episode, and we're gonna have none other than David Wills, who was at STRATCOM, the Joint Chiefs of Staff at the Pentagon, and US CENTCOM. He was chief of network engineering, running communications for the [00:41:00] Iraq and Afghanistan wars.

Thank you so much for joining me. Do let me help you give your folks a shot in the arm. I'm always available to do a lunch and learn to help your folks. I'd love to tell your story on our podcast, and if you're a vendor who is working on solutions that are meaningful, new, and capable, like Cloud Range and some of these others, I'd be very happy to introduce [00:41:30] the audience to your new technology and new disaster recovery capabilities. Thank you so much for being here today. We really appreciate your support.