Major Microsoft and CrowdStrike system collapse is a foretaste of more trouble ahead
The widespread collapse of many systems dependent on Microsoft and CrowdStrike software is yet another example of the fragility of our Internet-connected world. It is axiomatic that very complicated systems are fragile: their parts depend on one another, and a failure in one part can bring the whole system down, often in unexpected ways. Many mega computer-connected systems are so complicated that no one knows for sure how their internal subsystems are connected. Worse, these systems are modified day by day by many elements of the various organizations that use them, so to "know" the system today does not mean you know it tomorrow. These systems evolved over time. To be crude, they grew up like Topsy.
This is particularly true of the systems run by our intelligence community. These are systems that I know about in some detail. I have not only used them, but I also hold a rarely granted Eng. IT degree and CISSP certification. These systems are vast and unmapped in any detailed way, and they have accrued functions, and thus subroutines, added by different teams for different purposes over time (read: decades). For security reasons, the right hand may not even know what the left hand is doing.
It is absurd to think that a "backup" system could be built, since no one knows the parameters of these mega systems. And who would or could keep them up to date? Who would pay for building them in the first place? This is true for the IC and the business world in general.
What to do? Start with the idea that massive connectivity is a bad thing. We probably need more standalone systems for particular purposes. The failure of one would then not bring down the whole. This has the added benefit of putting humans in the loop to move data from one system to another. Another related but radical idea is to depend more on people than on computer systems.
The highly respected State Department intelligence unit, INR, is an example of an organization less dependent on computers and more dependent on human observation and knowledge. This, of course, is not an overall solution; I don't think there is one. I suspect we will have to resign ourselves to more large-scale failures. AI will only make the problem worse.