By Dale Skran
Three recent incidents have led to a lot of hand-wringing among space advocates. After the fully successful “hat trick” of landing both the side boosters and the center core of the Falcon Heavy, the center core of the FH toppled due to heavy seas, with a substantial amount of damage to the core. Just recently, while testing the previously used Dragon 2 capsule for the upcoming in-flight abort test, it appears that an “anomaly” has resulted in substantial damage during a static test firing of the SuperDraco engines. Another incident of relevance is that high winds caused the top of the SpaceX “Starhopper” test vehicle to fall over, resulting in the loss of the upper part of the Starhopper.
This has led to complaints along various lines:
• SpaceX is rushing “too fast” leading to all these issues.
• SpaceX should test more, plan more, simulate more, inspect more, etc. etc.
• In retrospect these failures, like most failures, seem obvious, so the engineers at SpaceX are clearly incompetent.
Sometimes the biggest SpaceX “fans” are the most disappointed to see their icon of the future soiled by engineering reality, although there is certainly substantial glee among the familiar SpaceX internet trolls. Others shrug their shoulders and repeat the time-worn mantra “space is hard.”
In addition to the above complaints, there has long been a “concern trolling” thread among SpaceX detractors that identifies the use of a pusher-type hypergolic escape system as a source of great danger. These trolls focus on the supposed risks of putting hypergolics on a crewed vehicle, ignoring the long history of using hypergolics in the crewed and re-usable Space Shuttle Orbital Maneuvering System (OMS).
Most of this thinking springs from a substantial misunderstanding of the optimal number of failures during a test process. Setting aside wrong-headed test campaigns like early attempts to fire the first Shuttle engines “full up” with no prior component testing, if there are no failures in the test campaign one of several things is going on:
• The goals are very conservative in nature, so there is a very low risk of any failures.
• The testing is superficial, such that tests are not likely to fail.
• The testing is moving at a glacial pace, with so much preliminary testing and simulation that failures are rare.
A good test campaign is the opposite of the above:
• The goals include actual risk such as arises from doing new things.
• The tests are realistic and intensive.
• Testing moves at a cost-optimized pace, not at a super-slow pace optimized to look good to outsiders.
Viewed from this perspective, SpaceX is doing just the right thing. No one has ever done anything remotely like land the high energy FH center core far out to sea on a barge, except SpaceX, and they failed on their first attempt. Due to a desire to meet customer schedule needs, SpaceX had not completed work on a robot that attaches to the center core to prevent it from falling over, with the result that it, well, fell over. I’ll go out on a limb here and predict that in the future SpaceX will lose more center cores, possibly leading to the construction of a larger, more stable landing platform, better robots, or other changes. But these changes come from SpaceX pushing the outer envelope of what is possible and economic, not from a desire to “look good” by never failing any test.
Although the Dragon 2 “anomaly” is still fresh, we are talking about testing a new design that has been used for the first time ever, and finding that something clearly broke. In typical NASA practice, such a problem would not arise because there is usually no attempt to re-use a capsule (notwithstanding that once an uncrewed Gemini capsule was used twice, or that Orion/Starliner might be re-used in the future). SpaceX is again encountering test failures while pushing the envelope of what is economically possible. In this case SpaceX is trying to lower costs by re-using the launch escape system, which has never been done before. Although the first flight of Dragon 2 appears to have been very successful, it would be quite unusual for no problems to be found as a result of a major, realistic test like the one conducted.
With regard to the damaged “Starhopper,” SpaceX has continued the test sequence successfully without the damaged top section, the loss of which appears to have only a limited impact on the test goals. Again, this is another example of SpaceX focusing on cost-optimization over appearances.
With a real failure to chew on, I am concerned that some nervous Nellies within NASA will now seek a long delay in the Commercial Crew program. Suppose NASA focused more on being cost-effective while running a test campaign—how might this issue be dealt with? Here are some suggestions:
• Let SpaceX figure out what happened, fix it, and perform the in-flight abort test at whatever pace SpaceX wants to perform it without a lot of paperwork and reviews.
• Once the capsule is returned, perform a static fire test.
• Contract SpaceX to deliver a load of cargo to the ISS on a Dragon 2, in effect paying SpaceX for a second test flight to validate the fixes. If SpaceX fails to deliver the cargo, no payment is made.
• Again, once the capsule is returned, perform a static fire test.
• Then move on to a crewed test flight with high confidence in the system.
This kind of process would involve realistic, demanding testing, while at the same time being cost-optimized. NASA has made a fundamental error in not flying cargo Dragon 2s before the crewed version. Now seems like a good time to correct this mistake. By failing to fly cargo Dragon 2s first, NASA passed on a golden opportunity to maximize testing at minimal cost and risk. SpaceX has stated in the past that making crewed Dragon 2 work was their top corporate priority, and the recent anomaly puts an exclamation point on that goal. NASA should help by avoiding embroiling SpaceX in endless safety meetings, new design reviews, and other activities that increase costs while slowing things down.
As an engineer who balances risk & reward in the automotive industry I can appreciate the necessity of rapid engineering, but at the same time, in my mind, if I was involved in the development of Dragon 2 I would have cherished the opportunity to disassemble DM1 – Dragon 2 and go over every aspect as they did with the first F9 Block 5.
There was just too much violence in the reentry and descent, as well as the vehicle being immersed in saltwater, to have passed up all there was to have been learned!
Now there is nothing left but dust and what could have presented itself as potentially an easy fix is now something that could drag out to more than a year instead of months! I’m a Huge SpaceX Fan but in my mind this was a no brainer and something that was definitely Rushed!
Wayne – Thanks for the thoughtful post. You are of course correct that one of the great advantages of reuse is the opportunity to examine the vehicle for problems. Clearly SpaceX was prioritizing a desire to move forward with testing over completely breaking down the capsule and re-building it. This is an issue on which good engineers are going to reach different conclusions, but with SpaceX’s Dragon re-use experience base they apparently felt that six weeks of checking things over was enough. Clearly they missed something, but moving forward with the test may still have been the optimal choice. There are at least a couple of reasons for this. First, taking apart and re-assembling the capsule might “accidentally” fix some real issues without anyone becoming aware of exactly what they are. Second, you could take apart and put the capsule back together perfectly, and fail due to a design flaw on the next test. One bad case is a “self-solving” problem that fixes itself while you disassemble the capsule and check all the parts. An example might be water in a fuel line that evaporates during the checking process such that it is not detected during disassembly, but still causes a failure on the next flight.
My base engineering experience is with high-reliability telecom gear at AT&T Bell Labs. Systems I worked on have been used on Air Force One and nuclear submarines. Thanks for commenting on my blog post.
Thanks for a rational and reasonable analysis of these recent incidents. We have become so risk averse as a society that we forget that failure is a part of success. I am a software designer, and we test every step of the way. Our first pass of a software project is full of bugs, but we test early to find them and fix them. I know that this is easier with software than it is with hardware, but I do agree with your position that for a test to be meaningful this often means there is substantial risk at stake. It is unreasonable to look at a failure and draw the conclusion that this automatically means a failure in the process, design, or management. The real test is what changes next: are improvements made? Is there a track record of responding to and fixing issues? Does the same mistake occur over and over or are there different, disparate failures that are unrelated? From what I can see, SpaceX has successfully dealt with previous failures and seems to move forward, learning from these failures and improving.
Detractors detract. That’s what they do. At first they said returning and landing a booster was impossible. Now they say, “Ya, it landed, but it fell over!” When the next booster lands and is returned safely, they will say, “Ya, they got it back but “.
Go and watch some videos of the early Redstone and Vanguard projects.
When it comes to space exploration, SAFETY IS NOT AN OPTION! 50 years ago we landed on the Moon by making bold, risky moves, not by fixation on safety! Space flights ARE NOT SAFE by default! And astronauts are well aware of that and willingly accept the risk.
I think something worth noting is that these are, mostly, three separate programs. 1. The Falcon Heavy is sort of a second generation evolution of the F1/F9/FH progression and, woops, we dropped a booster in meeting a timeline. 2. The Crew Dragon failure has nothing to do with the FH center core falling over and is another example of a later generation being exposed to new missions/tests. 3. The StarHopper is like starting all over again with Grasshopper and F1. Did we forget about all those epic explosions? My money says at least one StarHopper explodes (sorry Elon)!
At some point work must proceed. Grumman dithered on whether the Apollo lunar module should have four or five legs. NASA finally told them to go with four because the chances of two legs buckling on the Moon were minimal.
The same two hypergolic propellants were also used on both stages of the Apollo Lunar Lander. They worked perfectly 12 times! When the Lunar Ascent Stage fired, the two standing astronauts were only a foot or so away from the burning rocket engine.