The 10,000 Hour Flight Test Programme

Why amateur-built fleet testing does not scale to serial production

May 31, 2026

A recurring argument in aviation forums and comment sections runs as follows [EGA25, Mar25]. The new MOSAIC rule, which the FAA issued in July 2025 and which takes effect on 24 July 2026, lets kit manufacturers sell factory-built aircraft against an industry consensus standard rather than a full CS-23 type certification. Since the substantive innovation in light aircraft has for decades happened in the experimental fleet, where classical certification is too slow and too expensive to reach, the reasoning runs that if a homebuilt Lancair or RV flies acceptably, the certification apparatus around a factory-built equivalent, the design organisation approval and the production organisation approval, is overhead the experimental world has already shown to be dispensable. Europe, on this view, should follow with an analogous rule under Part-21 Light or watch its manufacturers migrate to the US market.

The premise the argument rests on, which the people advancing it rarely examine, is that an experimental aircraft reaches service on light testing. In fact it reaches service through more scrutiny than a certificated aircraft of the same family, not less, because every individual airframe completes twenty-five to forty hours of Phase 1 flight testing under an FAA-issued airworthiness certificate and FAA-issued operating limitations, more per airframe than the type test programme allocates to any single example once the type is cleared. The experimental regime and type certification both place the work under the authority, the first at every airframe and the second once for the type, whereas MOSAIC rests on a manufacturer declaration against a consensus standard and does neither, so the safety record of the experimental fleet cannot be lent to it as the argument supposes. That per-airframe testing is also what stops the experimental regime from scaling, since a manufacturer who carried twenty-five to forty hours of testing on every delivered aircraft would spend more than it costs to maintain the organisations that type certification requires.

One assumption underlies the whole examination, namely that every manufacturer involved honours the declarations it signs; bad faith is a different question. What is at issue is what each regime actually demonstrates when the declaration is truthful.

Twenty-Five Hours of Truth

The substantive requirement for any aircraft holding an experimental airworthiness certificate appears in 14 CFR 91.319(b). Before the aircraft may leave its assigned flight test area, the operator must show that the aircraft is controllable throughout its normal range of speeds and throughout all manoeuvres to be executed, and that it has no hazardous operating characteristics or design features [FAA91319]. The FAA accepts compliance through a phased flight test approach in which Phase 1 establishes the envelope.

The acceptable means of compliance appears in Advisory Circular 90-89C, which the FAA published in February 2023 [FAA23]. The standard programme requires twenty-five hours of flight time when a type-certificated engine and propeller combination is fitted, forty hours otherwise. The C revision also permits a task-based alternative with seventeen prescribed flight test tasks. Chapter 6 covers stalls, stability and control, flutter, spins where applicable, and accelerated stalls. Chapter 7 adds maximum gross weight and service ceiling tests, and Chapter 8 contains additional considerations for canard configurations.

Twenty-five to forty hours of disciplined flight testing per aircraft, with stalls, accelerated stalls, flutter envelope expansion, and stability sweeps, is a considerable undertaking, and it is more per airframe than the type test programme allocates to any single example once the type is cleared. The people who treat the experimental regime as evidence that oversight is dispensable rarely register this, since the experimental aircraft they point to reached service through more individual scrutiny, not less.

One Test for the Type, Not One Test per Aircraft

The relevant comparison is not Phase 1 versus a single TC flight test programme. It is Phase 1 multiplied across the fleet versus CS-23 certification once for the type plus production oversight per aircraft. The structure of these two regimes differs at a level deeper than the test content.

CS-23 Amendment 5, which EASA published in 2017, replaced roughly four hundred prescriptive design requirements with about sixty-seven performance-based requirements, and admits compliance through consensus standards developed by ASTM Committee F44 and others [EASA17]. The flight test work appears in paragraphs such as CS 23.2150, which sets out stall characteristics, stall warning, and spin requirements, and CS 23.2135, which addresses controllability, alongside the structural substantiation requirements in Subpart C. The certification basis is established once for the type, and includes static load tests, ground vibration tests, fatigue substantiation, complete performance and handling qualities envelopes, and flutter clearance to V_D.

What does not appear in the per-type cost but appears in the per-aircraft cost is the apparatus of design and production approval. The design organisation maintains the substantiation throughout the aircraft’s operational life. The production organisation operates a quality system that the competent national authority audits. Each individual aircraft undergoes first-article inspection, conformity-of-production verification through statistical sampling, and continuing airworthiness monitoring.

The economic consequence follows directly. CS-23 front-loads a large one-time substantiation cost, then adds a small per-aircraft cost for production oversight. EAB avoids the one-time cost but repeats the full substantiation burden at every aircraft through Phase 1. Across the fleet, the total test volume in the EAB regime dwarfs the type certification programme. Five hundred RV-7s at twenty-five hours each accumulate 12,500 hours of flight testing, whereas the equivalent CS-23 campaign for the same airframe might require two hundred to three hundred hours. Measured in aggregate flight test hours per type, the EAB fleet is tested roughly forty to sixty times more than the TC fleet.
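The arithmetic behind that comparison is short enough to check directly. The sketch below uses only the figures quoted above (five hundred airframes, twenty-five hours each, and a two-hundred-to-three-hundred-hour type campaign); the function and variable names are illustrative.

```python
# Aggregate flight-test hours, using only the figures quoted in the text.
FLEET_SIZE = 500                 # RV-7 airframes in the example
PHASE1_HOURS = 25                # hours per airframe (type-certificated engine/propeller case)
TC_CAMPAIGN_HOURS = (200, 300)   # quoted range for a one-off CS-23 flight test campaign


def fleet_test_hours(fleet_size, hours_per_airframe):
    """Total Phase 1 hours flown across the whole fleet."""
    return fleet_size * hours_per_airframe


def ratio_range(fleet_hours, campaign_range):
    """Lowest and highest ratio of fleet test hours to type-campaign hours."""
    shortest_campaign, longest_campaign = campaign_range
    return fleet_hours / longest_campaign, fleet_hours / shortest_campaign


eab_hours = fleet_test_hours(FLEET_SIZE, PHASE1_HOURS)
low_ratio, high_ratio = ratio_range(eab_hours, TC_CAMPAIGN_HOURS)
print(f"EAB fleet hours: {eab_hours}")
print(f"Ratio to type campaign: {low_ratio:.1f}x to {high_ratio:.1f}x")
```

With forty hours per airframe instead of twenty-five, the same calculation gives a proportionally larger ratio.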

This volume comes at a price that does not appear on any invoice. The builder-owner’s labour in Phase 1 is uncompensated. If a manufacturer had to replicate the same per-aircraft test programme with salaried test pilots and flight test engineers, the cost would be prohibitive. EAB testing is economically viable only because the tester works for free, accepts the risk personally, and does not bill for the months of preparation that a professional programme requires. The moment a manufacturer tries to scale this model to serial production, the hidden cost surfaces and the arithmetic breaks.

Beyond a certain production volume, CS-23 therefore becomes cheaper per aircraft than EAB even at face value. The crossover arrives sooner than most forum discussions assume, because the comparison that matters is not the regulatory overhead but the fully loaded cost of substantiation per delivered aircraft.
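The crossover can be written as a simple break-even calculation. The sketch below is a minimal model, assuming a fixed one-time cost for type substantiation and constant per-aircraft costs on both sides; the three numbers in the example call are placeholders chosen only to exercise the function, not estimates of real programme costs.

```python
import math


def breakeven_volume(type_cost, oversight_per_aircraft, testing_per_aircraft):
    """Smallest fleet size at which one-time type substantiation is cheaper in total.

    type_cost              one-time substantiation cost for the type
    oversight_per_aircraft per-aircraft production oversight cost under certification
    testing_per_aircraft   fully loaded per-aircraft cost of a Phase 1 style programme
    """
    saving = testing_per_aircraft - oversight_per_aircraft
    if saving <= 0:
        return None  # per-aircraft testing never costs more, so there is no crossover
    # Certification wins once N * saving exceeds the one-time cost.
    return math.floor(type_cost / saving) + 1


# Placeholder figures in arbitrary currency units (hypothetical, not from the article).
print(breakeven_volume(3_000_000, 5_000, 40_000))
```

The useful property of the model is the direction of its sensitivity: the higher the fully loaded cost of per-aircraft testing, the earlier the crossover arrives.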

Four Ways to Be Different

Hours of flight time are not units of safety information. Experimental-amateur-built aircraft show dispersion at four distinct layers, only the first of which the forum argument addresses.

The first layer is build quality, which the regulatory regime acknowledges explicitly. AC 90-89C exists in part because the FAA assumes that aircraft built by amateurs may contain workmanship variations that need detection before routine operation [FAA23]. The German equivalent, the LBA’s Merkblatt 240.1 for individual aircraft inspection under § 3 LuftGerPV, addresses the same concern through inspection of single specimens [LBA21]. Build quality dispersion is taken as given and addressed at the testing stage.

The second layer is the resulting flight quality. The Long-EZ provides the clearest example. The original GU25-5(11)8 canard airfoil exhibited a pitch-down trim change when flown into rain, a phenomenon that Rutan’s Canard Pusher newsletters documented in the late 1970s [KPl20]. In 1985, plans were issued for a redesigned canard with the John Roncz R1145MS airfoil, which largely eliminated the rain trim change. What was noted at the time, and what appears in current buying guides for the type, is that some early canards built to the GU25-5(11)8 plans showed no rain trim change at all. The reason was not that those builders had solved the problem. They had built canards with poor surface contour, so that no laminar flow developed on the upper surface, so that there was nothing for the rain to disturb. Good workmanship produced the documented handling deficiency, whereas poor workmanship produced an aircraft that appeared to handle correctly because the underlying aerodynamic mechanism was not present at all.

This inverts the intuition that careful builders produce safer aircraft. A Phase 1 pilot in a poorly built original Long-EZ records that everything appeared in order and delivers the aircraft into Phase 2 with a build defect that Phase 1 failed to detect, because Phase 1 looked for an anomaly that the poor build inadvertently suppressed. Fibre-reinforced plastic moldless construction is particularly vulnerable to this kind of inversion because contour fidelity below one per cent of chord lies outside what amateur construction can reliably achieve.

The RV-7 illustrates the same dispersion in a less exotic form. Van’s Aircraft uses CNC-punched aluminium construction with match-drilled holes, which is about as industrial as kit construction gets. Yet documented examples on the Van’s Air Force forum show best-glide speeds ranging from eighty to ninety-five knots indicated between aircraft of the same model. The cause is static source placement on the fuselage skin, which interacts with the local pressure field and produces position errors that vary between aircraft. A pilot whose Phase 1 data shows best glide at eighty knots indicated, when the true calibrated airspeed at that reading is ninety-five knots, is gliding fifteen knots faster than optimum and losing range in an engine failure.

The third layer is the resulting performance. VariEze plans-built aircraft showed empty weight dispersion of approximately thirty per cent across the fleet, with Rutan’s surveys in 1978 finding average weights thirty to fifty pounds over plan specification [NASM]. GlaStar kit-built aircraft show empty weights ranging from 1,150 to 1,400 pounds, a twenty-two per cent spread. Stall speeds, climb rates, service ceilings, and ranges all move with empty weight, so the published performance figures in the kit manufacturer’s documentation become approximate references rather than verified data for any given aircraft.
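To see how an empty-weight spread carries through to handling figures, the sketch below computes the GlaStar spread from the two weights quoted above and then applies the standard square-root-of-weight scaling for stall speed. The 500 lb carried load is an assumption made purely for illustration and does not come from any GlaStar documentation.

```python
import math

GLASTAR_EMPTY_LB = (1150, 1400)  # lightest and heaviest empty weights quoted in the text
ASSUMED_LOAD_LB = 500            # hypothetical pilot + fuel + baggage, for illustration only


def relative_spread(lightest, heaviest):
    """Spread of the heaviest example relative to the lightest."""
    return (heaviest - lightest) / lightest


def stall_speed_ratio(weight_light, weight_heavy):
    """Stall speed scales with the square root of weight for the same wing and configuration."""
    return math.sqrt(weight_heavy / weight_light)


spread = relative_spread(*GLASTAR_EMPTY_LB)
ratio = stall_speed_ratio(
    GLASTAR_EMPTY_LB[0] + ASSUMED_LOAD_LB,
    GLASTAR_EMPTY_LB[1] + ASSUMED_LOAD_LB,
)
print(f"Empty-weight spread: {spread:.1%}")
print(f"Stall speed of heaviest vs lightest at the same load: {ratio:.3f}x")
```

Under that assumed load, the heaviest airframe stalls several per cent faster than the lightest, which is enough to make a single published stall speed unreliable for an individual aircraft.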

The fourth layer is the one that the forum argument fails to consider, and it is the layer that dominates the others.

The Tester Who Built the Evidence

Until October 2014, the operating limitations issued with an experimental amateur-built airworthiness certificate carried a passage that the FAA interpreted strictly. No person could be carried during the flight test phase unless that person was essential to the purpose of the flight, and the agency’s reading was that for most aircraft this meant the pilot alone. The official position, as test pilot Paul Cornick described it in Kitplanes [Cor19], was that if the builder cannot perform the required test procedures in a relatively simple aircraft, then they are not qualified to be the pilot doing the flying. In practice, Phase 1 was therefore often flown by a pilot without flight test training, alone, in an aircraft that had never flown before.

The empirical consequences were studied by the National Transportation Safety Board in Safety Study SS-12/01, published in May 2012 [NTSB12]. The board examined two hundred and twenty-four EAB accidents from 2011, supported by an EAA survey of more than five thousand builders and owners. Experimental amateur-built aircraft constituted approximately ten per cent of the US general aviation fleet but accounted for fifteen per cent of accidents and twenty-one per cent of fatal accidents in 2011. The safety gap between experimental and certificated aircraft was widest on the first flight and during the early hours in a new pilot’s hands.
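The over-representation implied by those shares can be made explicit. The sketch below uses only the three percentages quoted above and compares the EAB accident rate per registered aircraft with that of the rest of the fleet; it says nothing about rates per flight hour, which would need exposure data not given here.

```python
# Shares quoted in the text for 2011.
EAB_FLEET_SHARE = 0.10      # EAB share of the US general aviation fleet
EAB_ACCIDENT_SHARE = 0.15   # EAB share of all accidents
EAB_FATAL_SHARE = 0.21      # EAB share of fatal accidents


def rate_ratio_vs_rest(fleet_share, event_share):
    """Per-aircraft event rate of the group divided by that of the rest of the fleet."""
    group_rate = event_share / fleet_share
    rest_rate = (1 - event_share) / (1 - fleet_share)
    return group_rate / rest_rate


accident_ratio = rate_ratio_vs_rest(EAB_FLEET_SHARE, EAB_ACCIDENT_SHARE)
fatal_ratio = rate_ratio_vs_rest(EAB_FLEET_SHARE, EAB_FATAL_SHARE)
print(f"Accidents per aircraft, EAB vs rest of fleet: {accident_ratio:.2f}x")
print(f"Fatal accidents per aircraft, EAB vs rest of fleet: {fatal_ratio:.2f}x")
```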

The board’s finding on pilot qualification is less often quoted. Pilots of EAB accident aircraft, on average, had appreciably less flight experience in the type of aircraft they were flying than pilots of non-EAB accident aircraft, despite similar or higher total aviation experience. The deficiency was not total flight time but type-specific experience, which is unsurprising because the aircraft has never flown before the pilot climbs into it.

Flight instructors and ferry pilots routinely fly types they have not flown before, but they bring a structured approach to unfamiliar aircraft that compensates for the absence of type time. Test pilot training teaches exactly this competence in systematic form, since it covers methodical envelope expansion, recognition of departure onset before it develops, and extraction of quantitative data from a qualitative impression. The typical builder-owner has none of these skills, and the total time on his licence can be misleading. A builder who spends three to five years in the workshop may accumulate little flying during the build, arriving at the first flight with thousands of logged hours but poor recent currency. A biennial flight review does not produce the handling sensitivity that a first flight in an unfamiliar type demands.

Only thirty-seven per cent of EAA survey respondents who had completed airworthiness certification claimed to have used a very detailed flight test plan [NTSB12]. A separate Master’s thesis at the University of Tennessee Knoxville, examining the 2002-2004 period, found that over one per cent of homebuilt aircraft were involved in an accident on their first flight and three point three per cent in the first forty hours of operation, and concluded that flight test should ideally be left to trained professionals [Dav07].

The FAA responded to the NTSB recommendations in September 2014 with Advisory Circular 90-116, the Additional Pilot Program for Phase I Flight Test [FAA14]. The AC states that the program was developed to improve safety by enhancing builder-owner pilot skills and to mitigate the risks associated with Phase 1 flight testing of aircraft built from commercially produced kits through the use of a qualified additional pilot. The implicit acknowledgement is that the builder-owner pilot’s skills require enhancement and the risks require mitigation. For sufficiently complex aircraft, even this proved insufficient. The Lancair Owners and Builders Organization spent four years negotiating with the FAA over Phase 1 testing of the turbine-powered Lancair Evolution. The resulting arrangement, agreed in January 2016, permits new EVO builder-pilots to fly Phase 1 only with specifically named qualified pilots whose qualifications the FAA has reviewed and whose names appear on a published list [LOBO16]. The builder, who has spent years constructing the aircraft, is not deemed qualified to fly it in Phase 1.

The reference point for what counts as qualified in flight test comes from the Society of Experimental Test Pilots, where full membership requires either active engagement in experimental or developmental flight testing for at least one year, or graduation from an accredited test pilot school course of at least ten months in length [SETP24]. SETP recognises eight such schools globally. The difference between a SETP-qualified test pilot and a typical builder-owner with a private pilot certificate is not one of degree but of category.

When the Layers Compound

The four dispersion layers do not stack additively. They multiply, in the sense that the visibility of the first three to the regulatory testing apparatus depends on the fourth. Build quality dispersion drives flight quality dispersion and performance dispersion. Phase 1 is the regulatory mechanism that is supposed to detect dispersion in layers two and three at the individual aircraft. The detection capability of Phase 1 is bounded above by the assessment capability of the pilot conducting it. If that capability, which is layer four, is coarser than the dispersion in layers two and three, the testing apparatus does not detect what it is intended to detect.

The RV-7 best-glide example illustrates the mechanism. The first builder notes a best-glide speed of eighty knots indicated and documents this in the aircraft’s pilot operating handbook. A subsequent owner finds the correct value to be ninety-five knots, because the static source position produces an airspeed error that the first builder had not detected. Without a calibrated pitot-static boom, without a GPS-based ground-speed cross-check across three different headings, and without the methodological training to correct for static-source position error, the first builder could not detect the error. The aircraft entered service with a false performance figure in its documentation despite Phase 1 having been completed. The error was in the data extraction, which is what the fourth dispersion layer governs.
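The GPS cross-check mentioned above can be sketched in a few lines. One common formulation treats the three ground-velocity vectors, flown at the same indicated airspeed and altitude on clearly different headings, as points on a circle whose centre is the wind vector and whose radius is the true airspeed. The code below is a minimal sketch of that geometry, assuming steady wind and accurately held speed; the function names are illustrative, the demo legs are synthetic, and converting the resulting true airspeed to calibrated airspeed still requires a density correction that is not shown.

```python
import math


def ground_vector(ground_speed_kt, track_deg):
    """Ground velocity as (east, north) components from GPS ground speed and track."""
    t = math.radians(track_deg)
    return ground_speed_kt * math.sin(t), ground_speed_kt * math.cos(t)


def tas_and_wind(legs):
    """Estimate true airspeed and wind from three (ground_speed_kt, track_deg) legs.

    The three ground-velocity points lie on a circle centred on the wind vector
    with radius equal to true airspeed, so this computes their circumcircle.
    """
    (ax, ay), (bx, by), (cx, cy) = (ground_vector(gs, trk) for gs, trk in legs)
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-9:
        raise ValueError("legs are collinear; fly three clearly different headings")
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    wx = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d  # wind, east component
    wy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d  # wind, north component
    tas = math.hypot(ax - wx, ay - wy)
    wind_speed = math.hypot(wx, wy)
    wind_from_deg = (math.degrees(math.atan2(wx, wy)) + 180.0) % 360.0
    return tas, wind_speed, wind_from_deg


def synthetic_legs(tas_kt, wind_east_kt, wind_north_kt, headings_deg):
    """Build made-up GPS readings for a known airspeed and wind (demo data only)."""
    legs = []
    for hdg in headings_deg:
        h = math.radians(hdg)
        vx = tas_kt * math.sin(h) + wind_east_kt
        vy = tas_kt * math.cos(h) + wind_north_kt
        legs.append((math.hypot(vx, vy), math.degrees(math.atan2(vx, vy)) % 360.0))
    return legs


# Synthetic check: 100 kt true airspeed with a 10 kt wind from the west.
demo = synthetic_legs(100.0, 10.0, 0.0, [0.0, 120.0, 240.0])
tas, wind_speed, wind_from = tas_and_wind(demo)
print(f"TAS {tas:.1f} kt, wind {wind_speed:.1f} kt from {wind_from:.0f} deg")
```

Comparing the airspeed recovered this way with what the panel indicator showed on the same legs is what exposes a static-source position error; the method itself is simple, but it only helps a pilot who knows to run it.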

The Long-EZ rain trim case is more troubling because the inversion runs against Phase 1’s selection logic. A well-built original-canard Long-EZ shows the documented rain pitch-down, whereas a poorly built one shows no such anomaly. Phase 1 conducted by the builder of the latter aircraft finds no anomaly, so the aircraft passes with quieter test data than its better-built sibling. The selection mechanism that Phase 1 was supposed to provide has selected for the worse build.

Phase 1 is a test conducted by the same person whose work is being tested, with no independent calibration of the testing instrument, and with no comparison against type-level reference data because no type-level reference data exists at the granularity that would be needed. Type clubs partially fill this gap for the most popular types, but the structural problem remains.

Three Trust Architectures

MOSAIC, effective 24 July 2026, and its European counterpart Part-21 Light do not require the authority to verify every compliance step, which aligns them with EAB [FAA25, EU22a, EU22b, EASA23]. But substantiation under both regimes is the manufacturer’s responsibility and is performed once for the type, not once per aircraft, which aligns them with classical type certification. The forum argument misses the boundary between these two inheritances. Phase 1 flight testing under MOSAIC and Part-21 Light Declared is a conformity check at the production end of the chain, since the consensus standard carries the substantiation, and EAB has no equivalent separation.

This is a different trust architecture. Type certification places trust at an independent verification body, whether the primary authority, a national aviation agency, or a delegated organisation, whereas EAB places trust at the builder-owner-pilot in Phase 1. MOSAIC and Part-21 Light, by contrast, place trust at the organisation, which signs declarations against external consensus standards and accepts liability for the truthfulness of those declarations. These are three architectures for three different scales of production, and the forum argument conflates the second and the third.

Scale It and Watch It Break

Apply the forum argument’s logic to serial production. Imagine a manufacturer who produces aircraft under MOSAIC or Part-21 Light Declared, and who treats Phase 1 testing at each delivered aircraft as the substantiation rather than as a conformity check. There is no design organisation and no type-level substantiation effort. Each aircraft receives its twenty-five to forty hours of Phase 1 testing by its first owner, and that is the safety case.

Build-quality dispersion across the fleet is not detected at the manufacturer, so the four dispersion layers compound across the fleet without correction. The Phase 1 testing is performed by buyer-builders whose qualifications are systematically below those of the builder-owners in the current EAB world, because the population of people buying serially-produced aircraft differs from the population of people who undertake years-long construction projects. The Phase 1 selection criterion of a motivated, hands-on builder does not apply to the buyer of a finished aircraft.

The manufacturer who anticipates this and wishes to control fleet quality will start to build internal quality systems. He will produce standardised Phase 1 protocols and require buyer-builders to follow them, collect the data from completed Phase 1 programmes to detect systematic issues, define minimum qualifications for the Phase 1 pilots while offering training, and publish service bulletins that arrange for installations through authorised maintainers. If he wants external validation, he will accept third-party audits, which means he is at a Quality Assurance system under the ASTM F2972 framework, which means he is at MOSAIC. If he wants more, if he wants the authority to vouch for him so that customers in other jurisdictions accept his aircraft, he is at Design Organisation Approval and Production Organisation Approval, which means he is at type certification.

EAB cannot scale to serial production without becoming something else, because the very features of EAB that make it work at low volume, namely the alignment of builder and tester in a single motivated person, do not survive serial production.

The Serious Question

The argument’s empirical core, that the experimental fleet flies acceptably, is correct, but it points the other way once one asks why. The experimental fleet flies acceptably because every airframe passes twenty-five to forty hours of Phase 1 under the authority’s certificate and limitations, which is the opposite of an absence of oversight, and that per-airframe oversight is precisely the cost no production business can carry across a fleet. Type certification answers the same problem by substantiating once for the type under the authority’s review and then checking conformity per aircraft. MOSAIC keeps neither the per-airframe Phase 1 of the experimental regime nor the authority-reviewed per-type substantiation of certification, and the experimental fleet’s record cannot be lent to it, because that record was earned by the very oversight MOSAIC removes.

The FAA’s own regulatory trajectory, from AC 90-89A through C, through AC 90-109’s explicit acknowledgement that wide variation in stall behaviour occurs in identical models of amateur-built aircraft [FAA10], through AC 90-116’s Additional Pilot Program, through the LOBO arrangement for turbine-powered Lancair Evolutions, traces a fifteen-year process in which the agency has progressively retreated from the original assumption that builder-owner Phase 1 testing produces certifiable-quality data. The retreat has not abolished the EAB regime, and there is no reason to abolish it, because EAB serves its constituency well. What the retreat establishes is that the EAB regime cannot be the model for serial production, since the assumptions that support it do not hold when production scales.

The architecture question is about who carries the substantiation, and the three regimes answer it differently according to the scale of production and the distribution of competence each one assumes. Asking why MOSAIC is not more like EAB is asking why a regulatory framework for serial production should look like one designed for one-off hand-builds. The answer is that it should not, and that the people who built MOSAIC and Part-21 Light understood this when they wrote the rules.

The serious question, the one that the forum argument blocks rather than advances, is whether the consensus standards, the manufacturer’s quality systems, and the regulator’s first-article inspection together constitute an adequate trust architecture for serial production at the scale that MOSAIC envisages. That question is worth debating on its merits, in light of the empirical record once enough aircraft have been delivered under the new rule to provide data.

References

[Cor19] P. Cornick, “Souls On Board, Two: The FAA Allows an Additional Pilot in Phase 1,” Kitplanes, 2019.

[Dav07] J. Davis, “Application of Professional Flight Test Practices to Homebuilt Aircraft Flight Testing,” Master’s Thesis, University of Tennessee Knoxville, 2007.

[EASA17] European Union Aviation Safety Agency, “Certification Specifications for Normal-Category Aeroplanes (CS-23), Amendment 5,” 2017.

[EASA23] European Union Aviation Safety Agency, “AMC and GM to Part 21 Light, Issue 1,” ED Decision 2023/013/R, October 2023.

[EGA25] EuroGA Forum, “The effect of FAA MOSAIC in Europe,” discussion thread, October 2025. Available: https://euroga.org/forums/aircraft/16704-the-effect-of-faa-mosaic-in-europe

[EU22a] European Commission, “Delegated Regulation (EU) 2022/1358 (Part-21 Light),” 26 May 2022.

[EU22b] European Commission, “Implementing Regulation (EU) 2022/1361,” 28 July 2022.

[FAA10] Federal Aviation Administration, “Advisory Circular 90-109A, Airmen Transition to Experimental or Unfamiliar Airplanes,” 2015.

[FAA14] Federal Aviation Administration, “Advisory Circular 90-116, Additional Pilot Program for Phase I Flight Test,” September 2014.

[FAA23] Federal Aviation Administration, “Advisory Circular 90-89C, Amateur-Built Aircraft and Ultralight Flight Testing Handbook,” February 2023.

[FAA25] Federal Aviation Administration, “Modernization of Special Airworthiness Certification (MOSAIC) Final Rule,” 90 FR 35207, 24 July 2025.

[FAA91319] Federal Aviation Administration, “14 CFR 91.319, Aircraft having experimental certificates, Operating limitations,” 2024.

[KPl20] Kitplanes Staff, “Buying Used: Long-EZ,” Kitplanes, 2020.

[LBA21] Luftfahrt-Bundesamt, “Merkblatt 240.1, Prüfung und Zulassung von im Selbstbau hergestellten Einzelstücken gemäß § 3 LuftGerPV,” 2021.

[LOBO16] Lancair Owners and Builders Organization, “Qualified Pilots Under FAA AC 90-116,” LOBO MOU 21 January 2016.

[Mar25] A. Marinosyan, comment on “The Least Oversight Where It Matters Most,” Engineering Airworthiness (Substack), 3 May 2026.

[NASM] National Air and Space Museum, “Rutan VariEze, A19860067000.”

[NTSB12] National Transportation Safety Board, “The Safety of Experimental Amateur-Built Aircraft,” Safety Study SS-12/01, May 2012.

[SETP24] Society of Experimental Test Pilots, “Membership Requirements and Recognized Test Pilot Schools,” 2024.
