Sunday, 14 September 2014

ISO 29119 Questions: Part 2

Content questions

There is an ISO software testing standard (ISO/IEC 29119) parts 1-3 published, parts 4-5 in draft, ref [1-4]. I have read parts 1-3 and a draft version of part 4 and am and have been forming several questions for some time. Some questions might be regarded as problems, issues and others as need for clarification.

I will use a series of 3 posts to document. 

Part 1: Looks at the reasoning for the standard - based on the information publicly available, ref [8].
Part 2: Looks at some of the content.
Part 3: Looks at the output and usage of the standard and some implications.

This is a snapshot of some items analysed in 29119, it is not a full review - but the comments are representative of what I have found in parts 1-4.
It looks at the validity of the model, the aspect of compliance/conformance to 29119 and some of the language and descriptions used.

Process Model

Process Model Validity?
The standard presents a set of definitions, models for processes and templates for artefacts. The implication (that I assume) is that these are needed, required or observed where “good” software testing happens. I make this assumption as anyone would hardly take the effort to standardise approaches that are ineffective, would they?

For these process models to have any validity I’d expect external and internal validity to be demonstrated or the basis on which they are claimed to be shown. In fact, you’d expect research of model types that work, and which type of populations (organisations, products and project specifics) to be the basis for a standardisation effort.

Internal process model validity
This handles the question of cause and effect. By following this process model it produces something. Good software, happy customers, good testing? It isn’t really stated. A question I have with a process model - that is presented as a standard - is where is the relation between the model (construct) and the result/interpretation (“good” software, or “good” software testing).So what’s the purpose of the model - and what will happen if I follow it. The effects are not stated. There is no evidence of it being applied (tried or tested). Therefore, internal validity of the process model cannot be assumed. I would suggest this is needed for standardisation. 

External process model validity
This handles the question to which the process model can be applied generally - across which population of products, projects and organisations. Again there is no indication (evidence) about which range of products and organisations this can be reliably applied to. It is claimed that it can be applied for any organisation or software testing. This claim is not supported by evidence - reference to the input cases for which this claim is made. Therefore, the claim of application (any software testing and any organisation) is unsupported. Rhetoric and guesswork.

Process Model - Default Form: Waterfall
It is striking looking at the process model flows how waterfall like they are. In 29119-2, 7.2.1, the test planning process is shown. It is also visible here, from ref [6]. There is a note saying that it may be done iteratively. I think leaving the default model being a form that Royce said “is risky and invites failure”, in 1970 ref [9], is poor work. It seems that some real experience “that can be applied anywhere” is not the basis of this flow. 

The last part of “Gain consensus of test plan” (which is labelled TP8 in 29119-2) is that approval is gained from the stakeholders. The last part of “Publish Test Plan” (labelled TP9 in 29119-2) is to communicate the availability of the test plan. Wait, the test plan your stakeholders have approved in the previous step - you’re now going to tell them that it’s been approved? Do you think this might be annoying, appear wasteful or potentially just incompetent? I think there’s a risk for that - if you follow it by rote.

Test status and reporting is only described within the context of a completion process. Communication is framed as a handover (or asset even) rather than something that starts early and is continuous. This seems to be framed as a contractual type of reporting. The lack of integration with a software development process model is striking - there is no linkage to software development other than “design then test” - so this seems to be aimed at non agile shops, non incremental or iterative development shops. So who is it aimed at? Testing and test results can and should affect decisions within software development (including requirement refinement and discovery), but this seems to be absent in 29119 - possibly, I hypothesise, because it is a pure waterfall-test-separate-from-development model.

The process model is a mine field of contradictions and potential question marks (for me) - which is one reason I can think that it hasn’t been tested / used in real-life. Note, I have read that a draft of 29119 has been used in a study - this will be the subject of a future study. 

Process Model Conclusion 
The problem for me is that definitions are stated and process models are stated. But the validity of the process models producing good (meaning to me: efficient, effective and reasonable) software testing is not stated, linked, demonstrated or explained.

I’d like to see - and expect a linkage to be demonstrated - in a model that is at the basis of a standard - between theory (what is intended to be the output of such a model) and process model, and between study of practice (evidence to support the model that is intended to be standardised) and the process model. Where is the linkage, or evidence of this linkage - I can’t find it in 29119? Therefore, I cannot see any basis for this process model or any internal or external model validity. Conclusion: speculation, gamble.

The standard appears to lay the groundwork for a paper-trail between test requirement/inception and final test report. However, the rigour that 29119 might want to put into this paper-trail is strikingly absent in it’s own “trail” (or line of reasoning) between “test need”, “test output” and “test effectiveness and efficiency” - i.e. it omits the evidence for it’s claim that this is a model that facilitates “effective and efficient” testing, or is even a useful model. How did that happen? It seems to be a “Don’t do as we do, do as we say” approach - and I can’t decide if it’s sloppiness or lack of rigour or some oversight on the part of the WG or ISO or both. I do wonder if the result might have been different if ISO had recruited (with interviews) people to draft a standard rather than leave it to volunteers.

Perhaps the standard - and any accompanying audit according to the standard - is meant to facilitate demonstration by the tester (or organisation) that they have formed a traceable approach to their work and can connect meaning for the work and results to a stakeholders needs. This is a potentially valid and useful aim. But the standard in itself will not necessarily facilitate this - an organisation could produce masses of documentation according to the standard and still have work that isn’t effective, efficient or even repeatable. I will dig more into this question in part 3 of this post series.

An organisation can claim full or tailored conformance to 29119.

In part 29119-2, section 2, the processes (meaning - I think - activities) involved are described as the basis for conformance (or not). Due to the concerns above (with the model validity) this could lead to a number of scenarios where an organisation is requested to state its conformance to 29119.

1. Full conformance is claimed (but there is no understanding of the implications or the organisation hasn’t spotted the problems in the model).
2. Tailored conformance is claimed (but there is no understanding of the implications or the organisation hasn’t spotted the problems in the model).
3. Tailored conformance is claimed (and no part of the process model is followed). The implication here is that the organisation understands (and can demonstrate) what they are doing and maybe even sees 29119 as an obstacle to efficient and effective testing.
4. Non conformance is claimed and the organisation understands (and can demonstrate) what they are doing and maybe even sees 29119 as an obstacle to efficient and effective testing.
5. Some combination of the above.

So, a statement of conformance potentially doesn’t say so much.

Language and Claims
I don’t know if the editing, writing style or understanding of the topics is at fault but I know poorly formulated or illogical statements don’t help in understanding 29119. This is meant to be a standard, developed and reviewed over quite a time - not a soundbite or copy/paste/hope-it-makes-sense effort. 

For instance, the notes for test strategy (29119-2, state that the estimate of “elapsed time” should be made. This is like saying, “estimate the actual time it takes rather than the time you think it might take…”

- Context, Understanding
On the part about understanding context (29119-2, it’s not actually stated what is meant with “context” - so a statement such as “understanding of context and software testing requirements is obtained” is simplistic, misleading and poorly written. On “understanding … is obtained” - how should that be measured? Well the rest of the statement is “to support the test plan”. So the implication (as I can read it, as there is no other guidance on interpretation, once there is a scope in a test plan - or really test plan - then “understanding is obtained”.) As someone who talks to plenty of non-testers you cannot point to a document and claim that understanding is “obtained”.

Ah, but wait, point (b) in the clause might help. You should identify and “interact” with “relevant stakeholders”. Oh, not any old stakeholder (which would be an oxymoron, I think) but relevant stakeholders. Clumsy language -> like saying, “talk to the right people”. So it appears as though the standard is using a convoluted way of saying, “talk to the right people”. Simplistic and obtuse.

Section 29119-1, 5.2, does give some words on project and organisation context. However, this is framed in the model of 29119-2. This model has a number of validity issues (see above) and is, in essence, forming a circular argument. If you adopt the process model (from 29119-2) then form an understanding of the organisational and project context then the process model (in 29119-2) can be followed. But the context understanding is framed by the process model and the context understanding should be formed somewhat separately (independently) of the process model. Or looking at it starting with 29119-1 (5.2) to understand the project context you need to adopt the process model frame from 29119-2 - same circular dependency. Circular reasoning -> faulty basis for usage, and potentially misleading.

- Stakeholders
Reference to “stakeholders” or “the stakeholders” occurs quite often in parts 1 & 2 without actually suggesting who they might be. I think putting a lot of time into describing a model of a process without describing the actors in the model is a serious flaw in a model. It means it is so generic - to be applied (or maybe really overlaid) anywhere. It also suggests the validity of the model or actual usage was never considered. Unclear and potential for misunderstanding.

- Engineering Heuristics
In 29119-1, clause 5.1.3, testing is described as an heuristic approach, although it is described in a muddled fashion. It correctly states that heuristics are fallible but that knowing this allows multiple test strategies to be used to reduce the risk of using an ineffective test strategy.

  • Why would you knowingly use an ineffective test strategy (I can think of a reason, but the impression is that you shouldn’t use one, so the implication is that an ineffective test strategy could be unwittingly used…)? 
  • Or does this mean that multiple test strategies should always be used because you don’t know which are effective or ineffective? The implication is that we don’t know the consequence, limitations and benefits of any test strategy - and that I’m not sure I can agree with.
  • Of course, the standard gives no guidance on what is effective or not. So the purpose of the standard is what exactly? 

It seems to be advocating - do multiple things because you might not know what you’re doing. I take the view that if you don’t know what you’re doing then you need help, training, practice and even coaching, not a blunderbuss approach that may do more harm than good. Clumsy and Reckless.

- Exploratory Testing Definition
In 29119-2 (clause 4.9) describes this type of testing as spontaneous and unscripted. Then the authors couldn’t imagine a case where investigation, deliberation, preparation and even scripting might be needed to perform testing; a case where the result/s of that test might dictate the next steps. This, in itself, is not unusual as I think many people equate “exploratory testing” with something to do with little preparation rather than using results to guide future tests. I have prepared and guided this type of approach in the past. Therefore the “definition” in the standard is erroneous and unhelpful - either to me as a tester, test strategist or test communicator.

- Standard, Dictionary or Constraint?
By attempting to define (describe) all types of testing (I won’t say approaches as I don’t think that was considered) then this limits my potential testing. If I discover, develop or talk about a new (or variant) of a technique or way of testing then I am (per default) non-compliant to the standard. So the standard is not a guide to good, appropriate or effective ways to test. Erroneous.

In the draft of part 4, 29119-4, the conformance part states that techniques used that are not described in 29119-4 should be described. This seems to be a way of not capturing everything in the standard - and is a way of avoiding debatable definitions and muddled language.

It seems to me that the standard is really attempting to be an encyclopaedia of testing - but when I see muddled language, claims without support, circular reasoning and what appears to be untested models it makes faith (and trust) in definitions a little difficult. An encyclopaedia is not the same as a standard - so I think the intention behind the standard (and the result) is muddled.

- Informative?
29119-1 is informative. This means (according to 29119-1, 2) that conformance (agreement or compliance) is not needed. This means it is optional or that it is open to interpretation. One consequence is that having this as an input to part 2 - a generic model - means that the model is open to interpretation. Another consequence is that it’s like a poor dictionary - at best a guide to current usage, at worst misleading. Superficial.

- Process?
I think there’s probably some confusion about what “process” means. It’s what happens (according to Mirriam-Websters dictionary and the OED). How does that fit into a process model? A model of things that happen? Ok, and this is a generic model. To produce what? That’s not defined. So it’s trying to describe a generic set of actions that happens in software testing without any description of an output. Why would you do that? I can hypothesise that (i) you might perform good testing already (which your organisation might regard as efficient and effective), then 29119 is of no use and may even be detrimental; (ii) you’re a start-up, new org or org without any knowledge of how to approach testing - then 29119 might be used to produce some documentation and activities - but as there is no construct validity in 29119 that may be a totally inappropriate approach also. So what use are generic models with generic activities? Misleading and lacking evidence of validity.

- Repeatable?
There is a claim that businesses have an interest in developing and using repeatable, effective and efficient processes (29119-1. 5.2.) It seems natural to want your activities (processes) effective and efficient. But repeatable - in which scenarios is repeatable desirable for the way testing is carried out? Does this mean repeatable test cases in scenarios where test scripts are re-executed as some safety net (e.g. in a continuous integration loop)? Fine. But test analysis, planning, design and reporting. Should this be done in a repeatable way? The case isn’t made. Rhetoric.

- Scripted?
In 29119-1, 5.6.6, it describes advantages and disadvantages of scripted and unscripted testing. It is claimed that scripting (in this case, a tester following a script) makes the testing more repeatable. Here, “repeatable” is debatable to me - I’ve seen people following a script that don’t do what the script says - there may be some extra step inserted or some extra observation made or some step that is hopped-over, but the script isn’t strictly followed. So, I think these advantages and disadvantages haven’t been challenged or compared with real experiences. Erroneous.

- Test policies and strategies
In 29119-1, 5.2, it claims that where formal organisational test policies and strategies are absent then the testing there is “typically” less effective and efficient. This is an interesting claim - and could read by some that this must be in place. There are 2-3 problems with this claim:
1. The connection (or correlation) demonstrated in cases where such “documents” are in place and the performance of the organisation/company being effective and efficient is not demonstrated. (I.e. there is no study of “effective and efficient” testing or “effective and efficient test organisations”  and what might affect their output).
2. There is no study or evidence to show that the presence of such a formal item (it probably is present even where not formally present - part of the test culture) directs the testing in an organisation - cases where it is present and has no effect, or where it is absent and the testing is deemed “efficient and effective” anyway. 
3. Where such a formal item is in place - it is not clear if that comes afterwards - i.e. is not the cause of “effective and efficient” testing, but a byproduct (related to #1)
No evidence, rhetoric.

Examination of Claims and Language 
Some might think I’m being “picky” when I read 29119 - or being nit-picky. Actually, this is a tester reading unsubstantiated claims - or rather noticing claims that are unsubstantiated. That’s not being picky, it’s calling out shoddy work, highlighting issues and raising a warning flag. It’s basic understanding of cause and effect - and being able to distinguish between them. Why is this important? Because if one reads 29119 and is not able to understand the claim, what it implies and doesn’t say, then there’s a strong risk that it is followed by rote. And as the output of the model is not described, following it without understanding its pitfalls is potentially harmful.

A generic model?
It is not stated but I suspect 29119 is trying to be an encyclopaedia of testing, presenting a menu from which to select which activities to do and perhaps a way to structure it. However, the output is not defined - the aim of the testing - and an assessment of which may help or hinder those results. The process models have no demonstrated validity - meaning it is open to interpretation what they will produce and, more seriously, what readers of 29119 think they might produce, how they might be applied and how to observe if the models are relevant or have any contribution to an organisation’s testing effectiveness or efficiency. Therefore, the generic nature of process models and the association with an idea of conformance is really dangerous. One might conclude that 29119 (in it’s current form) is a clear and present danger to an organisation’s testing if a question of conformance is raised.

In this post I’ve mainly focussed on parts related to 29119-1 & 29119-2. 

The lack of output description and consideration (of the process model) is serious.
The lack of process model validity - any evidence of validity or applicability (or not) of generic models of activities - is serious. It verges on being dangerous and misleading.
The muddled language is serious.
The lack of apparent rigour in something it is trying to describe - whether model, definitions process model applicability - is serious.
The notion of conformance which my reading of 29119 means is not possible - part due to the way part 1 is defined as informative, part due to the lack of rigour in the models, part due to the apparent waterfall approach to modelling, part due to the muddled language. This means that 29119 can not be used or applied in a repeatable, efficient or effective way - means that it would be misleading to claim conformance to 29119. Claiming conformance is a potential warning sign
The amount of rhetoric and unsupported claims are potentially confusing to the reader.

I get the impression that the result (29119) is due to a number of factors

1. It’s a result of the underlying reasoning and motivation for 29119 not being clear, ref [8]
2. It’s a factor of the working group’s interpretation of said motivations and reasoning - and maybe not being able to clarify or communicate that
3. It’s a result of some underlying assumptions (or beliefs) that the working group haven’t declared
4. It’s possible that the underlying beliefs were not visible in the working group or had different interpretations (because they were not visible)
5. It’s a result of ignorance in processes, how to observe processes and make hypotheses based on observation.
6. It’s a result of ignorance about model validity, experimental design and limitations of these.

The result? An Example
When I read 29119 I get the impression that it’s like Royce’s paper, ref [9], was read to page 2 - and crucially not page 3 or the conclusion - because 29119 seems to model what later became known as waterfall, and ignored any iterative corrections.

Royce warned of the dangers with the model - the type of model displayed in 29119 - over forty years ago. Why is there a belief that the type of model in 29119 “works”? My guess would be that it’s a result of poor observation amongst the other reasons above.

For example

1. Project X (in company Y, under circumstances Z) “worked” and we followed Royce’s-page2-model (waterfall), ref [9]
2. Project A (in company B, under circumstances C) “worked” and we produced lots of documentation

Someone could draw a conclusion that it’s the documentation that produced the good result in A, and others that it was a waterfall model that produced a good result in X. Someone else could conclude that combining “documentation” and “waterfall” would produce a good result in a new project. Of course, this “hypothesis” is dangerous and reckless. It’s not even certain (or there may be a lot of doubt) that the reasoning for X & A “working” was correct, and it’s very probable that we can’t, couldn’t or didn’t observe the most important factors that contributed to their results (meaning the understanding of Y & B, or Z & C was not what we think it is). Projects X & A might’ve worked in spite of “waterfall” & “documentation”.

This is the reason why generalisation, and not taking close study of underlying factors of “processes”, and not establishing validity of a model is dangerous. Not being able to connect an observation to a symptom and cause is dangerous. Therefore, I think the reasoning (or absence of reasoning), support or conclusions in 29119 is dangerous. There is no evidence that this model “works”, or will produce anything that a customer can use.

I suspect the intention of 29119 was based on experiences that have worked for the contributors, but the case for internal and external model validity is not made. So, advocating those models outside of the people that have used them is not supported.

Locked-in Planning and Feedback?
Fred Brooks, chapter 11 in ref [7], wrote in 1974 about the need to plan to throw one away - that’s about understanding the limitation of planning and process models, and not locking into a waterfall model. Cosgrove, ref Brooks chap11, [7], in 1971 claimed that programmers (and organisations) deliver satisfaction of a user need, and that this perception of need changes as software is developed and tested. There is no clear consideration of user need, satisfaction or change in 29119 - i.e. there is little connection to software development in 29119, especially agile, incremental or interative ways of working.

I get the impression it’s a contractual model - if you structure your testing along these lines and your stakeholders sign-off on this then you’re “protected” from blame. The model is not directed at producing something that a customer will be happy with, but rather something that a customer potentially signs-off on something before any prototype is produced.

It seems to me that there is no result-orientation - it’s about standardising locked-in paper trails early rather than working software. There is no guidance on how to square that circle with agile approaches. In fact, if you work in an agile shop and someone starts talking about the process model from 29119 and how to adopt it, I’d be worried - that’s a “bad smell”.

Practice, and Consolidation in a Standard?
There is no evidence of study or observation of test processes in practice or support of claims to say “X” is the factor that contributes to makes this process effective and efficient. This would require a social science type of study of people working (process), of organisational structures, project set-up and outcomes. And it seems to be missing even on the smallest of scales - which would be a case study to support a claim of a specific process.

I get the impression that the working group didn’t make (or search for) any study of factors that contribute to “effective and efficient” testing or testing results. To use the terminology of 29119-1, this is “error guessing”. However, as there is no assessment of “errors” (problems and pitfalls to avoid) I think of it as just plain guessing. Rhetoric, guesswork and superstition.

I can’t work out why no one thought of this in the 6 years 29119 was under development - because if they had then I might not be having these questions now.

And finally…
I can borrow from “Yes, Minister”, ref [5], when I think about the connection between a process model and reality as conveyed in 29119 and in the style of clarity I get from 29119-> 
“the precise correlation between the information … communicated and the facts, insofar as they can be determined and demonstrated, is such as to cause epistemological problems, of sufficient magnitude as to lay upon the logical and semantic resources of the English language a heavier burden than they can reasonably be expected to bear.” 
I.e no evidence of the validity in the claims made. So yes, the overwhelming impressions I take from 29119 are of unsubstantiated claims and rhetoric

[1] 29119-1: ISO/IEC/IEEE 29119-1:2013 Software and systems engineering -- Software testing -- Part 1: Concepts and definitions
[2] 29119-2: ISO/IEC/IEEE 29119-2:2013 Software and systems engineering -- Software testing -- Part 2: Test processes
[3] 29119-3: ISO/IEC/IEEE 29119-3:2013 Software and systems engineering -- Software testing -- Part 3: Test documentation
[4] 29119-4: ISO/IEC/IEEE DIS 29119-4.2 Software and systems engineering -- Software testing -- Part 4: Test techniques
[7] The Mythical Man-Month: Essays on Software Engineering [F.P.Brooks, 1975]
[8] The Tester’s Headache: ISO 29119 Questions: Part 1
[9] Managing the Development of Large Software Systems [Winston Royce, 1970]