
Sense and nonsense of metrics

First of all, it is necessary to understand what we are talking about. The best way to explain this is with an example.

If you measure the weight of a person, you probably express this in a unit like kilograms, pounds, stones, etc.

But what you are in fact doing is comparing the person’s weight against a standardised measurement unit of weight, called a metric.

A metric is a unique way to define what you want to measure. When weighing a person, you perform an action on the person, an action called ‘measurement’. The action is based on the metric and the result of the measurement is expressed in the metric.
What is left after the action is the result of the measurement: the measure.

In short, a metric is a unit definition used in the action of measuring. The result of a measurement is compared to this metric, leading to a unique, uniform and commonly understandable measure.

But that is only one direction, from definition to action to result. There is also the reverse direction.

Sometimes it is necessary to put (counter)measures in place: reactions to a situation that occurs, most of the time when the result of a measurement shows that certain boundaries were (or were not) reached, depending on the situation.

These boundaries are often referred to as (entry and/or exit) criteria.

E.g. the music is too loud and exceeds the allowed boundaries, therefore you have to turn the volume down.

It is this flow of events that leads to the common belief that what you measure, you better understand; what you understand, you can control; and what you can control, you can improve. If you can’t measure it, you can’t understand it, you can’t control it and therefore you can’t improve it.

“Measurement is the first step that leads to control and eventually to improvement. If you can’t measure something, you can’t understand it. If you can’t understand it, you can’t control it. If you can’t control it, you can’t improve it.” – H. James Harrington


The formula

But what happens if you encounter a situation where the maths are not that simple? E.g. the performance of a company, an ecological system, traceability in software testing, etc.

In that case it all starts with the definition, the metric, of what you want to measure. Often you will find that you need multiple comparisons, multiple units/metrics, but that it is still possible to break down what you want to measure into a number of measurements, giving you a unique and uniform view on what you are measuring and an objective way of evaluating the situation.
Again, this is easier to explain with an example. Let’s have a look at traceability in software testing.

We start from the assumption that every requirement needs to be covered by (is traceable to) at least 1 use case, every use case needs to be covered by at least 1 positive test suite and every positive test suite must be covered by at least 1 test case.

I know that this is probably not the way you are setting up your testing, but bear with me for the sake of the example.

In the above setup I will be measuring:

  • what percentage of requirements is covered by at least 1 use case
  • what percentage of use cases is covered by at least 1 positive test suite
  • what percentage of test suites is covered by at least 1 test case

These 3 numbers give me a pretty good view starting from requirements.

On the other hand, we also can go the reversed way.

  • What percentage of test cases is used in at least 1 test suite?
  • What percentage of test suites is related to at least 1 use case?
  • What percentage of use cases covers at least 1 requirement?

Together, these 6 numbers give us an even better view on the traceability between requirements, use cases, test suites and test cases.

Each metric is a ratio that can be expressed as a percentage, and it allows us to make an objective evaluation of the traceability.

It doesn’t give a complete view: ‘a requirement must be covered by at least 1 use case’ doesn’t tell you whether the requirement is sufficiently covered by that one use case. (But that is perhaps for the next article.)

What I tend to do is to have 1 formula that combines all numbers and allows me to score, e.g., traceability. In the above example you could take the average of all numbers, or you could use a radar diagram to show the footprint of the traceability, as shown in the image.
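
As a minimal sketch (my own illustration with invented artefact names, not the author’s tooling), the coverage ratios and their average score could be computed like this:

    # Minimal sketch (hypothetical data): compute the traceability ratios described
    # above and combine them into one average score.
    def coverage(items, covered):
        """Ratio of items covered by at least one other artefact (0.0 .. 1.0)."""
        return len(covered & set(items)) / len(items) if items else None

    # Hypothetical project data: which artefacts trace to which.
    requirements = {"R1", "R2", "R3"}
    use_cases = {"UC1": {"R1"}, "UC2": {"R2"}}      # use case -> requirements it covers
    test_suites = {"TS1": {"UC1"}, "TS2": {"UC2"}}  # positive suite -> use cases it covers
    test_cases = {"TC1": {"TS1"}, "TC2": {"TS1"}}   # test case -> suites it belongs to

    ratios = {
        "requirements covered by a use case":
            coverage(requirements, set().union(*use_cases.values())),
        "use cases covered by a positive test suite":
            coverage(use_cases, set().union(*test_suites.values())),
        "test suites covered by a test case":
            coverage(test_suites, set().union(*test_cases.values())),
        # ... the three reversed ratios are built the same way ...
    }

    valid = [r for r in ratios.values() if r is not None]
    score = sum(valid) / len(valid)   # simple average as the combined score
    for name, ratio in ratios.items():
        print(f"{name}: {ratio:.0%}")
    print(f"combined score: {score:.0%}")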

But what happens if there are unknown elements playing a role, elements you can’t put your finger on? E.g. the stock market, profiling a serial killer, etc.


Too complex

If you have a system with so many variables, so many parameters playing a role, it becomes harder to find the right definition for the metric. You could even ask yourself whether it makes sense to define a metric at all.

The management of a company loves numbers, preferably numbers that show how well they are doing. These numbers are often referred to as KPIs (key performance indicators) and the name already explains the difference with a metric.
A metric gives a fixed, uniform, consistent view on what you want to measure. A metric allows an unbiased evaluation.
A KPI is nothing but an indicator, and the (subjective) interpretation of the indicator is as important as the KPI itself.

E.g. based on budget-approved cost, earned value, percentage completed, etc. it is possible to calculate a cost indicator and a performance indicator. Numbers greater than 1 indicate a positive trend, while numbers less than 1 indicate a negative trend.
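
As a hedged sketch, such indicators are commonly computed as in earned value management; the names CPI/SPI and the sample figures below are my assumption of what is meant here, not the author’s formulas:

    # Sketch of two common earned-value indicators (assumed interpretation of the
    # "cost indicator" and "performance indicator" mentioned above; figures invented).
    planned_value = 100_000   # budget-approved cost of the work planned so far
    earned_value = 90_000     # budget-approved cost of the work actually completed
    actual_cost = 120_000     # what that completed work really cost

    cpi = earned_value / actual_cost     # cost performance index: > 1 is favourable
    spi = earned_value / planned_value   # schedule performance index: > 1 is favourable
    print(f"CPI = {cpi:.2f}, SPI = {spi:.2f}")   # CPI = 0.75, SPI = 0.90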

At first sight you could think that such an indicator is a metric, but the formulas used to calculate these indicators are experience-based and don’t tell you why the result of the indicator is what it is.

When measuring traceability we could tell exactly why a number was low: requirements are not covered by use cases, so by writing more use cases we know that the number will improve.
There is a 1-on-1 relationship.

In the case of e.g. a cost indicator we don’t know: are we using too many people, too expensive people, too expensive equipment, etc.?

That’s why it is an indicator and not a metric. But indicators have their benefits in helping people to understand complex systems and situations. They are built on statistical data and experience and will often lead to the correct evaluation.
The only difference with metrics is that this evaluation is biased and depends on the person making the evaluation.

I couldn’t profile a serial killer, even if I wanted to, but profilers can, because they have the experience to evaluate the statistics captured from a crime scene in order to come to conclusions about who the perpetrator could be.

We only have to be aware that numbers, statistics and indicators are not sacred and should never replace common sense.

False witnesses

The quote mentioned above “Measurement is the first step that leads to control and eventually to improvement.
If you can’t measure something, you can’t understand it. If you can’t understand it, you can’t control it. If you can’t control it, you can’t improve it.” holds a huge risk.

It implies that if you want to improve something you MUST measure it. It doesn’t say HOW to measure it.
The risk is that people will come up with numbers that in fact give a ‘false witness’: numbers that apparently show something, but in fact can mean something completely different.

E.g. in software testing, one of the numbers that management likes is the number of defects found per executed test case. There is even a benchmark that says that on average 1 defect is found for every 2 test cases executed.

People could think that if they find 0,6 defects per test case they are doing a great job, but who says that it is the testers who are doing great?
Perhaps the developers are screwing up big time and their software simply contains more bugs to be found.

As indicated before, you must be aware of the difference between metrics and indicators. #defects / #executed test cases is an indicator, and it will need a subjective evaluation to find the root cause if the number indicates a problem.
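
A tiny sketch of this indicator (the benchmark value is the one quoted above; the counts are invented), mainly to stress that the number itself says nothing about the root cause:

    BENCHMARK = 0.5   # ~1 defect found per 2 executed test cases

    def defects_per_test_case(defects_found, test_cases_executed):
        return defects_found / test_cases_executed if test_cases_executed else None

    indicator = defects_per_test_case(defects_found=60, test_cases_executed=100)  # 0.6
    if indicator is not None and indicator > BENCHMARK:
        # The number alone proves nothing: great testers or very buggy software?
        print(f"{indicator:.1f} defects per test case - above benchmark, needs interpretation")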

Metrics and Quality

It is a huge leap to go from metrics to quality (that should be the subject of another article), but I don’t want to withhold the following rules I found. My apologies, I no longer know where I got them from, but I do know that they help me to figure out whether a number is a metric or not.

  • Correlation : linear relation Q/M
  • Consistency : Improving M = improving Q
  • Tracking : Changing M = Changing Q
  • Predictability : If M then Q
  • Discriminative power : High M,Q ≠ Low M,Q
  • Reliability : M valid for x% of use of M
  • (M = Metric; Q = Quality)

What these rules in fact say is that a metric is related 1-on-1 to a quality characteristic. If the metric improves, the quality also improves; if the metric changes, the quality also changes, and vice versa. The link between metric and quality characteristic is not occasional, but can be used at all times. If the metric changes, the quality has to change by relatively the same amount. Furthermore, metrics must be valid almost all the time; only e.g. a division by zero allows the metric not to be valid.
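
As a rough sketch (my own illustration, not part of the original rules), the correlation rule can be checked on recorded (metric, quality) pairs, for example with a Pearson correlation:

    from statistics import mean, pstdev

    def pearson(ms, qs):
        """Pearson correlation between metric values and quality scores (-1 .. 1)."""
        mm, mq = mean(ms), mean(qs)
        cov = mean([(m - mm) * (q - mq) for m, q in zip(ms, qs)])
        return cov / (pstdev(ms) * pstdev(qs))

    # Hypothetical history: traceability ratio vs. an (assumed) quality score per release.
    metric_values = [0.40, 0.55, 0.70, 0.85, 0.95]
    quality_scores = [0.35, 0.50, 0.75, 0.80, 0.97]

    r = pearson(metric_values, quality_scores)
    print(f"correlation: {r:.2f}")   # close to 1.0 -> the linear Q/M relation holds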

Sense and/or Nonsense?

And this brings us to the title of this article: the sense and nonsense of metrics. I would even dare to take it one step further and talk about the sense or nonsense of numbers (metrics and/or indicators).

Note 1:

  NEVER rely solely on numbers. Make sure that numbers are always interpreted by capable people and if you communicate them, make sure the interpretation is part of the communication.

Note 2:

  Numbers must mean something, so it is good practice to link them to what you consider a quality characteristic. Often the GQM (Goal – Question – Metric) method of Basili is used to do this. The ISO and IEEE standards concerning quality also contain references to a bunch of measurements you can make.
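
  A minimal sketch of what a GQM breakdown could look like in practice (the goal, questions and metrics below are my own invented example, not from the article or from Basili):

      # Minimal GQM sketch: one goal, refined into questions, each answered by metrics.
      gqm = {
          "goal": "Improve traceability of requirements through the test ware",
          "questions": [
              {
                  "question": "Are all requirements covered by use cases?",
                  "metrics": ["#requirements covered by >= 1 use case / #requirements"],
              },
              {
                  "question": "Are all positive test suites exercised by test cases?",
                  "metrics": ["#test suites covered by >= 1 test case / #test suites"],
              },
          ],
      }

      for q in gqm["questions"]:
          print(q["question"], "->", ", ".join(q["metrics"]))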

Note 3:

 NEVER measure just because it is easy to do. Sometimes it is easy to define a metric or indicator, but nobody does anything with it. It is nice to know the shoe size of the project manager, but it will not help your project.

Note 4:

 Measure Everything That’s Required to Increase Customer Satisfaction.

ALWAYS think of what you want to achieve and for whom you are doing it, although it is not always clear who your customer is. As a test manager, who is your customer? Your management? The project manager? The client that will use the product? If it is not clear, prepare to have multiple customers and multiple sets of metrics.

Personal note

Personally, I tend to make everything I do measurable. It makes it easier to stop myself. I tend to get so passionate about what I do that I do more than necessary. By defining metrics and spending the time on defining metrics, I trap myself and am able to keep myself under control. I guess this is maturity, no?

Please, share your experiences with metrics …

Notes

Measure
  • Ascertain the size, amount, or degree of (something) by using an instrument or device marked in standard units.
  • Estimate or assess the extent, quality, value, or effect of (something).
  • Judge someone or something by comparison with (a certain standard).
  • Scrutinize (someone) keenly in order to form an assessment of them.
  • A plan or course of action taken to achieve a particular purpose.
  • Punishment or retribution imposed or inflicted on someone.
Bubble charts

Another nice way to represent data and make a score visible is a bubble chart. Not only can you show the importance of the items, but you can also show the weight of each item. E.g. a risk with a higher impact and a higher probability has a higher weight.
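
A small sketch of such a risk bubble chart with matplotlib; the risk data and the choice of weight = probability × impact are assumptions for illustration:

    import matplotlib.pyplot as plt

    # Hypothetical risks: (name, probability 0-1, impact 1-5).
    risks = [("R1", 0.2, 2), ("R2", 0.6, 4), ("R3", 0.9, 5)]

    probabilities = [p for _, p, _ in risks]
    impacts = [i for _, _, i in risks]
    weights = [p * i for _, p, i in risks]   # assumed weight = probability x impact

    # Bubble size encodes the weight; position encodes probability and impact.
    plt.scatter(probabilities, impacts, s=[w * 300 for w in weights], alpha=0.5)
    for name, p, i in risks:
        plt.annotate(name, (p, i))
    plt.xlabel("probability")
    plt.ylabel("impact")
    plt.title("Risk weight as bubble size")
    plt.show()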

How to measure a serial killer?

Not so long ago there was this guy who thought he could measure whether someone was a serial killer.

By measuring the size of the eyes, the forehead and a bunch of similar measurements, he claimed that he could predict who could be a serial killer.

I’m pretty sure that he annoyed a lot of people and also falsely accused people, but in the end common sense won.

Now profilers use statistical data to figure out the motives and the profile of serial killers, but they also know that it is not an exact science. The subjective opinion as well as the experience of the profiler plays a big role.

Victor R. Basili

Victor R. Basili, born April 13, 1940 in Brooklyn, New York, is an Emeritus Professor at the Department of Computer Science and the Institute for Advanced Computer Studies, both at the University of Maryland. He holds a Ph.D. in Computer Science from the University of Texas at Austin and two honorary degrees. He is a Fellow of both the Association for Computing Machinery (ACM) and the Institute of Electrical and Electronics Engineers (IEEE).

From 1982 through 1988 he was Chair of the Department of Computer Science at the University of Maryland. He is currently a Senior Research Fellow at the Fraunhofer Center for Experimental Software Engineering – Maryland and from 1997-2004 was its Executive Director.

He is well known for his works on measuring, evaluating, and improving the software development process, especially his papers on the Goal/Question/Metric approach, the Quality Improvement Paradigm, and the Experience Factory.

12 Characteristics of Effective Metrics

Wayne W. Eckerson on April 19, 2010.

Creating performance metrics is as much art as science. To guide you in your quest, here are 12 characteristics of effective performance metrics.

  1. Strategic. Start at the end point, with what you want to achieve, and then work backwards.
  2. Simple. Know what is measured and how it is calculated. If it is too difficult, re-think.
  3. Owned. A metric has an owner who is held accountable for its outcome.
  4. Actionable. If a metric indicates problems, employees should know what actions to take to improve the measures.
  5. Timely. Metrics must be updated frequently.
  6. Reference-able. Users must understand the metric’s origins.
  7. Accurate. Underlying data needs to be scanned for defects, standardised, deduplicated and integrated before displaying.
  8. Correlated. Metrics must drive desired outcomes.
  9. Game-proof. Ensure that metrics cannot be circumvented.
  10. Aligned. Metrics are aligned with corporate objectives.
  11. Standardised. All agree on the definitions of metrics.
  12. Relevant. Metrics have a natural life cycle.

Metric definition

The following basic information is needed to define a metric.

Reference
   Each metric needs a unique identifier.
Quality objective
   Each metric is designed in such a way that it drives a team towards a certain objective.
Indicator
   An indicator or quality characteristic is a reference to the quality model used.
Question
   Each metric answers a direct question.
Metric
   Each metric has a formula describing how to calculate or derive it.
Description
   How to evaluate the metric should be described in such a way that the metric becomes unambiguous and easier to understand.
Criteria
   Based on the measures obtained with the metric, it should be possible to score a certain quality characteristic. This score can be a percentage, a traffic light (green, amber, red), a number on a scale, etc.
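
A small sketch of how such a metric definition could be captured as a record; the field values are an invented example reusing the traceability metric from earlier in the article:

    from dataclasses import dataclass

    @dataclass
    class MetricDefinition:
        reference: str          # unique identifier
        quality_objective: str  # objective the metric drives the team towards
        indicator: str          # quality characteristic in the quality model used
        question: str           # direct question the metric answers
        formula: str            # how to calculate or derive the metric
        description: str        # how to evaluate the metric unambiguously
        criteria: str           # how the resulting measure is scored

    # Invented example instance.
    req_coverage = MetricDefinition(
        reference="TRC-01",
        quality_objective="Full traceability from requirements to test cases",
        indicator="Traceability",
        question="What percentage of requirements is covered by at least 1 use case?",
        formula="#covered requirements / #requirements",
        description="A requirement counts as covered when at least one use case traces to it.",
        criteria="green >= 0.9, amber >= 0.7, red otherwise",
    )
    print(req_coverage.reference, req_coverage.formula)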

by Stefaan Luckermans


5 Responses

  1. This is a good general article but a few other points are significant too:

    There are several metrics that violate standard economics and distort reality so much that I regard them as professional malpractice:

    1. cost per defect penalizes quality, and the whole urban legend about costs going up more than 100-fold is false.
    2. lines of code penalize high-level languages and make non-coding work invisible.

    The most useful metrics for actually showing software economic results are:

    1. function points for normalisation
    2. activity-based costs using at least 10 activities such as requirements, design, coding, inspections, testing, documentation, quality assurance, change control, deployment, and management.

    The most useful quality metric is defect removal efficiency (DRE) or the percentage of bugs found before release, measured against user-reported bugs after 90 days.

    If a development team finds 90 bugs and users report 10 bugs, DRE is 90%. The average is just over 85% but top projects using inspections, static analysis, and formal testing can hit 99%. Agile projects average about 92% DRE.
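
    A minimal sketch of the DRE calculation described here (the figures are the ones from this comment):

        # DRE: bugs removed before release vs. bugs users report in the first 90 days.
        def defect_removal_efficiency(found_before_release, found_by_users_90_days):
            total = found_before_release + found_by_users_90_days
            return found_before_release / total if total else None

        dre = defect_removal_efficiency(90, 10)
        print(f"DRE = {dre:.0%}")   # 90%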

    Most forms of testing, such as unit test and function test, are only about 35% efficient and find one bug out of three. This is why testing alone is not sufficient to achieve high quality.

    Some metrics have ISO standards such as function points and are pretty much the same, although there are way too many variants. Other metrics such as story points are not standardized and vary by over 400%.

    A deeper problem than metrics is the fact that historical data based on design, code, and unit test (DCUT) is only 37% complete. Quality data that only starts with testing is less than 50% complete. Technical debt is only about 17% complete since it leaves out cancelled projects and also the costs of litigation for poor quality.

  2. I’m glad you have pointed out some of the shortcomings and dangers of gathering and using metrics. On the one hand, it is important to think of measurements when you set goals, as in the good old “SMART” goals (Specific, Measurable, Actionable, Realistic and Timely, if I’m remembering correctly). IME people tend to forget about the “measurable” part.

    There is so much abuse of “metrics”, though, that I sometimes think we’d be better off without them. I currently know of two large companies where testers’ performance is measured by how many defects they find, while coders’ performance is measured by how few defects are found in their code. How well do you think programmers and testers work together in those companies?

    The teams I’ve worked on found it useful to look at our biggest obstacles, as identified in retrospectives, and think of experiments to make them smaller. We set goals around those experiments to see how well they worked. For example, my last team worked on a buggy legacy code base, and we wanted to write new features in much better quality software. We set a goal of no more than six high severity bugs in production in the next six months. This kept us focused on using good development and testing practices, and writing the best software we possibly could. Our goal was easy to measure, and it served us well.

    Later on, we had a goal to implement performance testing. Since we had none, our first goal was to identify and implement a performance testing tool that was appropriate for us. The next goal was to record a baseline of our application’s performance. Incremental goals, like incremental software, are more realistic and humane.

    Perhaps the bottom line is that teams should be allowed to self-organize, and choose their own metrics which support the goals they want to achieve to delight customers and improve software quality. Having a manager arbitrarily impose metrics often leads to beating team members up with those metrics, which doesn’t help anyone.

    1. Stefaan

      Lisa,

        so true … in the other article on this blog I talk about testing anarchy. One of the things to be aware of is that management uses the honesty of a test team to blame them for remaining defects. That would lead to doubting their authority in the field of testing, which again is one of the characteristics of anarchy.

        Anarchy is waste and will always lead to less efficiency and less effectiveness.

        It is, indeed, better for teams to use their authority to define and measure their metrics. That way everyone can recognise their authority in the field of testing and this way the test team is also enabled to find their own shortcomings. As Elias Canetti said: “People love as self-recognition what they hate as an accusation.”

  3. Tim Koomen

    An interesting and recognizable article, thank you. I have some questions/remarks, maybe you can explain. For me, the distinction between a metric and indicator is a bit unclear.

    You state that a metric differs from an indicator because it tells you why a value is low, as opposed to a KPI that tells you that the cost indicator is negative. I’m not sure I understand or agree. For instance a metric/measurement/indicator (?) on the number of defects found per week shows that the number suddenly goes up. However, this does not tell you why it went up (more testing, worse software quality, …). And in your example, the metric on traceability, when the number is low, is that because you’re at the start of testing, or because new use cases are added all the time, or because test configuration management is poor, or …

    In my experience, each metric or indicator is usually not enough in itself to take countermeasures; you always have to dig deeper and use experience (or additional metrics) (= your note 1).

    And on your metrics and quality formulas, can you explain a bit more on:

        1) “Predictability : If M then Q” => M in itself is a metric or measurement, which is not true or false. Or does the formula mean “if M = high then Q=high”? But this seems the same as other formulas
        2) “Reliability : M valid for x% of use of M”, where is the Q in this formula?

    Thank you, Tim

    1. Stefaan

      Hi Tim,

      from your comment I take this:

      • 1. What is the main difference between metrics and KPIs?
      • 2. What if you use a metric/KPI on a timeline?
      • 3. Please explain the formulas in more detail?

      Metric vs. KPI
      A metric is not ambiguous, whereas a KPI will always need interpretation and explanation.

      • Metric: #covered requirements / #requirements. 0 is ‘NO TRACE’, 1 is ‘ALL TRACEABLE’. No discussion possible.
      • KPI: #defects found in a week depends on time in the project, methodology, planning, retesting, regression, etc. You will need to investigate and explain why the number is what it is.

      Timeline

      • Timeline of a metric shows the path to improving a certain quality characteristic. E.g. #covered requirements / #requirements: 0 is ‘NO TRACE’, 1 is ‘ALL TRACEABLE’, no matter where you are in the project.
      • #defects found in a week does not reflect a quality characteristic, but can indicate a problem if interpreted correctly. E.g. it is normal to find more defects in a legacy system. It is normal to find fewer defects in a COTS product. Etc. But as Capers Jones mentions, the DRE = #defects removed during the project / (#defects found during the project + #defects found in the first 3 months in production) would be a better candidate. It tells something about the defect latency of the product under test, which is a quality characteristic. Unfortunately this is a metric after the facts (after 3 months in production).

      However, be aware that metrics/KPIs are not a movie, but a snapshot in time. They can give you a false witness of the true state of your project. But metrics on a timeline are less risky than KPIs on a timeline.

      Formulas
      I’ll explain with the formula of traceability.

      • Correlation : linear relation Q/M
        0 is no traceability, 1 is full traceability. 0,5 means half of the requirements are traceable. A 50% increase in the metric is a 50% increase in the quality characteristic.
      • Consistency : Improving M = improving Q
        No matter when you measure during the project, a positive change in the metric always indicates a positive change in the quality, and vice versa.
      • Tracking : Changing M = Changing Q
        It is not possible to have 2 measures for the same amount of the quality characteristic. Changing metrics imply changing quality and vice versa.
      • Predictability : If M then Q
        Metrics and quality behave in a consistent way and a certain amount of quality is linked to exactly 1 measure.
      • Discriminative power : High M,Q ≠ Low M,Q
        It is not possible that 0,49 means 0% quality, while 0,51 means 100% quality.
      • Reliability : M valid for x% of use of M
        Traceability: #covered requirements / #requirements, but if #requirements = 0, then this metric cannot be calculated. In all other cases the metric can be calculated. The metric is valid for 99,99% of the measurements, just not for the first measurement, where the number of requirements is 0. This is easily solved by e.g. constraining the measurements to the ‘after-requirements’ period in the project.

      Tim, I hope this clarifies a bit?

      I do agree with you that numbers need to be handled with caution ….
