- JoNova - https://joannenova.com.au -

Hadley excuse implies their quality control might filter out the freak outliers? Not so.

The Met Office Hadley Centre response to #DataGate implied that they do quality control, which leaves the impression that they might filter out the frozen tropical islands and other freak data:

We perform automated quality checks on the ocean data and monthly updates to the land data are subjected to a computer assisted manual quality control process.

I asked John to expand on what Hadley means. He replies that the quality control they do is very minimal, obviously inadequate, and these errors definitely survive the process and get into the HadCRUT4 dataset. Bear in mind a lot of the problems begin with the national meteorological services which supply the shoddy data, but then Hadley seems pretty happy to accept these mistakes. (Hey, it’s not like Life on Earth depends on us understanding our climate. :- ) )

As far as long-term trends go, the site-move adjustments are the real problem and create an artificial warming trend. On the other hand, the frozen tropical islands tell us how competent the “Experts” really are (not a lot) and how much they care about understanding what our climate really was (not at all). That said, we don’t know what effect the freak outliers have on the big trends, but then, neither do the experts.

Below, John drills into the details, which show just how pathetically neglected the dataset is. For data-heads: the freak outliers affect the standard deviation and the calculation of the normal range. This is a pretty technical issue, included here “for the record” and to advance the discussion of what Hadley’s neglected data means. For what it’s worth, McLean can manually follow the process that is documented as the right way to create the HadCRUT4 set, and he produces the same figures they get. That suggests he knows what he’s doing.     - Jo

__________________________________________

Leaving outliers in the key years means that Hadley won’t filter out real outliers in other years.

Guest Post by John McLean

In its response to the HadCRUT4 data audit the Hadley Centre says that it applies quality control, and implies that obvious errors couldn’t possibly be in the dataset. In these comments I will show not only that the errors do get carried through but also how this happens.

For each calendar month and each station, the long-term average temperatures, which HadCRUT4 people call normals, are the average of the temperatures for that month from 1961 to 1990.  In a similar fashion, standard deviations are calculated across the period from 1941 to 1990. The Hadley Centre method of quality control uses these two values to set upper and lower limits beyond which data will be assumed to be errors and therefore excluded from the main data processing.
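For data-heads who want to see the mechanics, here is a minimal Python sketch of that description. It is not the Hadley Centre's code: the function name `qc_limits`, the year-keyed dictionary, the use of the sample standard deviation and the made-up example series are all assumptions for illustration, and the multiplier `k` is left as a parameter.

```python
from statistics import mean, stdev

NORMAL_PERIOD = range(1961, 1991)  # 1961 to 1990 inclusive
SD_PERIOD = range(1941, 1991)      # 1941 to 1990 inclusive


def qc_limits(by_year, k):
    """Return (normal, sd, lower, upper) for one station and one calendar month.

    by_year maps year -> monthly mean temperature (missing years are absent).
    k is the number of standard deviations allowed either side of the normal.
    """
    normal_values = [t for year, t in by_year.items() if year in NORMAL_PERIOD]
    sd_values = [t for year, t in by_year.items() if year in SD_PERIOD]
    normal = mean(normal_values)
    sd = stdev(sd_values)  # sample standard deviation (an assumption)
    return normal, sd, normal - k * sd, normal + k * sd


# Hypothetical, well-behaved series: 20.5 in odd years, 19.5 in even years.
series = {year: 20.0 + (0.5 if year % 2 else -0.5) for year in range(1941, 1991)}
normal, sd, lower, upper = qc_limits(series, k=5)
print(f"normal={normal:.2f} sd={sd:.2f} limits=({lower:.2f}, {upper:.2f})")
```

Note that nothing in this sketch cleans the values before the standard deviation is taken, so whatever is in the 1941 to 1990 data goes straight into the limits.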

The presence of obvious errors in the HadCRUT4 data is due to a major failure in this approach.

Let’s start with my audit and Apto Uto with its three monthly mean temperatures above 80°C.  Figure 6.3 (pg 43) of the audit is of data extracted from the HadCRUT4 dataset; I didn’t perform any calculations of the values.  The Figure shows three very abnormal values in the grid cell that covers Apto Uto.

The grid cell in question is entirely over land, which means that we can trace each step in the processing as the HadCRUT4 people describe it. The grid cell value is simply the average of the temperature anomalies of the stations in the cell that reported data.

We start by determining the temperature anomalies for Apto Uto in those months, which is just a matter of subtracting the long-term averages from the monthly values, as both appear in the file of data for that station. The mean monthly temperatures in April, June and July 1978 are 81.5, 83.4 and 83.4°C respectively, the long-term average temperatures for those months are 27.8, 27.9 and 28.0°C, and the anomalies are therefore 53.7, 55.5 and 55.4 degrees. We repeat the operation for the other stations in the grid cell and then calculate the average anomaly for each month. The table below shows the anomalies in April and June, the average of those anomalies and the data extracts from both the CRUTEM4 dataset (observation stations only) and the HadCRUT4 dataset (land and sea).

Temperature anomalies (°C)

ID      Station               Country    Apr-78  Jun-78
800890  Apto Uto              Colombia     53.7    55.5
803920  Blonay                Colombia      0.3     0.2
800970  Cucuta/Daza A         Colombia     -0.5    -0.2
804250  Mene Grande           Venezuela    -1.2    -0.7
804380  Merida                Venezuela    -0.3    -0.4
804470  San Antonio del Tach  Venezuela    -0.1    -0.3

Averages of the above                      8.65    9.02
Extract from CRUTEM4                       8.65    9.02
Extract from HadCRUT4                      8.65    9.02
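The averages in the table are easy to reproduce. The short Python sketch below uses only the six station anomalies listed above; the variable names are mine, and the "without Apto Uto" figures are my own arithmetic on the same numbers, not values taken from either dataset.

```python
from statistics import mean

# (station, Apr-78 anomaly, Jun-78 anomaly), copied from the table.
anomalies = [
    ("Apto Uto", 53.7, 55.5),
    ("Blonay", 0.3, 0.2),
    ("Cucuta/Daza A", -0.5, -0.2),
    ("Mene Grande", -1.2, -0.7),
    ("Merida", -0.3, -0.4),
    ("San Antonio del Tach", -0.1, -0.3),
]

# Grid cell value = plain average of the station anomalies.
apr_all = mean([apr for _, apr, _ in anomalies])
jun_all = mean([jun for _, _, jun in anomalies])

# Same calculation with the obviously faulty station left out.
others = [row for row in anomalies if row[0] != "Apto Uto"]
apr_clean = mean([apr for _, apr, _ in others])
jun_clean = mean([jun for _, _, jun in others])

print(f"all six stations: {apr_all:.2f} {jun_all:.2f}")   # 8.65 9.02
print(f"without Apto Uto: {apr_clean:.2f} {jun_clean:.2f}")  # -0.36 -0.28
```

With all six stations the result matches the CRUTEM4 and HadCRUT4 extracts to two decimal places. Leave Apto Uto out and the cell would have been slightly cooler than normal instead of almost nine degrees warmer.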

There can be no doubt that the obviously flawed data for Apto Uto has been used in the HadCRUT4 dataset, regardless of what the Hadley Centre says. A similar analysis was undertaken for Golden Rock Airport in St Kitts, with its 0°C in December of two years. The CRUTEM4 dataset contained the average anomalies, but the grid cell covers both land and sea, so in HadCRUT4 the data was merged with sea surface temperature data and the result is less clear.

It’s not that the Hadley Centre has been dishonest about this; it’s that the quality checking it uses has a serious flaw. It works sometimes but not others, and there’s a good reason for that.

Section 7.6 (pg 54) of the audit shows many examples of outliers that are present in the temperature data from stations. Wad Medani, for example, reports a mean monthly temperature of 99.9°C and Oruro reports 90.0°C. Other stations report mean temperatures of 0°C when their average temperatures in the same month are at least 8.2°C and as high as 27.4°C. Table 7-5 lists 25 examples of monthly mean temperatures that are more than 25 standard deviations away from the long-term average for that month.

(At this point you might like to consider what it means that so many errors exist in the data that national meteorological services supply to the people at the CRU for inclusion in the CRUTEM4 and HadCRUT4 datasets.  Should we trust any data at all from these people?)

The grid cell that contains Wad Medani has five other stations that reported mean temperatures in that month. Their anomalies ranged from 0.5°C to 2.0°C, whereas Wad Medani reported a mean temperature of 99.9°C. The average anomaly is 12.38°C if we include Wad Medani and 1.1°C without it. The HadCRUT4 grid cell value is 1.14°C, so it seems that Wad Medani was correctly rejected by the quality control processing used by the Hadley Centre or the CRU.

So what’s going on?

We can get clues by counting, for each year, the values that the relevant documentation says would be rejected: those more than five standard deviations from the mean. The two years in which more than 100 outliers would be rejected are 2003 (212 outliers) and 2015 (163).

[Figure: HadCRUT4 outlier counts by year. Outliers are defined here as being over 5 standard deviations from “normal”.]

The very low number of outliers from 1941 to 1990 is obvious. This is the period over which standard deviations are calculated (compared to 1961 to 1990 for long-term average temperatures). Just 26 outliers were discovered for this period, none more than 6.1 standard deviations from the mean.

The problem in a nutshell is that the Hadley Centre and/or CRU fail to remove outliers from the data before they calculate the standard deviations.  This can lead to ridiculous values, which when multiplied by five to set the limits above and below the mean become positively bizarre.
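To see how badly one bad value can distort a standard deviation, here is a small Python sketch on made-up data. Everything in it is an assumption for illustration: a hypothetical station with 19 ordinary monthly means scattered 0.6 degrees either side of 27.8°C, plus a single faulty reading of 81.5°C, and the sample standard deviation as the measure of spread. It is not the real station record.

```python
from statistics import stdev

NORMAL = 27.8  # hypothetical long-term average, deg C
# 19 ordinary values: nine at -0.6, nine at +0.6, one exactly on the normal.
clean = [NORMAL - 0.6, NORMAL + 0.6] * 9 + [NORMAL]
faulty = 81.5  # one obviously wrong monthly mean
dirty = clean + [faulty]

sd_clean = stdev(clean)
sd_dirty = stdev(dirty)

anomaly = faulty - NORMAL
limit_clean = 5 * sd_clean  # limit if the bad value is removed first
limit_dirty = 5 * sd_dirty  # limit if the bad value is left in

print(f"SD without the faulty value: {sd_clean:.2f}")
print(f"SD with the faulty value:    {sd_dirty:.2f}")
print(f"anomaly of faulty value:     {anomaly:.1f}")
print(f"rejected using clean SD?     {anomaly > limit_clean}")
print(f"rejected using dirty SD?     {anomaly > limit_dirty}")
```

In this toy case a single bad value lifts the standard deviation from 0.6 to about 12, so the five-standard-deviation limit grows from 3 degrees to about 60 and the faulty reading sails through the very test that was meant to catch it.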

The metadata for Apto Uto contains the following line:

Standard deviations =   0.6   0.6   0.5  11.9   0.5  11.8  12.0   0.6   0.5   0.6   0.6   0.7

Five standard deviations for most of those months means no more than 3.5°C, but for April, June and July they are 59.5, 59 and 60 degrees. The long-term averages in those months are around 28°C, and together that means the temperatures of 81.5, 83.4 and 83.4 are all less than five standard deviations from that mean.
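The check can be written out in a few lines of Python using only figures quoted in this post. Two things are assumptions on my part: that the twelve values in the metadata line run from January to December (which fits, since the three inflated values then fall on April, June and July), and that the test is a simple comparison of the anomaly against five standard deviations. The 0.6 "typical" value is taken from the unaffected months in the same line.

```python
# Standard deviations from the Apto Uto metadata line (assumed Jan..Dec order).
SD_BY_MONTH = [0.6, 0.6, 0.5, 11.9, 0.5, 11.8, 12.0, 0.6, 0.5, 0.6, 0.6, 0.7]
MONTH_INDEX = {"Apr": 3, "Jun": 5, "Jul": 6}

OBSERVED_1978 = {"Apr": 81.5, "Jun": 83.4, "Jul": 83.4}  # deg C
NORMAL = {"Apr": 27.8, "Jun": 27.9, "Jul": 28.0}         # deg C
K = 5             # number of standard deviations allowed
TYPICAL_SD = 0.6  # what the unaffected months look like

results = {}
for month, temp in OBSERVED_1978.items():
    anomaly = temp - NORMAL[month]
    limit = K * SD_BY_MONTH[MONTH_INDEX[month]]
    results[month] = {
        "anomaly": round(anomaly, 1),
        "limit": round(limit, 1),
        "passes_qc": abs(anomaly) <= limit,
        "passes_with_typical_sd": abs(anomaly) <= K * TYPICAL_SD,
    }

for month, row in results.items():
    print(month, row)
```

All three months pass against the inflated limits, and all three would fail by a wide margin against a limit built from a standard deviation of 0.6.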

Before calculating the standard deviations from a subset of the data, any outliers should have been removed and the process repeated until all the data fell within limits, which probably should have only been the more common three standard deviations anyway.
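A rough Python sketch of that remove-and-repeat idea follows. It is one possible reading of the suggestion, not a documented Hadley Centre or CRU procedure: the function name `clip_outliers`, the use of the sample mean and sample standard deviation of the values themselves, the pass limit and the made-up 20-value series are all assumptions for illustration.

```python
from statistics import mean, stdev


def clip_outliers(values, k=3.0, max_passes=10):
    """Repeatedly drop values more than k standard deviations from the mean.

    Stops when a pass removes nothing, when fewer than three values remain,
    or after max_passes passes. Returns the surviving values.
    """
    kept = list(values)
    for _ in range(max_passes):
        if len(kept) < 3:
            break
        centre, spread = mean(kept), stdev(kept)
        inside = [v for v in kept if abs(v - centre) <= k * spread]
        if len(inside) == len(kept):
            break  # nothing removed, so the limits are now stable
        kept = inside
    return kept


# Hypothetical series: 19 ordinary values around 27.8 plus one faulty 81.5.
series = [27.2, 28.4] * 9 + [27.8, 81.5]

kept_k3 = clip_outliers(series, k=3)
kept_k5 = clip_outliers(series, k=5)

print(f"k=3 keeps {len(kept_k3)} values, SD afterwards {stdev(kept_k3):.2f}")
print(f"k=5 keeps {len(kept_k5)} values, SD afterwards {stdev(kept_k5):.2f}")
```

On this made-up series the faulty value sits only about 4.2 standard deviations from the mean of the uncleaned data, so a five-standard-deviation limit never removes it, while a three-standard-deviation limit removes it on the first pass and leaves a standard deviation of 0.6.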

The people who created the HadCRUT4 dataset are simply incompetent; there is no other word for it.

____

*Station Site number for San Antonio was a repeat of Apto Uto. It has been corrected, thanks to Jim Ross.  23 Oct 2018
