I am a numbers guy, and I am terrified by what appears to be a general perception that numbers don’t matter when it comes to an emotional issue or preconceived idea. This post explains what I mean by data numeracy and offers examples of the problems I worry about.
The opinions expressed in this post do not reflect the position of any of my previous employers or any other company I have been associated with; these comments are mine alone.
Meteorology
One of my responsibilities over my career was reporting data from meteorological monitoring stations to regulatory agencies primarily concerned with air pollution transport. The first challenge was siting: a monitor has to be located where the measured wind speed and direction represent the flow in the area, ideally in an open field with no nearby obstructions that could affect the wind. Once the wind vane was up and running, it was not enough to simply report all the data collected. A vital quality control check is needed to make sure the data are realistic, so I developed a program to review the data for oddities. For example, if the wind direction did not vary at all for several hours, that period would be flagged for further review. If the temperature was below freezing and there was precipitation at the monitor, I would check the local weather station for freezing rain, which can ice up a wind vane and lock it in place. If freezing rain was observed, it was clearly appropriate to flag the data as missing and note the reason in the data submitted to the regulatory agency. The agency could easily check that decision, and in the end everyone was confident that the data submitted accurately represented the air pollution transport conditions in the area.
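To make that kind of screening concrete, here is a minimal sketch of an automated review pass over hourly data. The column names and the four-hour stuck-vane threshold are illustrative assumptions, not values from any actual reporting system.

```python
import pandas as pd

STUCK_HOURS = 4  # flag if wind direction is unchanged this many hours (illustrative)

def screen_met_data(df: pd.DataFrame) -> pd.DataFrame:
    """Add QC flag columns; flagged rows need manual review, not deletion."""
    df = df.copy()

    # Stuck wind vane: direction identical for several consecutive hours.
    unchanged = df["wind_dir"].diff().eq(0)
    run_length = unchanged.groupby((~unchanged).cumsum()).cumsum()
    df["flag_stuck_vane"] = run_length >= (STUCK_HOURS - 1)

    # Possible freezing rain: sub-freezing temperature with precipitation.
    # Freezing rain can ice up the vane, so these hours get checked against
    # a nearby weather station before the data are reported.
    df["flag_freezing_precip"] = (df["temp_c"] < 0) & (df["precip_mm"] > 0)

    return df
```

The point of the flags is that nothing is deleted automatically; a flag just marks a period for the validate-then-report workflow described above.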
Emissions
Another responsibility of mine was to report data from continuous emissions monitoring systems (CEMS) at power plants. Coming from my background, it seemed logical that the data should be reviewed in a fashion similar to the meteorological data. The problem is that weather parameters have physical relationships with one another that make it much easier to flag problems; emissions data offer fewer such cross-checks. Eventually I developed a system to review the data in a reproducible manner, basically by looking for outliers and trends. My process flagged data that needed to be checked: the raw numbers could then be compared against operating information and other records to see whether the outlying data were merely odd or actually incorrect. The analysis did not say that the data were wrong, only that they needed to be reviewed and validated.
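A minimal sketch of that sort of reproducible outlier-and-trend screen might look like the following. The four-sigma outlier threshold and the 24-hour trend window are illustrative assumptions, not regulatory values.

```python
import pandas as pd

def flag_cems(series: pd.Series, z_thresh: float = 4.0,
              window: int = 24) -> pd.DataFrame:
    """Return flags marking hourly values that need manual validation."""
    out = pd.DataFrame({"value": series})

    # Outliers: values far from the long-run mean, in standard deviations.
    z = (series - series.mean()) / series.std()
    out["flag_outlier"] = z.abs() > z_thresh

    # Trends: a rolling daily mean drifting away from the long-run mean.
    rolling = series.rolling(window, min_periods=window).mean()
    out["flag_trend"] = (rolling - series.mean()).abs() > 2 * series.std()

    return out
```

As in the meteorological case, a flag is a question, not a verdict; the flagged hours are then compared against unit operating records.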
In some cases, the numbers were measured correctly but were not representative. For example, during startup and shutdown, fuel combustion processes are inefficient and some pollutant levels are high. However, if your concern is the long-term average, you don’t want to weight those short-term values too heavily because they bias the result. The Environmental Protection Agency uncritically used the CEMS data[1] in a couple of instances and proposed inappropriate limits as a result.
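A toy calculation shows how much a handful of startup hours can skew an arithmetic average; the emission rates below are invented for illustration.

```python
normal_hours = [0.10] * 95   # steady-operation emission rate (made up)
startup_hours = [1.50] * 5   # inefficient combustion during startup (made up)

all_hours = normal_hours + startup_hours
print(f"unscreened average:   {sum(all_hours) / len(all_hours):.2f}")        # 0.17
print(f"steady-state average: {sum(normal_hours) / len(normal_hours):.2f}")  # 0.10
```

Five startup hours out of a hundred raise the average by 70 percent, which is why a limit derived from an unscreened average can misrepresent routine operation.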
Global Warming
I am irritated by those who claim that climate change effects are being observed now whenever there is an extreme weather event or a new weather record, and I have documented instances where the message is incorrect. In the first place, the message is never that there might be good news associated with warming and more CO2; it is always a sign of imminent, inevitable Armageddon. I could write many posts on examples of this but just want to make a point about temperature trends. Recall that when setting up a meteorological sensor you have to consider whether it will make representative measurements. When measuring temperature trends, a big concern is whether conditions around the sensor are changing, and over long periods of time keeping them constant is difficult. In addition, changes to the observing methods or the instruments themselves all affect the trend and have to be considered when evaluating the results. Ultimately, measuring temperature trends is not easy, and picking and choosing trends has over-hyped the observed global warming. Not considering the data correctly for the task at hand undermines the concept that CO2 is the control knob for climate change.
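A toy example illustrates the siting concern: an undocumented step change at a station, say a relocation or new construction nearby, shows up as a warming trend in a naive fit. All numbers here are invented.

```python
import numpy as np

# Invented station record: a flat climate with an artificial 0.6 deg C
# step change mid-record (e.g., an undocumented sensor move).
rng = np.random.default_rng(0)
years = np.arange(1970, 2020)
temps = 12.0 + rng.normal(0, 0.3, years.size)  # flat underlying climate
temps[years >= 1995] += 0.6                    # non-climatic step change

slope = np.polyfit(years, temps, 1)[0]
print(f"apparent trend: {slope * 10:+.2f} deg C per decade")
# A naive linear fit reports steady warming even though the underlying
# series is flat -- the step has to be identified and adjusted first.
```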
Conclusion
Data numeracy recognizes that data should be reviewed and irregularities need to be checked. Inconsistent data patterns do not prove that there is a problem, only that further review is necessary. If the data are audited in an open and transparent manner, then everyone can be confident in the result. Sadly, too many people will not accept numerical results that run counter to their preconceived notions and biases.
My personal experiences with data reporting were in regulatory contexts that, in the big scheme of things, don’t matter much. But I think the data I submitted were unambiguous, and I believe my results could withstand scrutiny. On the other hand, the implications of global warming are a big deal because they are being used as the rationale to completely overhaul the entire energy system of New York and the world. Unfortunately, much of the numerical evidence purportedly proving that global warming is occurring is ambiguous, and the results do not stand up to close scrutiny. My concern is that when I have gone through the process of evaluating data to check a claimed climate change impact and shown that the claim is not supported by the evidence, it has not been uncommon for people to reject the results.
UPDATE – Revised on January 14, 2021
That brings me to the Election of 2020. From what I have observed, there were sufficient irregularities in the presidential election results that an open and transparent audit was appropriate. For example, a verification analysis similar to ones I have done in the past applied an algorithm that looked for unusually large, sudden additions of votes in batches much faster than almost all the others and far above the “normal” pace. “Odd” in this case means absurdly unusual: in Minnesota, one dump at 5:30 a.m. was a net gain of 113,755 Biden votes, 19 standard deviations from normal, or a probability of about 1 in 10^81. I am aware of one instance of a computer forensics analysis of the Dominion Voting System in one county in one state. Something similar is needed anywhere “odd” data were observed. These issues do not prove anything except that further review is needed. I hope that there were valid reasons for the irregularities, but now it appears we will never know.
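For what it is worth, the screening idea is the same one used above for CEMS data. Here is a hedged sketch using a robust z-score (median/MAD) so a single huge batch cannot mask itself by inflating the ordinary standard deviation; the batch sizes are invented, and a flag means only that the batch warrants review.

```python
import numpy as np

# Invented net vote additions per reporting batch.
batches = np.array([800, 1200, 950, 1100, 40000, 1050])

# Robust z-score: median/MAD resists masking by the outlier itself.
med = np.median(batches)
mad = np.median(np.abs(batches - med))
robust_z = 0.6745 * (batches - med) / mad  # ~standard normal for clean data

for size, z in zip(batches, robust_z):
    if abs(z) > 5:  # illustrative threshold
        print(f"batch of {size:,} votes: {z:.0f} sigma-equivalent -- review")
```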
In my opinion, failing to follow up and determine exactly what was going on with these irregularities was a massive failure, and anyone who argues that it was unnecessary doesn’t understand, does not want to understand, or is covering up. The failure to reconcile the data undermines my trust in the process and the system itself.
[1] For example, an arithmetic average of mostly startup data was used to claim that facilities were not operating their air pollution control equipment correctly.