Sunday, 19 January 2020

The history of statistics - NPR, NRC and other registers

We all know counting was important, and that it probably contributed to the evolution of mathematics and further work on it, but what necessitated the troubled, much-maligned, less-understood, selectively-fancied world of statistics? What brought words such as 'population', 'sample', 'mean', 'normal', 'probability', 'chance' and the more technical 'standard deviation', 'variance', 't-test', 'chi-square' etc. into popular and not-so-popular (only the thick, bespectacled nerdy professors and their students, okay!) use?
Thanks to a wonderful book (I don't have statistics to prove it but I loved it!) called 'Statistics - A graphic guide' by Eileen Magnello (author) and Borin Van Loon (illustrator), which I spent my weekend on, I found very interesting and unexpected answers to some of the questions above and to some others which capture the headlines of our newspapers. So, here we go.

"The word "statistics" is derived from Latin status further Italian statista - referring to a statista or statesman - someone concerned with matters of the state. Early statistics were quantitative systems for describing matters of state". So, you see, it is not just the PhD scholar who has to submit her dissertation, her professor submitting a paper to a double-blind, peer-reviewed journal of repute, the economists aspiring for a Nobel Prize or the highly paid data scientists (we all know by now what that means and how much that pays!): everyone who is concerned with matters of the state is a statista and could be interested in statistics.

The philosophy behind statistics is actually determinism. "Determinism means that there is meaning and order in the universe." Thus there have to be some things which conform to a particular thought, size or shape, and then there are things that do not, or those that 'vary'. The earliest application of this was in the field of evolutionary biology (Darwin et al) and the concept of species. So, there has to be an ideal type (usually the average or common) which typologists and taxonomists would classify as a particular species of, say, moths or insects, and then any variation (depending on how much) would count as a different species. So, good old Darwin was the first to see evolution as a purely statistical process. This is important, because in later times we humans have tended to define common features and build narratives around the same. We look like this so we are 'whites' or 'blacks' or 'brown' or 'yellow'; we all speak the French language so we are French; we are all of the 'Aryan' race; we are pure; and yes, those who do not fit into 'us and our' definition are 'others'. Statisticians only call them outliers or call them a different species. But how do we deal with the 'others'?

Now, it is natural that counting people, or undertaking a census, was one of the oldest uses and applications of statistics. People in Babylon, Egypt and China all collected statistical information about their people. But the purpose is important - to collect taxes and determine the number of people/men who could be enlisted in the military. The word census is derived from the Roman censors - people whose duty was to count people. The censors maintained a register of Roman citizens and their property. Scandinavian countries did this in the 17th century, and the US in 1790 for conducting elections. Then there were parish registers. The church has always played a pivotal role in the birth and death of people. It was natural that a register of the same be maintained, and this became a part of the duties of the clergy. Yet again, it is useful to note who were included and who were excluded. Those who belonged to the faith were included, and those who were not of the same faith or did not practice it and (again importantly) "could not afford to pay the fee for ecclesiastical registration" were summarily excluded. It is almost obvious. The maintenance of any such register is bound to take effort, time and expense. Who pays for it? Those who are included. Those who were not included, for not fitting the average or mean definition (by faith, by birth, by occupation, by language or by nationality), did not pay and were not a part of such registers. Conversely, it also means that simply by not being able to pay for the maintenance of the register (i.e. the very poor), you will not be a part of the register. It should not be too difficult to draw parallels to the current planned exercises in the country and see where this is all heading. Someone said emphatically, 'those who do not take lessons from history are bound to face it again and again' or something like that.

Some mathematicians, scientists and statisticians wanted to find the total population of nations and of the world. Again the purpose was noble. They wanted to understand if it was increasing, decreasing or staying about the same. Malthus, the economist, argued that an unchecked human population would always exceed the means of subsistence (food supply) and that human improvement would depend on limiting reproduction rather than on trying to improve the food supply. Darwin said the same in other words and implied that since means are limited, only the fittest would survive. The fittest has come to mean different things - from being the mightiest and most powerful, to the most intelligent, to the most affluent. Thus the science of population, or demography, became the study of poverty. "The first census in UK, around 1851 included age, sex, occupation and birthplace and counted the blind and deaf". There were more details on death and diseases, which also pointed to appalling sanitary conditions in towns. The overcrowding of towns, its impact on sewers (or the lack of them) and the associated health risks are understandable. Thus statistics helped in undertaking some of the first planned sanitary reforms.

Florence Nightingale, the 'lady with the lamp', was another famous user of statistics. She was appalled at the state of record-keeping in military hospitals and of war-time casualties. She put together data around the Crimean War and others, and presented the number of deaths, overall mortality and causes in beautiful visualisations to show what all should know very well intuitively - that wars destroy lives. But, as with some modern statistics and data, measurement and visualisation may not lead to any action. Wars continued then as they do now. Another beautiful visualisation was the one by Minard of Napoleon's troops and their ill-fated adventure to Russia in 1812. I have personally used it to teach/train people on using data visualisations to tell a story. Yet again, the purpose or the outcome was not just a depiction of figures; the futility of wars and their impact on human lives were brilliantly portrayed to tell a story to those that cared to listen.

The modern comparisons of statistics to mini-skirts or bikinis are well known, and they too point to the same questions: what is the purpose you are trying to achieve, and what is the story you are trying to tell?

It is here that this massive exercise of NPR, NRC, CAA and their ilk fails me and many others. What is this trying to achieve? For whom? Who said so? Who asked for it? Why? In whose name?
This exercise is not the announcement of a new free or rental plan on Jio, it is not Amazon's sale week, it is not the erection of a statue, or the change of the name of a road or city. This is massive, and will involve more than a billion people and a considerable amount of time, effort and money. Estimates put the expenditure anywhere around 60,000 Cr+; I do not want to comment on timelines and effectiveness - we all know what happened to Aadhaar and demonetisation. There are still people who believe both were great achievements, but I am equally entitled to my view that both were bogus, unnecessary, ill-planned and ill-implemented things which did not achieve anything for the common man. There were electoral gains made in UP due to demonetisation and surely some people benefited, but not the economy, and not the country at large for sure. NPR, NRC and the other names that will come up will be gargantuan state exercises which, apart from the time, expense and resources, will divert the attention of the government, private sector, NGOs and other statistas from things far more important and urgent.

Should the government, policy makers, private sector players, innovators, media people and activists not rather be working on poverty, education, health, social welfare and jobs? Or do we also believe, like some of our leaders, victrolas and media, that 'all is well' and 'achhe din' are here, and that all these are imaginary problems that only the opposition parties, classes, liberals, Nobel laureates, urban naxals, students and biased media can see? Inflation doesn't exist, if we don't eat onions. If we don't publish the right data and hush the messengers, then there is no job crisis (the worst in decades). Farmers do not commit suicide; some party or the other (depending on who is in power) is making an issue of it for poll gains. No one killed people on mere suspicion of eating beef - they died because they did not drink cow urine. Gauri Lankesh and Kalburgi were not killed by anyone, especially not by bigoted extremists. The attacks on students at campuses and protestors on streets across the country were carried out by the Pakistan army or ISI; the Indian/Delhi police were merely protecting law and order. Our education system is in excellent condition because my and your children are speaking English better than English kids (in a British accent learnt via YouTube), can count Peppa Pigs and are even taking German and French classes. Who cares about children going to government schools? Our health system is world class today - there are cafes and play areas that would resemble a mall. We don't know what the government hospitals and health centres are up to. Who cares?

There are real problems. There are dire situations that a large number of our fellow brothers and sisters face. The statistas - I repeat, those who care about matters of the state - need to know how many and who.
We need statistics for that. There is the census. There are records of child birth, at anganwadis, at our schools, at our colleges. We need to know how many are stunted. How many are malnourished?
We need to know how many of our children are dropping out of schools? How many are not learning? How many mothers die at childbirth? How many infants do not make it to age 5? How many are affected by curable diseases?
How many of our graduating students do not have skills to get them jobs? How many people of working age are not working? What are they doing? What do they need to do to get jobs?

More importantly, all of us need to do something about one or a few of these questions, challenges and real issues. Only then will things improve. Data, science, mathematics and statistics have mostly been about that: knowing, knowing with a purpose, and then acting upon it. The purpose can only be the development and well-being of all.

"Sarve bhavantu sukhinah...sarve santu niramayah". I repeat "Sarve". All.
