Friday, January 25, 2008

Indo-European Numbers 1-10

I've only just discovered that Eugenio Ramón Luján Martínez, in ‘The Indo-European system of numerals from ‘1’ to ‘10’’, makes detailed proposals on exactly how these numerals came about, and how the words were formed (their etymology). This is from notes reviewing a paper by him:

The proto-Indo-European 1-10 numerals are:
*oynos/*sem *duwo: *treyes *kwetwores *penkwe *sweks *septm *okto: *newn *dekm

Indo-European ‘6’ ‘may best be explained as a loan from Semitic’, as may ‘7’.
(This is not at all unlikely; the Akkadian 6 and 7 were shishshu and sebe - RP)
- ‘1’ through ‘3’ were deictic in origin
- ‘4’ relates to the four fingers or the width of the palm,
- *okto ‘8’ resolves to a dual marker (-o) and ‘4’
‘best related to Av. ašti ‘width of four fingers, palm’;
- ‘5’ is generally related to ‘fist’ and ‘finger’, but is also related to ‘all’;
- ‘10’ the I-E root underlies *deks- ‘right [hand]’; and
- ‘9’ is generally related to ‘new’.

M concludes that achieving units for ‘1’ through ‘10’ remains far from demonstrating an original decimal system, as the grouping of ‘1’ through ‘3’ as deictic in origin, ‘4’, ‘5’, ‘8’, and ‘10’ as involving fingers or hands, and ‘9’ as ‘new’, suggests. Thus, we can see bases for at least two, and possibly four, distinct counting systems prior to the development of the decimal system.
From: Notes on: Numeral Types and Changes Worldwide.

Martínez's full doctoral thesis on Indo-European numbers, which deals with the numbers from 1 to 100, is available online. It is entirely in Spanish and 24MB in size, but I shall endeavour to read it some time.

This find certainly reinforces my conviction that numerals do not come into existence by immaculate conception, but evolve from very small, simple beginnings set in place many thousands of years ago, perhaps when humans first began to speak and estimate quantities.

1 - 3 are deictic, which means they rely on context. Early on, speakers in many languages made a distinction in pronouns: I (singular), we two (dual), we three (trial) and we (more than 3 - plural), and this also extended to the very low numbers, which used the same roots. Number markers related to these were added to many different kinds of words, not just pronouns and the lower numerals.

The dual still exists in the English distinctions both vs. all, either vs. any, twice vs. x times (an archaic thrice also exists, meaning "three times"), and so on, but the dual and the trial no longer occur in our pronouns.

Those very numbers (in fact 1-4) are also the most easily subitisable; that is, you can estimate the number very quickly by sight, without counting. Most people can estimate number by sight up to 7 or 8, but this takes a bit longer.

You can also, of course, easily subitise 1 hand, 2 hands, 1 foot, 2 feet once you start 'bunching' numbers into groups (mostly based on counting 5 digits, and then making that 1 unit, usually related to 'hand'). A digit, of course, was literally a finger or toe.

But some number systems rely on just the four fingers, so you get one bunch of 4 fingers, then the next stage is 2 bunches of 4 fingers = 8.
This seems to have happened in proto-Indo-European, or in a counting system that preceded that. (See above: *okto ‘8’ resolves to a dual marker (-o) and ‘4’,
‘best related to Av. ašti ‘width of four fingers, palm’).
9 would then be the start of a new cycle, or if 10 had become a new base, it might be a completely new word (‘9’ is generally related to ‘new’).

This kind of '4,8 cycle' number system occurs in isolated areas in a few Austronesian languages around New Guinea, and in Papuan number systems as well.
A more 'advanced' system, with a 5,10 cycle, but with 'relicts' of a base 4 system, is more common in Austronesian. In these cases, the '9' is usually constructed something like X1.
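
The idea of counting in bunches of four fingers can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming a plain 'bunches plus leftovers' scheme; it is not a reconstruction of any attested system, and the name bunches_of_four is a hypothetical label.

```python
def bunches_of_four(n):
    """Split a count into whole bunches of 4 fingers plus leftover fingers."""
    bunches, leftover = divmod(n, 4)
    return bunches, leftover

# 4 is one bunch, 8 is two bunches, 9 starts a new cycle with one left over.
for n in (4, 8, 9):
    bunches, leftover = bunches_of_four(n)
    print(n, "=", bunches, "x 4 +", leftover)
```

On this scheme 8 comes out as exactly two bunches, which is the shape proposed above for *okto, and 9 is the first number of the next cycle.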

This puzzled me for a long time, but the problem begins to clarify itself with the knowledge that the proto-Indo-European numeral set is probably more of a messy accumulation of different counting systems than the miraculously fully-blown decimal system it appears to be.
Of all Indo-European 1-10 numeral systems, only Vedda has a system that counts 6-9 as 5+1, 5+2, etc. But there are more than 250 of those constructions in Austronesian languages, and in many quite unrelated languages, as well.

For that reason, I believe that the "proto" Austronesian numeral words *enem=6, *pitu=7, *walu=8, and *Siwa=9 appeared latest in the majority of Austronesian (An) 1-10 systems.
*sa puluq (= 1 x puluq) has nothing to do with hands, but probably appeared before 6-9, because many systems with 5+1, 5+2 constructions use *sa puluq. Furthermore, these particular words seem to appear quite late in Austronesian languages, suggesting they were borrowed by languages that still preserved older systems in whole or in part.

Monday, January 7, 2008

Numeral Studies in Indo-European

Nineteenth century laws of sound correspondence led to major advances in linguistics. Numeracy, the linguistics of numeral systems, and calculations ... now represent twentieth century contributions to an understanding of the ... decades. Numeral names ... recall an old pre-exponential numeral system that stands between concrete counting and exponential decimal systems.

French Decades.
Seiler has characterized breaks in numeral formations as a "turning point between serializations" that mark the "semiotic status of the base", while Hurford called attention to the point where a language changes methods for signaling addition as indicative of a base break. So the syntax of English 'thir-teen ... nine-teen' (digit + base), in stating the smaller number first, differs from that of 21-29 (base + digit) with the smaller number suffixed to the base. Addition in one but multiplication in the other signals the teens / decades break.

Non-standard decade formations from 30 to 90 in French, trente, quarante, cinquante, soixante, septante, huitante / octante, nonante 'thirty, forty, fifty, sixty, seventy, eighty, ninety', are built on the strategy digit + a ten-valued suffix -(a)nte, parallel to the English forms with digit + '-ty'.

But despite French numerical reforms, standard French numerals for decade counting, like many Celtic systems, retain well-known breaks reminiscent of non-decimal systems. Major breaks in the standard system begin with 70 (soixante-dix, literally '60-10' to soixante-dix-neuf '60-ten-nine' or '60-nineteen') and 80 (quatre-vingt, literally 'four-twenty' to quatre-vingt-dix-neuf 'four-twenty-nineteen').

French soixante-dix and quatre-vingt have been accounted for as the result of Celtic influence. If Celtic, as a branch of IE, has inherited the PIE decimal system, however, both IE Celtic and French should share an inherited decimal system. To the extent that soixante '60' is 6 x 10, and 60 marks a base-like entity on which to build soixante-dix '70' as '60-ten', soixante formations recall a base value '60', but numerals quatre-vingt '80' (four-twenty), quatre-vingt-dix '90' (four-twenty-ten) build on 20.

French Decades

Breaks in the standard French decade system reflect factors [10 and 6] operating on base units 10 and 60 as far as 79 and factors [10, 2, and 5] operating on base units 10 and 20 from 80 to 99. These numeral bases and factors are not powers of any base, but pre-exponential factors reminiscent of traditional systems of measure rather than sequential counting. Decade numerals trente to soixante '30-60' are formed regularly from the digits 3-6 plus the decade suffix -(a)nte, and French 62-69 follows the strategy of addition: 'sixty+2 ...' established with 22.

The first break begins with soixante-dix '60-10' which uses 60 as base for adding 10-19 to build 70-79. But soixante itself is otherwise not the productive base that French cent (English 'hundred') is. There is no soixante-vingt, for example. The second break begins with the numeral quatre-vingt that, as '4-20', builds on vingt '20' as a base. In quatre-vingt-dix '4-20-10' the addition process of 60+10 recurs.
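
The two breaks described above can be sketched as a small Python function. This is only an illustration of the arithmetic under the reading given here (60 as the base up to 79, 4 x 20 from 80 to 99); the function name french_structure and its output notation are hypothetical and are not taken from Justus.

```python
def french_structure(n):
    """Describe how standard French builds a number from 60 to 99."""
    if not 60 <= n <= 99:
        raise ValueError("this sketch only covers 60-99")
    if n < 80:
        head, rest = "60", n - 60      # soixante plus 0..19
    else:
        head, rest = "4x20", n - 80    # quatre-vingt plus 0..19
    return head if rest == 0 else f"{head}+{rest}"

for n in (62, 70, 79, 80, 90, 99):
    print(n, "->", french_structure(n))
```

So 70 comes out as 60+10 (soixante-dix) and 90 as 4x20+10 (quatre-vingt-dix), while nothing in the sketch ever produces a multiple of 60, matching the observation that there is no soixante-vingt.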

Is French vingt part of the paradigm, trente, quarante, ..., or is / was it a separate, unanalyzable base? In the system that underlies quatre-vingt, it serves as a numeral base. By a factor of 5, numeral base vingt is converted to cent '100'. The numeral quatre-vingt (4 vingt's) recalls the conversion of a base 20. Phonological correspondences with Latin make it part of an older decimal paradigm, to the extent that Latin vii-gint-ii '20' is '2-10's'. Sound correspondences relate French vingt to Latin vii-gint-ii 'twenty' or IE *ui-kentii (Coleman 1992:397-398 with discussion of the relation of *kent- to IE 'ten, decade, hundred'), while subsequent decades in -(a)nte correspond to Latin *-(a)-gint-aa: quinqu-a-gint-aa, tri-ginta 'fifty, thirty' (Pope 1966 [1934]:127; 318). Although historically vingt is a phonological reduction from a potential ancestral 'two decades' (Latin vii-gint-ii 'two gint's'), whether vi-ngt was only accentually separated from soix-ante or not, vingt and soixante have separate roles in the French system of numeration.

NUMERACY AND THE GERMANIC UPPER DECADES, by Carol F. Justus. Journal of Indo-European Studies 24, 1996, 45-80
http://www.utexas.edu/cola/centers/lrc/numerals/cfj-jies/cfj1-section1.html

I tried to contact Carol Justus, Director, Numerals Project at the University of Texas at Austin, to request her advice on my own study. I found that she had passed away on 1 August 2007. So I tried to contact Winfred Lehmann, Director of the Linguistics Research Center, University of Texas at Austin, but found, to my astonishment, that he had also died, on the very same day.

Do Eskimos Count Like Austronesians ?

If I came across the following set of numerals amongst my current chart of some 1400 Austronesian and Papuan numeral systems, I would see nothing much amiss. Their construction, and relation to body parts, are fairly typical.
1 - ata'uzik - clearly includes a cognate of Austronesian *isa, POc *sa-kai, etc
2 - ma'dro - ditto of *dusa or *rua
3 - pi'ñasun
4 - si'saman
5 - tûdlemût - ditto of *lima (hand)
6 - atautyimiñ akbinigin tudlimût - "one hand and once on the next"
- bog standard Austronesian/Papuan construction
7 - madro'niñ akbi'nigin - "twice on the next"
8 - piñas'uniñ akbi'nigin - "three times on the next"
9 - kodlinotai'la - "that which has not its ten" - not usual, but not very rare
10 - kodlin - derived from kut or kule, "the upper part" - compare *puluq
14 - akimiaxotaityuña - "I have not fifteen"
15 - akimi'a - fifteen (a separate word) - unusual in An
20 - inyui'na - "a man completed" - bog standard An/Pap construction
25 - inyui'na tûdlimûniñ akbini'digin - "twenty and five times on the next"
30 - inyui'na kodliniñ akbini'digin - "twenty & ten times"
35 - inyui'na akim'iaminñ aipâliñ - "twenty & one fifteen times"
40 - madro inyui'na or madrolipi'a - "two twenties"
100 - tûdlimûipi'a - five 'pi'a'

These numbers, though, are spoken by Inuits in Point Barrow, in the extreme north of Alaska. Greenland Eskimos use much the same basic number words, but construct their teens and decades differently.
The original writer* points out: "The expressions in Greenlandic and other Eskimo dialects for these higher numbers are very different, which is pretty strong evidence that they have been developed since the separation of the Eskimo into their different branches."
That is exactly what I am finding in my study of Austronesian/Papuan numerals. At each stage in the development of counting systems, certain groups in the mainstream adopted new words for numbers that they had only expressed by gesture previously, or had expressed as separately countable (and visible) 'chunks' like 10s or 20s. They adopted 'consensus' words for 10, 6-9, the teens, decades, and 100s, roughly in that order.
Some groups still lack those 'consensus' words. The 'archaic' lower numbers, from 1-5, 6-9 and 11-20, are still preserved in many languages that haven't yet adopted the 'consensus' Austronesian number lexicon, and they're mappable.
The higher numbers, like the teens, decades, hundreds, and thousands, developed, worldwide, only quite recently, and the times of their diffusions should be dateable (if only relatively, not absolutely).
So the fact that (some) Polynesians have fully developed decimal systems, including standard "An" words for 6-9, while many Melanesians in Vanuatu and New Caledonia haven't, shows that Vanuatu and New Caledonia were first colonised a lot earlier than Polynesia, and in at least 3 separate waves, where newcomers either pushed their predecessors south, or absorbed them.
The Maori had a system based on 20s, not 10s, so that shows they left central Polynesia before the full decimal system diffused into that area. The fact that Easter Island had a full decimal system, while Maoris didn't, shows that Easter Island was settled later than New Zealand. (Or that Maoris kept strictly to their traditions, of course. People will be human, and upset theories like this one).

Update: April 15 2008 - Since I wrote that, I've found that many Polynesian languages had vigesimal systems in use prior to contact with Europeans, so that many of the decimal systems apparent today are not very old at all. I certainly wouldn't repeat that 'Easter Island was settled later than New Zealand' based solely on my faulty recording of their number-systems.

This dateable number-naming development is still going on. Americans and English (until only the last decade or so) had different meanings for a 'billion': in America, 1000 million; in England and Germany, a million million. So the division is dateable (around 1600-1800, before America, isolated, developed its own meaning for the word 'billion'), and so is the adoption of the American 'billion' by the English (1990-1995).
It is only since Anglo-Saxon times that the English 'hundred' came to mean 10x10, not a dozen 10s (12x10). 'Beowulf' mentions 100 warriors coming to a place, then 80 of them leaving, and 40 staying.
So the full decimal system we use now only came to England within the last 1000 years or so. It's very possible that 'primitive' Austronesians adopted their identical decimal system before we did.
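
The arithmetic behind the 'hundred' example can be checked in a few lines of Python. This is a minimal sketch that takes the figures exactly as quoted above (they have not been checked against the poem here); the variable names are hypothetical.

```python
LONG_HUNDRED = 12 * 10      # the older 'hundred' of a dozen tens
DECIMAL_HUNDRED = 10 * 10   # the modern 'hundred'

left, stayed = 80, 40       # figures as quoted above, assumed rather than verified

# Only the long hundred makes the quoted numbers add up.
print(LONG_HUNDRED - left == stayed)
print(DECIMAL_HUNDRED - left == stayed)
```

Only the 12x10 reading leaves 40 behind once 80 have gone; with a 10x10 hundred there would be 20.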
If this analysis works, it should assist in relative dating of migrations and cross-group influences to a much greater resolution than genetic or linguistic splits and mergers. (Both genetic and linguistic dates are very much estimated on the assumption that things change on a fairly regular and smooth basis. They don't.)
-------------------------------------------------
*Notes on Counting and Measuring among the Eskimo of Point Barrow, John Murdoch - American Anthropologist, Vol. 3, No. 1 (Jan., 1890), pp. 37-44. http://links.jstor.org/sici?sici=0002-7294%28189001%291%3A3%3A1%3C37%3ANOCAMA%3E2.0.CO%3B2-5

Eskimos do count like Austronesians, but I'm certainly not claiming that they are recently related. The first few number names, and the actual ways of counting up to 1 hand and beyond, and then verbalising that, are pretty similar, worldwide.

Sunday, January 6, 2008

Erromanga - Preservation or Innovation?

I wandered off-topic recently to look at the Erromangan language. (Erromanga is an island about midway between the big islands of Vanuatu and the big island of New Caledonia).

Erromanga once had at least three languages (Sye, Ura, and Utaha) but suffered very heavy depredations in the 19th century by 'black-birders' - recruiters for plantation labour in New Caledonia, and Queensland, Australia. There was a virtual population crash, from an estimated 6000 pre-contact, to only 400 in the 1930s, and about 1300 in 1989. Ura had (in 1989) fewer than half-a-dozen speakers, all elderly, and Utaha disappeared altogether, about a century ago.

In doing so, I re-read:
The Efate-Erromango problem in Vanuatu subgrouping, John Lynch,
Oceanic Linguistics 43.2 (Dec 2004): p311(28)
Available via JSTOR.

Lynch is a classical comparativist (the expert on Southern Vanuatu) and has 28 pages of grammatics and phonology, to support his theories of grouping/sub-grouping, but precious little about the lexicons of Erromanga, except this, under the heading of 'innovations': -

"(e) POC *saŋapuluq, PNCV *saŋavulu 'ten' is replaced by PEE *rua-lima ('two-five'): e.g., Lewo lua-lima, South Efate ralim. (The same innovation, however, is found to the immediate north of this subgroup, in Paamese h??lualim.) (9)"
and -
"(b) Erromangan languages share innovation ..., the replacement of *saŋapuluq 'ten' by a form composed of 'two' and 'five': cf. Sye narwolem, Ura lurem ~ durem."
ie, the technically more advanced (multiple) word phrase has been 'replaced' by a less-developed construction. This could only make sense to a specialist ruled completely and solely by the limited specialist techniques and jargon of his discipline.

Lynch nearly rescues himself from this, but not quite, by saying, in a sub-note:
It occurred to me that the replacement of a monomorphemic word for "ten" with a transparent bimorphemic one may have been part of a more general simplification of numeral systems, since many SOC languages have quinary systems. However, it turns out that many widely distributed languages that do have compound numerals, based on "five" for 'six' through 'nine' nevertheless retain *saŋapuluq 'ten'.

Much more likely, though, is that many languages borrowed a new word, sa-puluq, meaning 1 x (bunch of) 10, before they got around to changing their old constructions for 6-9.

He kept his nose so close to the phonology and grammatics that he apparently ignored some quite amazing (to me, at least) lexical 'innovations':

Shoulder, which has a perfectly good POc 'ancestral' term, *(qa) para, is 'innovated' in Ura as 'nobun-lenge' = 'head-arm/hand'
Neck ... POc *Ruqa, *liqoR ... (Ura) bo-ri-na Lit. 'X'+ na=breast
Hair ... POc *raun ni qulu ... (Ura) novlingen-nobu- (Sie) novlinompu ... literally feathers/hair-head. This is also the literal meaning of the POc construction, but the POc word has two lines of 'descendants' - one used *raun, alone, and the other used *qulu, alone.
Mouth ... POc *papaq, *qawa ... (Ura) nobun nggivi- = lit. head-tooth
To sleep: ... POc *tiRuR ... (Ura) ahlei-ba = lit. to lie down-ba
Thatch/roof ... POc *qatop ... (Ura) nobun sungai = lit. head-house
To sew: ... POc *saquit ... (Ura) ehli (Sye) ... etri
To stab, pierce ... POc *soka ... (Ura) ehli ... (Sye) satri
Bite ... POc *karat ... (Ura) ahli ... (Sye) elintvi
(This is not wildly exciting, even to an amateur linguist, as sew and stab are very obviously related in POc).

It makes one wonder if these fellows suddenly forgot their 'inherited' vocabulary on an isolated island (hasn't happened elsewhere), or if someone took certain very basic words (body parts, mainly) and deliberately changed them, (ie a genuine invention) or if the people didn't have those 'ancestral proto-Oceanic' words in the first place.

The overall differences in number words and systems between Erromanga and the Vanuatu languages further north were also completely missed by Lynch in his paper, although he did propose that '2 hands' was an 'innovation' for '1 x name for 10' (leading on, perhaps, to 2x10=20, which it does in this case: Ura - lurem gelu=20, Sye - narwolem duru=20). That suggests that Ura and Sye both adopted the idea of decimal 10s before they adopted the words.

Erromangan languages (I have numeral data for Ura, Sie, and extinct Utaha) don't even achieve the 'consensus' PAn names for 1-4:


1 - *PAn - *esa, *ias .... Ura - sai - OK
2 - *PAn - *dusa ..... Ura - ge-lu - OK

3 - *PAn - *telu ....... Ura - ge-he-li - is very strange, because it (should have) descended directly from its established 'ancestor word', *telu. Instead it appears to be a 'linguistic innovation' based on a 3rd person possessive, 'ga' and directly on a Trial *-(t,s)ali proposed for PSV - proto-Southern Vanuatu.
A similar construction is found in older relict languages in Tanna and New Caledonia ... kesel, kahar, esech, seen, hejen, etc.

4 - *PAn - *Sepat .... Ura - le-me-lu (2-2) (Sie nd-vat).
Lemelu (2-2) must be dubbed a 'linguistic innovation', using classical Comparative Methodology. But it's clearly not inventive, and is exactly the same construction I've found in number systems that haven't gone much beyond naming numbers up to 5 in other parts of the world. (Example: aula aula=2-2=4, in Binahara, a Trans-New-Guinea language - it even has a very similar root-word).
Naming numbers from 6-9 is very obviously a later invention or borrowing added to the first 3 to 5 number words, so the sudden appearance of 'vat' in Ura - sini-vat - 9 is no surprise. It almost proves that the phrases for 6-9 were real inventions in Austronesian languages, at a later date than the first 5 number names were established.

5 - *PAn - *lima .... Ura - su-o-rem (1 hand). This is also a common construction where people mark the 'first hand' and then go on to mark 'hand/hands two' or 'hand-hand' for 10. (Binahara - gena-aulapu = 1-5, gena-aulapu-aulapu = 1-5-5 = 10).

6 - *PAn - *enem .... Ura - mi-sai (+1)
7 - *PAn - *pitu .... Ura - sim-he-lu
8 - *PAn - *walu .... Ura - sim-he-li
9 - *PAn - *Siwa .... Ura - sini-vat
10 - *PAn - *sa-puluq .... Ura - lu-rem (2-hand)
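
The way the Ura forms are put together can be laid out as simple arithmetic. The sketch below is a rough check in Python, not an analysis from Lynch: the glosses for 4, 5, 6, 10 and 20 follow the notes above, while reading 7, 8 and 9 as 5+2, 5+3 and 5+4 is an assumption based on the shape of the forms. The table name URA and the helper value are hypothetical.

```python
from math import prod

# number: (Ura form, operation, parts). Glosses for 7-9 are assumed readings.
URA = {
    4: ("le-me-lu", "add", (2, 2)),      # 2-2
    5: ("su-o-rem", "mul", (1, 5)),      # 1 hand
    6: ("mi-sai", "add", (5, 1)),        # (+1)
    7: ("sim-he-lu", "add", (5, 2)),     # assumed: hand + 2
    8: ("sim-he-li", "add", (5, 3)),     # assumed: hand + 3
    9: ("sini-vat", "add", (5, 4)),      # assumed: hand + 4 ('vat')
    10: ("lu-rem", "mul", (2, 5)),       # 2-hand
    20: ("lurem gelu", "mul", (10, 2)),  # 10 x 2
}

def value(op, parts):
    """Evaluate an additive or multiplicative gloss."""
    return sum(parts) if op == "add" else prod(parts)

for n, (form, op, parts) in URA.items():
    assert value(op, parts) == n
    sign = "+" if op == "add" else "x"
    print(n, form, "=", f" {sign} ".join(str(p) for p in parts))
```

Every form from 4 upwards is built from 1, 2, 3, 'vat' and 'hand', with no separate word needed for 6-9 or for 10.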

Lynch's comparative methods of sub-grouping languages enabled him to propose a settlement hypothesis for Vanuatu, which runs as follows:


SOc (Southern Oceanic) groups all the languages of Vanuatu.
- that split into Northern Vanuatu and:
NSO (Nuclear Southern Oceanic - all languages south of Northern Vanuatu)
- that (NSO) split into Central Vanuatu and:
- SMel (Southern Melanesian) - all languages south of Epi and Efate islands.
- SMel split into
Southern Vanuatu (*PSV) and New Caledonian

That translates into a family tree that implies that people (who now speak North Vanuatu languages) first settled North Vanuatu, and stayed there, while another lot going south, split in the middle of Vanuatu, with one lot staying put, and another lot going south, and so on.

It implies that the speakers of languages further south would be the ones that settled their territories most recently.

But, to a non-comparativist (like me) it's 'obvious', from the merest of glances at number systems, that the languages in the south are the oldest, and preserve their older constructions. This thinking reverses the implications of the genetic language tree produced by comparativists.

It would mean that the first major split from Southern Oceanic (SOc) would give one branch leading to the surviving New Caledonian group, with the rest continuing to evolve.
- The next split would be between 'the rest' and surviving Southern Melanesian.
- The very latest split would be between 'the rest' and surviving North Vanuatu.

Innovations, Shminnovations (Glossary)

Comparative Method linguists seem to use trade jargon words that are often diametrically opposed to how the rest of us would use those particular terms.

Consider this:

"POC *saŋapuluq, PNCV *saŋavulu 'ten' is replaced by PEE *rua-lima ('two-five'): e.g., Lewo lua-lima, South Efate ralim. (The same innovation, however, is found to the immediate north of this subgroup, in Paamese h??lualim.) "
The Efate-Erromango problem in Vanuatu subgrouping, John Lynch,
Oceanic Linguistics 43.2 (Dec 2004): p311(28) Available via JSTOR.

Anyone who has ever studied numbering systems, per se, would never describe 2x5 as a replacement for 1x10 (or 1 x group of ten). From 2x5 to 1x10 is, quite definitely, a conceptual step forward. So 2x5 shows the preservation of an older term, not an innovation.

The major problem is that comparative linguists go down the 'Snakes' to reconstruct a wholly imaginary proto-language, then climb up the 'Ladders', look back to their construction, and base their judgement of what exists in current languages on what they themselves invented.

This leads to a few more arse-about-tit linguistic jargon words:

Retention - means a word (or bit of grammar) that apparently descends directly from the imaginary proto-language
Innovation - means a word (or bit of grammar) that apparently doesn't descend from the imaginary proto-language
Reflex, reflected - means a word (or bit of grammar) that apparently corresponds to something in the imaginary proto-language
Conservative - means a language that apparently still preserves words (or bits of grammar) from the imaginary proto-language

In each case, the historical comparative linguist is referring back to (his own) imaginary proto-language, and not, in any way, to what might, just, have preceded that proto-language before it burst, fully-formed, into the world.

Henceforth in these posts, I will try to remember (as when I quote linguists directly) to highlight these linguistic jargon words, so you realise that they often mean exactly the opposite of what you (intuitively) might think they mean.

And I will try to remember to use completely different words myself:

Preservation - means a word (or bit of grammar) that still holds over from an earlier language.
Invention - means a word (or bit of grammar) that doesn't descend directly from an earlier language - it's genuinely new.
Descends from - means a word (or bit of grammar) that does descend directly from an earlier language.
Preservative - means a language that apparently still preserves words (or bits of grammar) from an earlier language.