Austronesian Numbers Project

Friday, January 25, 2008

Indo-European Numbers 1-10

I've only just discovered that Eugenio Ramón Luján Martínez, in ‘The Indo-European system of numerals from ‘1’ to ‘10’’, makes detailed proposals on exactly how they came about, and how they were formed (the words' etymology). This is from notes reviewing a paper by him:

The proto-Indo-European 1-10 numerals are:
*oynos/*sem *duwo: *treyes *kwetwores *penkwe *sweks *septm *okto: *newn *dekm

Indo-European ‘6’ ‘may best be explained as a loan from Semitic’, as may ‘7’.
(This is not at all unlikely; the Akkadian 6 and 7 were shishshu and sebe - RP)
- ‘1’ through ‘3’ were deictic in origin
- ‘4’ relates to the four fingers or the width of the palm,
- *okto ‘8’ resolves to a dual marker (-o) and ‘4’
‘best related to Av. ašti ‘width of four fingers, palm’;
- ‘5’ is generally related to ‘fist’ and ‘finger’, but is also related to ‘all’;
- ‘10’ the I-E root underlies *deks- ‘right [hand]’; and
- ‘9’ is generally related to ‘new’.

M concludes that achieving units for ‘1’ through ‘10’ remains far from demonstrating an original decimal system, as the grouping of ‘1’ through ‘3’ as deictic in origin, ‘4’, ‘5’, ‘8’, and ‘10’ as involving fingers or hands, and ‘9’ as ‘new’, suggests. Thus, we can see bases for at least two, and possibly four, distinct counting systems prior to the development of the decimal system.
From: Notes on: Numeral Types and Changes Worldwide.

Martinez' full doctoral thesis on Indo-European numbers is available online but is entirely in Spanish, and 24MB in size, which I shall endeavour to read some time. It deals with Indo-European numbers from 1 to 100.

This find certainly reinforces my conviction that numerals do not come into existence by immaculate conception, but evolve from very small, simple beginnings set in place many thousands of years ago, perhaps when humans first began to speak and estimate quantities.

1 - 3 are deictic, which means they rely on context. Early on, speakers in many languages made a distinction in pronouns: I (singular), we two (dual), we three (trial) and we (more than 3 - plural), and this also extended to the very low numbers, that used the same roots. Number markers related to these were added to many different kinds of words, not just pronouns and the lower numerals.

The dual still exists in the English distinctions both vs. all, either vs. any, twice vs. x times (an archaic thrice also exists, meaning "three times"), and so on, but the dual and the trial no longer occur in our pronouns.

Those very numbers (in fact 1-4) are also the most easily subitisable; that is, you can estimate the number very quickly by sight, without counting. Most people can estimate number by sight up to 7 or 8, but this takes a bit longer.

You can also, of course, easily subitise 1 hand, 2 hands, 1 foot, 2 feet once you start 'bunching' numbers into groups (mostly based on counting 5 digits, and then making that 1 unit, usually related to 'hand'). A digit, of course, was literally a finger or toe.

But some number systems rely on just the four fingers, so you get one bunch of 4 fingers, then the next stage is 2 bunches of 4 fingers = 8.
This seems to have happened in proto-Indo-European, or in a counting system that preceded that. (See above: *okto ‘8’ resolves to a dual marker (-o) and ‘4’,
‘best related to Av. ašti ‘width of four fingers, palm’).
9 would then be the start of a new cycle, or if 10 had become a new base, it might be a completely new word (‘9’ is generally related to ‘new’).

This kind of '4,8 cycle' number system occurs in isolated areas in a few Austronesian languages around New Guinea, and in Papuan number systems as well.
A more 'advanced' system, with a 5,10 cycle, but with 'relicts' of a base 4 system, is more common in Austronesian. In these cases, the '9' is usually constructed something like X1.
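The 'bunches of four fingers' idea can be sketched in a few lines of Python. This is purely an illustration of the arithmetic, not a reconstruction of any attested language; the function names and the wording of the glosses are my own assumptions.

```python
def bunches_of_four(n):
    """Split a count into (bunches of four fingers, leftover units)."""
    if n < 1:
        raise ValueError("counting starts at 1")
    return divmod(n, 4)


def gloss(n):
    """Spell out a count as bunches of four plus leftover units."""
    bunches, leftover = bunches_of_four(n)
    parts = []
    if bunches:
        parts.append(f"{bunches} x four")
    if leftover:
        parts.append(str(leftover))
    return " + ".join(parts)


# 8 comes out as two bunches; 9 is the first count that starts a new cycle.
for n in (3, 4, 8, 9):
    print(n, "=", gloss(n))
```

In this toy model 8 is exactly '2 x four', and 9 is the first number that needs something added on top of two full bunches.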

This puzzled me for a long time, but the problem begins to clarify itself with the knowledge that proto-Indo-European is confirmed to be probably more of a messy accumulation of different counting systems than the miraculously fully-blown decimal system it appears to be.
Of all Indo-European 1-10 numeral systems, only Vedda has a system that counts 6-9 as 5+1, 5+2, etc. But there are more than 250 of those constructions in Austronesian languages, and in many quite unrelated languages, as well.

For that reason, I believe that the "proto" Austronesian numeral words *enem=6, *pitu=7, *walu=8, and *Siwa=9 appeared latest in the majority of Austronesian (An) 1-10 systems.
*sa puluq (= 1 x puluq) has nothing to do with hands, but probably appeared before 6-9, because many systems with 5+1, 5+2 constructions use *sa puluq. Furthermore, this particular word seems to appear quite late in Austronesian languages, suggesting it was borrowed by languages that still preserved older systems in whole or in part.

Monday, January 7, 2008

Numeral Studies in Indo-European

Nineteenth century laws of sound correspondence led to major advances in linguistics. Numeracy, the linguistics of numeral systems, and calculations ... now represent twentieth century contributions to an understanding of the ... decades. Numeral names ... recall an old pre-exponential numeral system that stands between concrete counting and exponential decimal systems.

French Decades.
Seiler has characterized breaks in numeral formations as a "turning point between serializations" that mark the "semiotic status of the base", while Hurford called attention to the point where a language changes methods for signaling addition as indicative of a base break. So the syntax of English 'thir-teen ... nine-teen' (digit + base), in stating the smaller number first, differs from that of 21-29 (base + digit) with the smaller number suffixed to the base. Addition in one but multiplication in the other signals the teens / decades break.

Non-standard decade formations from 30 to 90 in French, trente, quarante, cinquante, soixante, septante, huitante / octante, nonante 'thirty, forty, fifty, sixty, seventy, eighty, ninety', are built on the strategy digit + a ten-valued suffix -(a)nte, parallel to the English forms with digit + '-ty'.

But despite French numerical reforms, standard French numerals for decade counting, like many Celtic systems, retain well-known breaks reminiscent of non-decimal systems. Major breaks in the standard system begin with 70 (soixante-dix, literally '60-10' to soixante-dix-neuf '60-ten-nine' or '60-nineteen') and 80 (quatre-vingt, literally 'four-twenty' to quatre-vingt-dix-neuf 'four-twenty-nineteen').

French soixante-dix and quatre-vingt have been accounted for as the result of Celtic influence. If Celtic, as a branch of IE, has inherited the PIE decimal system, however, both IE Celtic and French should share an inherited decimal system. To the extent that soixante '60' is 6 x 10, and 60 marks a base-like entity on which to build soixante-dix '70' as '60-ten', soixante formations recall a base value '60', but numerals quatre-vingt '80' (four-twenty), quatre-vingt-dix '90' (four-twenty-ten) build on 20.

French Decades

Breaks in the standard French decade system reflect factors [10 and 6] operating on base units 10 and 60 as far as 79 and factors [10, 2, and 5] operating on base units 10 and 20 from 80 to 99. These numeral bases and factors are not powers of any base, but pre-exponential factors reminiscent of traditional systems of measure rather than sequential counting. Decade numerals trente to soixante '30-60' are formed regularly from the digits 3-6 plus the decade suffix -(a)nte, and French 62-69 follows the strategy of addition: 'sixty+2 ...' established with 22.

The first break begins with soixante-dix '60-10' which uses 60 as base for adding 10-19 to build 70-79. But soixante itself is otherwise not the productive base that French cent (English 'hundred') is. There is no soixante-vingt, for example. The second break begins with the numeral quatre-vingt that, as '4-20', builds on vingt '20' as a base. In quatre-vingt-dix '4-20-10' the addition process of 60+10 recurs.
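The two breaks described above can be laid out as a small Python sketch. It only encodes the structure stated in the quoted passage (digit x 10 up to 69, 60 + 10..19 for 70-79, 4 x 20 + 0..19 for 80-99); the function name and the (factor, base, addend) tuple are my own assumptions, chosen for illustration.

```python
def standard_french_structure(n):
    """Return (factor, base, addend) for a standard French numeral 30..99.

    The value is always factor * base + addend.
    """
    if not 30 <= n <= 99:
        raise ValueError("this sketch only covers 30..99")
    if n < 70:
        # trente .. soixante-neuf: digit x 10, plus units
        tens, units = divmod(n, 10)
        return (tens, 10, units)
    if n < 80:
        # soixante-dix .. soixante-dix-neuf: 60 plus 10..19
        return (6, 10, n - 60)
    # quatre-vingt .. quatre-vingt-dix-neuf: 4 x 20 plus 0..19
    return (4, 20, n - 80)


for n in (62, 70, 79, 80, 90, 99):
    factor, base, addend = standard_french_structure(n)
    print(f"{n} = {factor} x {base} + {addend}")
```

The addend running up to 19 in the last two branches is what makes the breaks visible: the units carry on past 9 instead of handing over to a new decade word.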

Is French vingt part of the paradigm, trente, quarante, ..., or is / was it a separate, unanalyzable base? In the system that underlies quatre-vingt, it serves as a numeral base. By a factor of 5, numeral base vingt is converted to cent '100'. The numeral quatre-vingt (4 vingt's) recalls the conversion of a base 20. Phonological correspondences with Latin make it part of an older decimal paradigm, to the extent that Latin vii-gint-ii '20' is '2-10's'. Sound correspondences relate French vingt to Latin vii-gint-ii 'twenty' or IE *ui-kentii (Coleman 1992:397-398 with discussion of the relation of *kent- to IE 'ten, decade, hundred'), while subsequent decades in -(a)nte correspond to Latin *-(a)-gint-aa: quinqu-a-gint-aa, tri-ginta 'fifty, thirty' (Pope 1966 [1934]:127; 318). Although historically vingt is a phonological reduction from a potential ancestral 'two decades' (Latin vii-gint-ii 'two gint's), whether vi-ngt was only accentually separated from soix-ante or not), vingt and soixante have separate roles in the French system of numeration.

NUMERACY AND THE GERMANIC UPPER DECADES, by Carol F. Justus, Journal of Indo-European Studies 24, 1996, 45-80
http://www.utexas.edu/cola/centers/lrc/numerals/cfj-jies/cfj1-section1.html
I tried to contact Carol Justus, Director, Numerals Project at the University of Texas at Austin, to request her advice on my own study. I found that she had passed away on 1 August 2007. So I tried to contact Winfred Lehmann, Director of the Linguistics Research Center, University of Texas at Austin, but found, to my astonishment, that he had also died, on the very same day.

Do Eskimos Count Like Austronesians ?

If I came across the following set of numerals amongst my current chart of some 1400 Austronesian and Papuan numeral systems, I would see nothing much amiss. Their construction, and relation to body parts, are fairly typical.
1 - ata'uzik – clearly includes a cognate of Austronesian *isa, POc *sa-kai, etc.
2 - ma'dro – ditto of *dusa or *rua
3 - pi'ñasun
4 - si'saman
5 - tûdlemût - ditto of *lima (hand)
6 - atautyimiñ akbinigin tudlimût - "one hand and once on the next"
- bog standard Austronesian/Papuan construction
7 - madro'niñ akbi'nigin - "twice on the next"
8 - piñas'uniñ akbi'nigin - "three times on the next"
9 - kodlinotai'la - "that which has not its ten" - not usual, but not very rare
10 - kodlin - derived from kut or kule, "the upper part" - compare *puluq
14 - akimiaxotaityuña - "I have not fifteen."
15 - akimi'a - fifteen (a separate word)- unusual in An
20 - inyui'na - "a man completed" - bog standard An/Pap construction
25 - inyui'na tûdlimûniñ akbini'digin - "twenty and five times on the next"
30 - inyui'na kodliniñ akbini'digin - "twenty & ten times"
35 - inyuilna akim'iaminñ aipâliñ - "twenty & one fifteen times"
40 - madro inyui'na or madrolipi'a - "two twenties"
100 - tûdlimûipi'a - five 'pi'a'

These numbers, though, are spoken by Inuits in Point Barrow, in the extreme north of Alaska. Greenland Eskimos use much the same basic number words, but construct their teens and decades differently.
The original writer* points out: "The expressions in Greenlandic and other Eskimo dialects for these higher numbers are very different, which is pretty strong evidence that they have been developed since the separation of the Eskimo into their different branches."
That is exactly what I am finding in my study of Austronesian/Papuan numerals. At each stage in the development of counting systems, certain groups in the mainstream adopted new words for numbers that they had only expressed by gesture previously, or had expressed as separately countable (and visible) 'chunks' like 10s or 20s. They adopted 'consensus' words for 10, 6-9, the teens, decades, and 100s, roughly in that order.
Some groups still lack those 'consensus' words. The 'archaic' lower numbers, from 1-5, 6-9 and 11-20, are still preserved in many languages that haven't yet adopted the 'consensus' Austronesian number lexicon, and they're mappable.
The higher numbers, like the teens, decades, hundreds, and thousands, developed, worldwide, only quite recently, and the times of their diffusions should be dateable (if only relatively, not absolutely).
So the fact that (some) Polynesians have fully developed decimal systems, including standard "An" words for 6-9, while many Melanesians in Vanuatu and New Caledonia haven't, shows that Vanuatu and New Caledonia were first colonised a lot earlier than Polynesia, and in at least 3 separate waves, where newcomers either pushed their predecessors south, or absorbed them.
The Maori had a system based on 20s, not 10s, so that shows they left central Polynesia before the full decimal system diffused into that area. The fact that Easter Island had a full decimal system, while Maoris didn't, shows that Easter Island was settled later than New Zealand. (Or that Maoris kept strictly to their traditions, of course. People will be human, and upset theories like this one.)

Update: April 15 2008 - Since I wrote that, I've found that many Polynesian languages had vigesimal systems in use prior to contact with Europeans, so that many of the decimal systems apparent today are not very old at all. I certainly wouldn't repeat again that 'Easter Island was settled later than New Zealand' based solely on my faulty recording of their number-systems.

This dateable number-naming development is still going on. Americans and the English (until only the last decade or so) had different meanings for a 'billion': in America, 1000 million; in England and Germany, a million million. So the division is dateable (around 1600-1800, before America, isolated, developed its own meaning for the word 'billion'), and so is the adoption of the American 'billion' by the English (1990-1995).
It is only since Anglo-Saxon times that the English 'hundred' came to mean 10x10, not a dozen 10s (12x10). 'Beowulf' mentions 100 warriors coming to a place, then 80 of them leaving, and 40 staying.
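Taking the figures exactly as quoted above, the arithmetic is easy to check in Python. The constant and function names are hypothetical, chosen only for this sketch.

```python
LONG_HUNDRED = 12 * 10     # 'hundred' as a dozen tens
DECIMAL_HUNDRED = 10 * 10  # 'hundred' as ten tens


def consistent_with(total, leaving, staying):
    """True if the two groups add up to the stated total."""
    return leaving + staying == total


print(consistent_with(LONG_HUNDRED, 80, 40))     # 80 + 40 against 120
print(consistent_with(DECIMAL_HUNDRED, 80, 40))  # 80 + 40 against 100
```

Only the 12x10 reading makes 80 leaving and 40 staying add up to the 'hundred' that arrived.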
So the full decimal system we use now only came to England within the last 1000 years or so. It's very possible that 'primitive' Austronesians adopted their identical decimal system before we did.
If this analysis works, it should assist in relative dating of migrations and cross-group influences to a much greater resolution than genetic or linguistic splits and mergers. (Both genetic and linguistic dates are very much estimated on the assumption that things change on a fairly regular and smooth basis. They don't.)
-------------------------------------------------
*Notes on Counting and Measuring among the Eskimo of Point Barrow, John Murdoch - American Anthropologist, Vol. 3, No. 1. (Jan., 1890), pp. 37-44. http://links.jstor.org/sici?sici=0002-7294%28189001%291%3A3%3A1%3C37%3ANOCAMA%3E2.0.CO%3B2-5

Eskimos do count like Austronesians, but I'm certainly not claiming that they are recently related. The first few number names, and the actual ways of counting up to 1 hand and beyond, and then verbalising that, are pretty similar, worldwide.

Sunday, January 6, 2008

Erromanga - Preservation or Innovation?

I wandered off-topic recently to look at the Erromangan language. (Erromanga is an island about midway between the big islands of Vanuatu and the big island of New Caledonia).

Erromanga once had at least three languages (Sye, Ura, and Utaha) but suffered very heavy depredations in the 19th century by 'black-birders' - recruiters for plantation labour in New Caledonia, and Queensland, Australia. There was a virtual population crash, from an estimated 6000 pre-contact, to only 400 in the 1930s, and about 1300 in 1989. Ura had (in 1989) fewer than half-a-dozen speakers, all elderly, and Utaha disappeared altogether, about a century ago.

In doing so, I re-read:
The Efate-Erromango problem in Vanuatu subgrouping, John Lynch,
Oceanic Linguistics 43.2 (Dec 2004): p311(28)
Available via JSTOR.

Lynch is a classical comparativist (the expert on Southern Vanuatu) and has 28 pages of grammatics and phonology, to support his theories of grouping/sub-grouping, but precious little about the lexicons of Erromanga, except this, under the heading of 'innovations': -

"(e) POC *sa[??]apuluq, PNCV *sa[??]avulu 'ten' is replaced by PEE *rua-lima ('two-five'): e.g., Lewo lua-lima, South Efate ralim. (The same innovation, however, is found to the immediate north of this subgroup, in Paamese h??lualim.) (9)"
and -
"(b) Erromangan languages share innovation ..., the replacement of *sa[??]apuluq 'ten' by a form composed of 'two' and 'five': cf. Sye narwolem, Ura lurem ~ durem."
ie, the technically more advanced (multiple) word phrase has been 'replaced' by a less-developed construction. This could only make sense to a specialist ruled completely and solely by the limited specialist techniques and jargon of his discipline.

Lynch nearly rescues himself from this, but not quite, by saying, in a sub-note:
It occurred to me that the replacement of a monomorphemic word for "ten" with a transparent bimorphemic one may have been part of a more general simplification of numeral systems, since many SOC languages have quinary systems. However, it turns out that many widely distributed languages that do have compound numerals, based on "five" for 'six' through 'nine', nevertheless retain *sa[??]apuluq 'ten'.

Much more likely, though, is that many languages borrowed a new word, sa-puluq, meaning 1 x (bunch of) 10, before they got around to changing their old constructions for 6-9.

He kept his nose so close to the phonology and grammatics that he apparently ignores some quite amazing (to me, at least) lexical 'innovations':

Shoulder, which has a perfectly good POc 'ancestral' term, *(qa) para, is 'innovated' in Ura as 'nobun-lenge' = 'head-arm/hand'
Neck ... POc *Ruqa, *liqoR ... (Ura) bo-ri-na Lit. 'X'+ na=breast
Hair ... POc *raun ni qulu ... (Ura) novlingen-nobu- (Sie) novlinompu ... literally feathers/hair-head This is also the literal meaning of the POc construction, but the *POc word has two lines of 'descendants' - one used *raun, alone, and the other used *qulu, alone.
Mouth ... POc *papaq, *qawa ... (Ura) nobun nggivi- = lit. head-tooth
To sleep: ... POc *tiRuR ... (Ura) ahlei-ba = lit. to lie down-ba
Thatch/roof ... POc *qatop ... (Ura) nobun sungai = lit. head-house
To sew: ... POc *saquit ... (Ura) ehli (Sye) ... etri
To stab, pierce ... POc *soka ... (Ura) ehli ... (Sye) satri
Bite ... POc *karat ... (Ura) ahli ... (Sye) elintvi
(This is not wildly exciting, even to an amateur linguist, as sew and stab are very obviously related in POc).

It makes one wonder if these fellows suddenly forgot their 'inherited' vocabulary on an isolated island (hasn't happened elsewhere), or if someone took certain very basic words (body parts, mainly) and deliberately changed them, (ie a genuine invention) or if the people didn't have those 'ancestral proto-Oceanic' words in the first place.

The overall differences in number words and systems between Erromanga and the Vanuatu languages further north were also completely missed by Lynch in his paper, although he did propose that '2 hands' was an 'innovation' for '1 x name for 10' (leading on, perhaps, to 2x10=20, which it does in this case: Ura - lurem gelu=20, Sye - narwolem duru=20). That suggests that Ura and Sye both adopted the idea of decimal 10s before they adopted the words.

Erromangan languages (I have numeral data for Ura, Sie, and extinct Utaha) don't even achieve the 'consensus' PAn names for 1-4:

1 - *PAn - *esa, *ias .... Ura - sai - OK
2 - *PAn - *dusa ..... Ura - ge-lu - OK

3 - *PAn - *telu ....... Ura - ge-he-li - is very strange, because it (should have) descended directly from its established 'ancestor word', *telu. Instead it appears to be a 'linguistic innovation' based on a 3rd person possessive, 'ga' and directly on a Trial *-(t,s)ali proposed for PSV - proto-Southern Vanuatu.
A similar construction is found in older relict languages in Tanna and New Caledonia ... kesel, kahar, esech, seen, hejen, etc.

4 - *PAn - *Sepat .... Ura - le-me-lu (2-2) (Sie nd-vat).
Lemelu (2-2) must be dubbed a 'linguistic innovation', using classical Comparative Methodology. But it's clearly not inventive and exactly the same construction I've found in number systems that haven't gone much beyond naming numbers up to 5 in other parts of the world. (Example: aula aula=2-2=4, in Binahara, a Trans-New-Guinea language - it even has a very similar root-word).
Naming numbers from 6-9 is very obviously a later invention or borrowing added to the first 3 to 5 number words, so the sudden appearance of 'vat' in Ura - sini-vat - 9 is no surprise. It almost proves that the phrases for 6-9 were real inventions in Austronesian languages, at a later date than the first 5 number names were established.

5 - *PAn - *lima .... Ura - su-o-rem (1 hand). This is also a common construction where people mark the 'first hand' and then go on to mark 'hand/hands two' or 'hand-hand' for 10. (Binahara - gena-aulapu = 1-5, gena-aulapu-aulapu = 1-5-5 = 10).

6 - *PAn - *enem .... Ura - mi-sai (+1)
7 - *PAn - *pitu .... Ura - sim-he-lu
8 - *PAn - *walu .... Ura - sim-he-li
9 - *PAn - *Siwa .... Ura - sini-vat
10 - *PAn - *sa-puluq .... Ura - lu-rem (2-hand)
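The hand-based pattern in the Ura list can be sketched in Python. The Ura forms are copied from the list above; the 'hand + n' glosses for 7-9 are my own assumption, extended from the (+1) given for 6 and the (1 hand) and (2-hand) given for 5 and 10. The function name is hypothetical.

```python
# Ura forms as listed above.
URA = {
    1: "sai", 2: "ge-lu", 3: "ge-he-li", 4: "le-me-lu", 5: "su-o-rem",
    6: "mi-sai", 7: "sim-he-lu", 8: "sim-he-li", 9: "sini-vat", 10: "lu-rem",
}


def hand_gloss(n):
    """Gloss a count as whole hands (fives) plus leftover units."""
    hands, rest = divmod(n, 5)
    if hands == 0:
        return str(rest)
    label = "1 hand" if hands == 1 else f"{hands} hands"
    return label if rest == 0 else f"{label} + {rest}"


for n in sorted(URA):
    print(n, URA[n], "=", hand_gloss(n))
```

Note that this simple model glosses 4 as a plain '4', whereas the Ura form le-me-lu is built as 2-2, so the sketch only captures the pattern from 5 upwards.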

Lynch's comparative methods of sub-grouping languages enabled him to propose a settlement hypothesis for Vanuatu, as shown (right).

SOc (Southern Oceanic) groups all the languages of Vanuatu.
- that split into Northern Vanuatu and:
NSO (Nuclear Southern Oceanic - all languages south of Northern Vanuatu)
- that (NSO) split into Central Vanuatu and:
- SMel (Southern Melanesian) - all languages south of Epi and Efate islands.
- SMel split into
Southern Vanuatu (*PSV) and New Caledonian

That translates into a family tree that implies that people (who now speak North Vanuatu languages) first settled North Vanuatu, and stayed there, while another lot going south, split in the middle of Vanuatu, with one lot staying put, and another lot going south, and so on.

It implies that the speakers of languages further south would be the ones that settled their territories most recently.

But, to a non-comparativist (like me) it's 'obvious', from the merest of glances at number systems, that the languages in the south are the oldest, and preserve their older constructions. This thinking reverses the implications of the genetic language tree produced by comparativists.

It would mean that the first major split from Southern Oceanic (SOc) would give one branch leading to the surviving New Caledonian group, with the rest continuing to evolve.
- The next split would be between 'the rest' and surviving Southern Melanesian.
- The very latest split would be between 'the rest' and surviving North Vanuatu.

Innovations, Shminnovations (Glossary)

Comparative Method linguists seem to use trade jargon words that are often diametrically opposed to how the rest of us would use those particular terms.

Consider this:

"POC *sa[??]apuluq, PNCV *sa[??]avulu 'ten' is replaced by PEE *rua-lima ('two-five'): e.g., Lewo lua-lima, South Efate ralim. (The same innovation, however, is found to the immediate north of this subgroup, in Paamese h??lualim.)"
The Efate-Erromango problem in Vanuatu subgrouping, John Lynch,
Oceanic Linguistics 43.2 (Dec 2004): p311(28) Available via JSTOR.

Anyone who has ever studied numbering systems, per se, would never describe 2x5 as a replacement for 1x10 (or 1 x group of ten). From 2x5 to 1x10 is, quite definitely, a conceptual step forward. So 2x5 shows the preservation of an older term, not an innovation.

The major problem is that comparative linguists go down the 'Snakes' to reconstruct a wholly imaginary proto-language, then climb up the 'Ladders', look back to their construction, and base their judgement of what exists in current languages on what they themselves invented.

This leads to a few more arse-about-tit linguistic jargon words:

Retention - means a word (or bit of grammar) that apparently descends directly from the imaginary proto-language
Innovation - means a word (or bit of grammar) that apparently doesn't descend from the imaginary proto-language
Reflex, reflected - means a word (or bit of grammar) that apparently corresponds to something in the imaginary proto-language
Conservative - means a language that apparently still preserves words (or bits of grammar) from the imaginary proto-language

In each case, the historical comparative linguist is referring back to (his own) imaginary proto-language, and not, in any way, to what might, just, have preceded that proto-language before it burst, fully-formed, into the world.

Henceforth in these posts, I will try to remember (as when I quote linguists directly) to highlight these linguistic jargon words, so you realise that they often mean exactly the opposite of what you (intuitively) might think they mean.

And I will try to remember to use completely different words myself:

Preservation - means a word (or bit of grammar) that still holds over from an earlier language.
Invention - means a word (or bit of grammar) that doesn't descend directly from an earlier language - it's genuinely new.
Descends from - means a word (or bit of grammar) that does descend directly from an earlier language.
Preservative - means a language that apparently still preserves words (or bits of grammar) from an earlier language

Monday, December 10, 2007

Dead Hand of the Comparative Theory - 2 - Out-of-Taiwan?

Arthur Capell, writing nearly half a century ago, but only 20 years after the creation of proto-Austronesian, by Otto Dempwolff, gives some idea of just exactly how 'proto-Austronesian' came about:

Dempwolff (1938) established a vocabulary of some 2,000 words which he regarded as "Original Austronesian" (Uraustronesisch).
- The basis of this restoration is found in [just] two Western IN (Indonesian) languages, Toba-Batak (Sumatra) and Javanese, and one Northern IN language, Tagalog (Philippines).
These, with occasional references to Malagasy, Ngadju (Borneo), and a few other languages, served to establish proto-IN (Indonesian).
- He later added to his list Fijian and Sa'a (Southeast Solomons) and based a "proto-MN (Melanesian)" on the agreements of these with his IN (Indonesian).
- In the third stage, he examined three PN (Polynesian) languages (Tongan, eastern Futuna, and Samoa) and similarly established a proto-PN (Polynesian). In each of the latter two cases he sought to establish phonological innovations on the *AN sound-system, to determine what vocabulary appeared in each of the MN (Melanesian) and PN (Polynesian) areas, again with scant attention to MC (Micronesian).
- Moreover, all his PN (Polynesian) languages belong to the western subgroup of the family, without reference to Tahitian or Maori of the eastern subfamily. He did not seek to establish any original AN (Austronesian) morphology; and very little has yet been done in that sphere.
Arthur Capell - Oceanic Linguistics Today - Current Anthropology, Vol. 3, No. 4. (Oct., 1962), pp. 371-428.
http://links.jstor.org/sici?sici=0011-3204%28196210%293%3A4%3C371%3AOLT%3E2.0.CO%3B2-2

Andrew Pawley and Malcolm Ross carry on the story:
During the 1960s and 1970s a more complex theory of AN high order subgroups emerged from work on historical phonology and morphology. The poorly documented Formosan languages, completely left out of Dempwolff's comparisons, became key witnesses in the reconstruction of PAN. Several changes to Dempwolff's proposed PAN sound system have been made in the light of Formosan testimony.

Dyen noted the possibility of a primary split in AN between
(a) some or all Formosan languages and
(b) a group containing all other AN languages on the grounds of phonological mergers common to all the extra-Formosan languages.
Dahl argued forcefully for such a primary split.
Blust named the extra-Formosan branch "Malayo-Polynesian" (MP) and gave a morphological argument supporting it. Although several scholars have expressed strong reservations (Wolff, Dyen), the hypothesis has gained increasing acceptance.

...According to Blust, Harvey, and Reid the Formosan languages may comprise more than one first-order branch of AN, perhaps dividing into Atayalic (northern), Tsouic (central), and Paiwanic (southern) groups.
... Although most Philippine languages seem at least superficially similar to each other, Reid suggests that they have no significant innovations in common. Zorc challenges Reid, arguing that Philippine languages share numerous lexical replacements and that these constitute innovations defining a Philippine subgroup. The problem in the Philippines, as in many other compact regions, is to distinguish innovations from borrowings among related languages that have been in contact for millennia. A recent study of Tiruray (Mindanao) vocabulary shows that this Philippine language has replaced nearly 30% of its basic vocabulary with loans from its neighbors.
Blust has proposed a more detailed family tree. In this tree the
Western MP comprises chiefly the languages of the Philippines, Malaysia, western Indonesia (including Sulawesi) as far east as mid-Sumbawa, and Madagascar and
Central MP comprises approximately the languages of eastern Indonesia east of Sumbawa and Sulawesi excluding Halmahera.
Oceanic remains, but it has been demoted to something like a fourth-order subgroup.[;-(]

...It is probably fair to say that of Blust's proposed subgroups, MP and three of its daughters (Eastern MP, South Halmahera/West New Guinea, and Oceanic) are rather widely accepted because each is based on a significant body of diagnostic innovations. Western MP, Central MP, and Central-Eastern MP, on the other hand, are much more problematic. The difficulties in finding innovations encompassing the entire putative Central MP group very likely reflect the existence of an earlier extensive and longstanding dialect network in the eastern Indonesian region. What one finds is overlapping innovations, each covering part of the region. As a whole, Western MP languages seem to inherit only the innovations shared by all MP languages, i.e. those attributable to Proto MP (PMP).

This suggests that there was no Proto Western MP, but rather that PMP diverged into a number of dialects, one of whose descendants became Proto Central Eastern MP. The Western MP languages are simply those MP languages that do not belong to the Central Eastern group. In the same vein, Central MP languages may be just those Central Eastern languages that are not members of Eastern MP.

However, Pawley and Ross do not even mention that Isidore Dyen, a giant in Austronesian language studies, still believes that the Austronesian languages originated around Melanesia, and merely relegate him to a bit-part in the formation of the Out-of-Taiwan paradigm.
Some Evidence Favoring the Central Hypothesis. Isidore Dyen. Yale University (Emeritus).
www.sil.org/asia/philippines/ical/papers/dyen-evidence_central_hypothesis.pdf

The Out-of-Taiwan paradigm rests on remarkably little:

The Malayo-Polynesian (MP) hypothesis (that all extra-Formosan languages belong to a single first-order An subgroup, while the Formosan languages constitute one or more first-order subgroups) rests on the following phonological (and some nonphonological) innovations:
(a) PAn *C and *t merged as PMP *t.
(b) PAn noninitial *L and *n merged as PMP *n.
(c) PAn *S became a glottal spirant in PMP, possibly merging with *h.
The Sound of Proto-Austronesian: An Outsider's View of the Formosan Evidence
Malcolm D. Ross Oceanic Linguistics, Vol. 31, No. 1. (Summer, 1992), pp. 23-64.
http://links.jstor.org/sici?sici=0029-8115%28199222%2931%3A1%3C23%3ATSOPAO%3E2.0.CO%3B2-V

But then Robert Blust, by now the leader of the pack, produced his bombshell:
Blust, R. (1999). "Subgrouping, circularity and extinction: some issues in Austronesian comparative linguistics" in E. Zeitoun & P.J.K Li (Ed.) 'Selected papers from the Eighth International Conference on Austronesian Linguistics' (pp. 31-94). Taipei: Academia Sinica.
showing not only that Formosan languages differed radically from all others in the Austronesian family, but also that they formed no fewer than nine separate first-order families, each ranking equal with Proto-Malayo-Polynesian.

This was soon followed by a blitz of publicity:
Taiwan’s gift to the world - Jared M. Diamond

Quote: "A reanalysis of Austronesian languages by Robert Blust strengthens the identification of the first Austronesian waystation, illuminates archaeological findings and the history of boatbuilding, and may help reinterpret the histories of other language families".
In fact, of the 64 pages of Blust's 1999 paper (pp. 31-94):
- Only 14 deal with the language classification (within Formosa alone, not of Austronesian languages in general, and concerning phonology only), with some notes about coastal/inland Formosan vocabularies.
- 13 pages are devoted to dismissing competing theories.
- 4+ deal with the putative Austronesian Mainland Homeland (and the embarrassing question of why there's no trace of them). There's also the embarrassing question of why, if the speakers of proto-Malayo-Polynesian left Taiwan, they also left no traces behind them.
- 9 deal with boats (although he has some difficulty in proposing a viable method of boat transport from Taiwan out to the Philippines that would be sufficient to set in train a major wave of emigration).

Entrenchment of the Myth:
When a multidisciplinary conference was held in Geneva in June 2004, Peter Bellwood felt able to say, with full confidence:
"As a linguistic category, the Austronesian languages have a history of dispersal from Taiwan through the Philippines into Island Southeast Asia and on to Oceania and Madagascar.

Malcolm Ross explained why the majority of linguists accept Taiwan as the Proto-Austronesian homeland and in what directions the ancestral languages spread and emphasized their transmission through inheritance rather than language shift, implying that the Austronesian language dispersal was associated with an actual movement of Austronesian-speaking people...."
Bellwood, Peter & Alicia Sanchez-Mazas (June 2005). "Human Migrations in Continental East Asia and Taiwan: Genetic, Linguistic, and Archaeological Evidence". Current Anthropology 46:3: 480-485
So, it's definite, isn't it?

Well, this little map may be a clue to the 'political' (in the very widest sense) motivation of the Blust-Diamond-Bellwood-everyone else bandwagon, who (mostly) believe the Chinese kicked the Austronesians into Taiwan, and then nudged them gently out to Easter Island and Madagascar:
There may be a hint of the strong current political and nationalist motivation behind the Out-of-Taiwan hypothesis in the accompanying article

Indeed, one wonders whether Taiwan would figure in the Austronesian story at all from the 1950s onwards if Chiang Kai-Shek hadn't retreated there in 1949, and opened up the place to his American allies, and if a 'native' Taiwanese (Li Denghui) hadn't become president of Taiwan in 1988, with an implicit agenda of promoting Taiwanese separation from China.

Dead Hand of the Comparative Method - 1

I may sound like a linguist, and even flatter myself occasionally that I'm learning and aspiring to be one, but I try to keep my head clear, as much as possible, from the complex phonological and grammatical technicalities that seem to be the meat of so many current An linguistics papers:
---------------------------------------------------------------------
- The origin of the Kelabit voiced aspirates: a historical hypothesis revisited.
- On the origin of Philippine vowel grades
- The pronoun system in Galeya: arguments against a clitic analysis.
Oceanic Linguistics Dec 2006
---------------------------------------------------------------------
Of course, these phonological and grammatical details are essential in the recording and study of existing languages, as a whole. They are extremely useful in differentiating between, and grouping, related languages.

But they are simply irrelevant in studying the typology and semantics of number words and number-sets, en masse, as I am attempting to do. (Later on, if I ever get there, I may use phonology to resolve the very ends of the twigs, or, if it's worthwhile, to analyse the mass of Western Malayo-Polynesian number-sets which look, very boringly, almost identical. It is essential to know the inherited sounds of a particular language to be able to help distinguish between inherited words ('reflexes') and borrowed words).

I do feel that the analytical power of phonology is greatly over-estimated in the special field of historical linguistics, and that over-reliance on it can lead to very misleading results.

A statement like this, from the doyen of An linguists, who kindly sent me the numerals chapter of his forthcoming book:
"Although this may seem like a drastic departure from the decimal system that these languages inherited from a remote common ancestor, even more drastic innovations in numeral systems are found in some AN languages of New Guinea, where they clearly reflect Papuan contact influence."

makes me go quietly ballistic.

It completely reverses the 'normal' hypothesis that a less-developed number system will evolve into a more-developed one. It states that, somehow, all the less-developed and 'irregular' number systems in Austronesian languages (more than half of them) are 'innovations' from a 'pure' An ancestor.

[And, so far, I have not found a single instance where an An number system in New Guinea can be shown to 'clearly reflect Papuan contact influence']. If anything, it seems to have been quite the opposite.

It's easy to see how this happened: the WMP (Western Malayo-Polynesian, once called Indonesian) languages (from which PAn was originally derived) are fairly well homogenised, and all have a full decimal system (with only two minor exceptions). The majority of them use recognisably similar words for those numerals.

- Therefore, so does PAn (proto-Austronesian), the reconstructed ancestor of all Austronesian languages. You sift through those words, just like you'd pan some gravel, end up with some glittering cognates, and reconstruct the familiar 'proto-Austronesian':
*esa/isa, *duSa, *telu, *Sepat, *lima, *enem, *pitu, *walu, *Siwa, *sa-puluq

- Therefore, POc (proto-Oceanic) must also inherit this system, because it also has to demonstrate descent from PAn.

- Therefore, if you reconstruct a proto-lexicon by sieving through a gradually-reducing mesh of current cognate words you will end up with a 'proto numeral' lexicon that mirrors the majority of current vocabularies, even if that particular lexicon set has been relatively recently introduced, and become very rapidly widespread. If that numeral lexicon also implies a fully-developed number system, you've allowed yourself to be led right down the plughole.

From there on, you must consider anything else as a deviant numerical practice, or innovation (probably brought about by miscegenation with fuzzy-wuzzies).

Therefore, most Melanesians and Polynesians are raving numerical deviants.

The limitations of the Comparative Method are revealed (between the lines) by its greatest current exponent, Robert Blust:

Historical linguistics depends for its results on two fundamental and by now well-tested claims about the nature of language: (1) The relationship between sound and meaning is largely arbitrary, and (2) sound change is largely regular. The first of these claims was first clearly enunciated by Saussure (1959), and the second by various of the Neogrammarians during the last three decades of the nineteenth century. Both have been challenged in various ways, but both remain as pillars of linguistic method.

Like everything in Nature, language changes. In time words come to differ in shape and perhaps also in meaning. Since sound change is regular, the differences in the sound shape of words are systematic, and permit the original forms to be reconstituted with a rather high degree of confidence. The procedures followed in such reconstitution of prehistoric forms are collectively known as the Comparative Method. Where we have documentary checks, as in comparing the modern Romance languages with their immediate common ancestor, Latin, we are encouraged that even in the absence of documentary support our results will not ordinarily go far wrong.
The application of the Comparative Method to related (cognate) words by a process of triangulation results in a reconstruction of the sound system and vocabulary of an earlier language, called a proto-language.

To illustrate with three simple examples, Malay langit, Samoan langi, Hawaiian lani "sky", Malay tangis, Samoan tangi, Hawaiian kani "weep"; and Malay mata, Samoan mata, Hawaiian maka "eye" show recurrent correspondences of sound in words of related meaning, and so are assumed to derive from (reflect) a common ancestral form in each case, conventionally preceded by an asterisk to show that it is based on inference, not on observation.
For our purposes here (leaving out information that can be supplied only by the aboriginal languages of Taiwan), these forms can be reconstructed as *langit "sky", *tangis "weep" and *mata "eye".
Robert Blust
The Prehistory of the Austronesian-Speaking Peoples: A View from Language
Journal of World Prehistory, Vol. 9, No. 4, 1995
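
To make the 'recurrent correspondences' in Blust's three examples concrete, here is a minimal Python sketch that lines the words up segment by segment and counts how often each Malay/Samoan/Hawaiian triple recurs. It is a toy under stated assumptions, not the Comparative Method itself: the helper names segments and correspondences are hypothetical, "ng" is treated as a single segment, and the alignment is a naive left-to-right one that pads the shorter words with "-" (which happens to work here because only final consonants are missing).

```python
from collections import Counter

# The three cognate sets quoted above, in the order (Malay, Samoan, Hawaiian).
COGNATE_SETS = {
    "sky": ("langit", "langi", "lani"),
    "weep": ("tangis", "tangi", "kani"),
    "eye": ("mata", "mata", "maka"),
}


def segments(word):
    """Split a word into segments, treating 'ng' as one sound."""
    segs, i = [], 0
    while i < len(word):
        if word[i:i + 2] == "ng":
            segs.append("ng")
            i += 2
        else:
            segs.append(word[i])
            i += 1
    return segs


def correspondences(sets):
    """Count each aligned (Malay, Samoan, Hawaiian) segment triple."""
    counts = Counter()
    for words in sets.values():
        rows = [segments(w) for w in words]
        width = max(len(r) for r in rows)
        # Pad shorter words on the right; '-' marks a missing segment.
        rows = [r + ["-"] * (width - len(r)) for r in rows]
        counts.update(zip(*rows))
    return counts


counts = correspondences(COGNATE_SETS)
for triple, n in counts.most_common():
    print(triple, n)
```

In this tiny sample the triple ('t', 't', 'k') turns up twice (in 'weep' and 'eye') and ('ng', 'ng', 'n') twice (in 'sky' and 'weep'), which is the kind of recurrence the quoted passage relies on. Three words prove nothing, of course; the sketch only shows what is being counted.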

Having reconstructed a proto-language, you can then propose it as the root of a family tree, and trace back the branches to separate the modern, existing languages into their bunches, or groups, each bunch of twigs descended from one node on a major branch.

You end up with a family tree that looks like this:

You'll soon notice a few peculiarities:
1) The family tree is upside down. This is only one of linguistics' weird idiosyncrasies, where 'reflect' means derive from, 'innovation' can mean 'reversion', etc.

2) The tree is heavily weighted towards the right, i.e. towards Oceanic, with a proto-language featured at each node. On the other major branches:

Formosan languages
Western Malayo-Polynesian

There are no proto-languages at the major nodes at all.
Each of those major groups has so far proved impossible to reduce to an ancestral proto-language.

Which is a great pity, since it implies that the majority of all current Austronesian-speakers speak an orphan language, or at least one whose immediate parental identity and location are in doubt.

But you've got a neat map, showing the distribution of language groups.