Tuesday, May 27, 2008
“Lexomics” - Breaking the language barrier
The trouble with “lexomics”, as some of the commenters on the Nature article pointed out, is that the language evolution process is Lamarckian, not Darwinian; it’s driven, not followed.
If I was a lithping king, I could make all my thubjects lithp without too much trouble.
The other major problem is that there aren’t any fossils*. All the ancestors are hypothetical proto-languages.
If you take all the most common characteristics of an existing clade (or as many as you can find) and distill them down to the lowest common denominators, you’ll end up with a ‘proto-language’.
But you can’t be at all sure that major characteristics of the original ancestral language have not been entirely lost, or preserved in only a minority of the existing remnant languages. (Which you ignored, just because they were a minority).
Then, to trace the ‘descendants’ from this hypothetical language is absurd.
Even then, though, I hope some of the newer generation of linguists (you, Simon? - please) can use the mechanical/statistical techniques used by geneticists to resolve some major ‘language family tree’ problems, like the star-like pattern of supposed descendants from proto-Austronesian and proto-Oceanic.
*Except where we have surviving scripts. But it was pointed out a long time ago that if the Comparative Method was used, retrospectively, on the Romance languages, the resulting proto-language would NOT be Latin.
Tuesday, April 15, 2008
6 esiwa, 7 ewon, 8 epit, 9 ewata.
The same words were recorded a century ago, as
6 siwa, 7 on, 8 pit, 9 ata,
but only in counting 10s, i.e. in 60, 70, 80, 90, while the 'lower number' words in Misima for 6-9 were the usual hand-1, hand-2, etc., common in most An groups in that area.
I glanced at them (they're very familiar Austronesian number words), and thought, well, that blows my theory that number words were conceived a long time later than the reconstructed proto-Austronesian
*enem, *pitu, *walu, *siwa (in that order).
Perhaps the proto-Austronesians did, after all, have a decimal system, and the more primitive systems in New Guinea, etc, really are `retrograde' systems brought on by Papuan influence, which is the conventional linguistic view.
But the Misima are the only Austronesian group in that area that has these words, so I looked again a bit harder.
And suddenly realised they were using the right words OK, but in the wrong places.
The Misima use 9 for 6, 6 for 7, 7 for 8, and 8 for 9.
Nobody else, anywhere, does that.
It almost proves that these number words were borrowed, not inherited.
And it also almost proves that someone taught them how to make number words for the higher decades, but not how to use them correctly, and they still don't use them properly.
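The displacement is easier to see when written out mechanically. The sketch below is only an illustration: the pairing of each Misima word with a PAn form (on ~ *enem, pit ~ *pitu, ata ~ *walu) is an assumption read off the statement "9 for 6, 6 for 7, 7 for 8, and 8 for 9", and the names are hypothetical.

```python
# Reconstructed PAn forms and the values they are reconstructed with.
PAN = {"enem": 6, "pitu": 7, "walu": 8, "siwa": 9}

# Misima forms recorded a century ago, keyed by the value Misima gives them.
MISIMA = {6: "siwa", 7: "on", 8: "pit", 9: "ata"}

# ASSUMPTION: which PAn form each Misima word resembles.
COGNATE_OF = {"siwa": "siwa", "on": "enem", "pit": "pitu", "ata": "walu"}


def displaced_values(misima, cognate_of, pan):
    """Map each Misima value to the PAn value of the word it uses."""
    return {value: pan[cognate_of[word]] for value, word in misima.items()}


shift = displaced_values(MISIMA, COGNATE_OF, PAN)
for misima_value, pan_value in sorted(shift.items()):
    print(f"Misima {misima_value} uses the PAn word for {pan_value}")
```

On this reading no word sits in its inherited slot: the four words are rotated by one place.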
So far, I've listed some 1600 number systems in both Austronesian and Papuan languages, and analysed them as best I can, with a code system that reduces a mass of information to a manageable size.
Maisin 6 = faketi tarosi taure sese which means 'hand over 1' is coded 5\1 because faketi tarosi is 5, the \ stands for a regular 'connector' and sese means 1.
Arifama-Miniafia, another An language close by, has 6 = umat roun ta'imon where 5 = umat roun, so that's coded 51 because there is no 'connector'.
Another dialect of Arifama has 6 = uma ti reban taimo nomon, 5 = uma ti morob, and 5 isn't repeated exactly in 6, while uma means hand, so this is coded H\1 (or should have been, but I made a typing error here).
The coding has made it much easier to visualise connections between number-types in various language sub-groups (and whether they match up or not) and their distribution over larger areas.
This information (when I've worked out how to use Photoshop) will be transferred to geographical maps, making the picture a whole lot clearer.
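The three codes described above (5\1, 51, H\1) can be produced by a very small routine. This is only a sketch of the idea, not the actual coding procedure: the function name is hypothetical, it handles only phrases for 6, and the words taken to mean 'one' in the two Arifama examples (ta'imon, taimo nomon) are assumptions.

```python
def code_six(six, five, one, hand=None):
    """Code a phrase for 6 from the phrases for 5 and 1 (and a 'hand' word).

    The base is '5' when the phrase for 5 is repeated exactly at the start,
    'H' when only the hand word is; a backslash marks any connector words
    standing between the base and the final word for 1.
    """
    six_words, five_words, one_words = six.split(), five.split(), one.split()
    if six_words[:len(five_words)] == five_words:
        base, rest = "5", six_words[len(five_words):]
    elif hand is not None and six_words[:1] == [hand]:
        base, rest = "H", six_words[1:]
    else:
        return "?"  # not a hand/five-based construction
    if rest[-len(one_words):] != one_words:
        return "?"  # does not end in the word for 1
    connector = rest[:-len(one_words)]
    return base + ("\\" if connector else "") + "1"


print(code_six("faketi tarosi taure sese", "faketi tarosi", "sese"))
print(code_six("umat roun ta'imon", "umat roun", "ta'imon"))
print(code_six("uma ti reban taimo nomon", "uma ti morob", "taimo nomon", hand="uma"))
```

A real coding pass would also need to cope with 7-9, with variant spellings, and with phrases where the segmentation is not obvious.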
Friday, January 25, 2008
The proto-Indo-European 1-10 numerals are:
*oynos/*sem *duwo: *treyes *kwetwores *penkwe *sweks *septm *okto: *newn *dekm
Indo-European ‘6’ ‘may best be explained as a loan from Semitic’, as may ‘7’.
(This is not at all unlikely; the Akkadian 6 and 7 were shishshu and sebe - RP)
- ‘1’ through ‘3’ were deictic in origin
- ‘4’ relates to the four fingers or the width of the palm,
- *okto ‘8’ resolves to a dual marker (-o) and ‘4’
‘best related to Av. ašti ‘width of four fingers, palm’;
- ‘5’ is generally related to ‘fist’ and ‘finger’, but is also related to ‘all’;
- ‘10’ the I-E root underlies *deks- ‘right [hand]’; and
- ‘9’ is generally related to ‘new’.
M concludes that achieving units for ‘1’ through ‘10’ remains far from demonstrating an original decimal system, as the grouping of ‘1’ through ‘3’ as deictic in origin, ‘4’, ‘5’, ‘8’, and ‘10’ as involving fingers or hands, and ‘9’ as ‘new’, suggests. Thus, we can see bases for at least two, and possibly four distinct counting systems prior to the development of the decimal system.
From: Notes on: Numeral Types and Changes Worldwide.
Martinez' full doctoral thesis on Indo-European numbers is available online, but it is entirely in Spanish and 24MB in size; I shall endeavour to read it some time. It deals with Indo-European numbers from 1 to 100.
This find certainly reinforces my conviction that numerals do not come into existence by immaculate conception, but evolve from very small, simple beginnings set in place many thousands of years ago, perhaps when humans first began to speak and estimate quantities.
1 - 3 are deictic, which means they rely on context. Early on, speakers in many languages made a distinction in pronouns: I (singular), we two (dual), we three (trial) and we (more than 3 - plural), and this also extended to the very low numbers, that used the same roots. Number markers related to these were added to many different kinds of words, not just pronouns and the lower numerals.
The dual still exists in the English distinctions both vs. all, either vs. any, twice vs. x
Those very numbers (in fact 1-4) are also the most easily subitisable; that is, you can estimate the number very quickly by sight, without counting. Most people can estimate number by sight up to 7 or 8, but this takes a bit longer.
You can also, of course, easily subitise 1 hand, 2 hands, 1 foot, 2 feet once you start 'bunching' numbers into groups (mostly based on counting 5 digits, and then making that 1 unit, usually related to 'hand'). A digit, of course, was literally, a finger or toe.
But some number systems rely on just the four fingers, so you get one bunch of 4 fingers, then the next stage is 2 bunches of 4 fingers = 8.
This seems to have happened in proto-Indo-European, or in a counting system that preceded that. (See above: *okto ‘8’ resolves to a dual marker (-o) and ‘4’,
‘best related to Av. ašti ‘width of four fingers, palm’).
9 would then be the start of a new cycle, or if 10 had become a new base, it might be a completely new word (‘9’ is generally related to ‘new’).
This kind of '4,8 cycle' number system occurs in isolated areas in a few Austronesian languages around New Guinea, and in Papuan number systems as well.
A more 'advanced' system, with a 5,10 cycle, but with 'relicts' of a base 4 system, is more common in Austronesian. In these cases, the '9' is usually constructed something like X1.
This puzzled me for a long time, but the problem begins to clarify itself with the knowledge that the proto-Indo-European numeral system was probably more of a messy accumulation of different counting systems than the miraculously fully-blown decimal system it appears to be.
Of all Indo-European 1-10 numeral systems, only Vedda has a system that counts 6-9 as 5+1, 5+2, etc. But there are more than 250 of those constructions in Austronesian languages, and in many quite unrelated languages, as well.
For that reason, I believe that the "proto" Austronesian numeral words *enem=6, *pitu=7, *walu=8, and *Siwa=9 appeared latest in the majority of Austronesian (An) 1-10 systems.
*sa puluq (= 1 x puluq) has nothing to do with hands, but probably appeared before 6-9, because many systems with 5+1, 5+2 constructions use *sa puluq. Furthermore, this particular word seems to appear quite late in Austronesian languages, suggesting it was borrowed by languages that still preserved older systems in whole or in part.
Monday, January 7, 2008
Seiler has characterized breaks in numeral formations as a "turning point between serializations" that mark the "semiotic status of the base", while Hurford called attention to the point where a language changes methods for signaling addition as indicative of a base break. So the syntax of English 'thir-teen ... nine-teen' (digit + base), in stating the smaller number first, differs from that of 21-29 (base + digit) with the smaller number suffixed to the base. Addition in one but multiplication in the other signals the teens / decades break.
Non-standard decade formations from 30 to 90 in French, trente, quarante, cinquante, soixante, septante, uitante /octante, nonante 'thirty, forty, fifty, sixty, seventy, eighty, ninety', are built on the strategy digit + a ten-valued suffix -(a)nte, parallel to the English forms with digit + '-ty'.
But despite French numerical reforms, standard French numerals for decade counting, like many Celtic systems, retain well-known breaks reminiscent of non-decimal systems. Major breaks in the standard system begin with 70 (soixante-dix, literally '60-10' to soixante-dix-neuf '60-ten-nine' or '60-nineteen') and 80 (quatre-vingt, literally 'four-twenty' to quatre-vingt-dix-neuf 'four-twenty-nineteen').
French soixante-dix and quatre-vingt have been accounted for as the result of Celtic influence. If Celtic, as a branch of IE, has inherited the PIE decimal system, however, both IE Celtic and French should share an inherited decimal system. To the extent that soixante '60' is 6 x 10, and 60 marks a base-like entity on which to build soixante-dix '70' as '60-ten', soixante formations recall a base value '60', but numerals quatre-vingt '80' (four-twenty), quatre-vingt-dix '90' (four-twenty-ten) build on 20.
Breaks in the standard French decade system reflect factors [10 and 6] operating on base units 10 and 60 as far as 79 and factors [10, 2, and 5] operating on base units 10 and 20 from 80 to 99. These numeral bases and factors are not powers of any base, but pre-exponential factors reminiscent of traditional systems of measure rather than sequential counting. Decade numerals trente to soixante '30-60' are formed regularly from the digits 3-6 plus the decade suffix -(a)nte, and French 62-69 follows the strategy of addition: 'sixty+2 ...' established with 22.
The first break begins with soixante-dix '60-10' which uses 60 as base for adding 10-19 to build 70-79. But soixante itself is otherwise not the productive base that French cent (English 'hundred') is. There is no soixante-vingt, for example. The second break begins with the numeral quatre-vingt that, as '4-20', builds on vingt '20' as a base. In quatre-vingt-dix '4-20-10' the addition process of 60+10 recurs.
Is French vingt part of the paradigm, trente, quarante, ..., or is / was it a separate, unanalyzable base? In the system that underlies quatre-vingt, it serves as a numeral base. By a factor of 5, numeral base vingt is converted to cent '100'. The numeral quatre-vingt (4 vingt's) recalls the conversion of a base 20. Phonological correspondences with Latin make it part of an older decimal paradigm, to the extent that Latin vii-gint-ii '20' is '2-10's'. Sound correspondences relate French vingt to Latin vii-gint-ii 'twenty' or IE *ui-kentii (Coleman 1992:397-398 with discussion of the relation of *kent- to IE 'ten, decade, hundred'), while subsequent decades in -(a)nte correspond to Latin *-(a)-gint-aa: quinqu-a-gint-aa, tri-ginta 'fifty, thirty' (Pope 1966 :127; 318). Although historically vingt is a phonological reduction from a potential ancestral 'two decades' (Latin vii-gint-ii 'two gint's), whether vi-ngt was only accentually separated from soix-ante or not), vingt and soixante have separate roles in the French system of numeration.
NUMERACY AND THE GERMANIC UPPER DECADES, by Carol F. Justus, Journal of Indo-European Studies 24, 1996, 45-80
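The two breaks in the standard French decades described in the excerpt above can be written out as a small routine. This is a rough sketch, assuming only the literal glosses given there (soixante-dix as '60-10', quatre-vingt as '4-20', quatre-vingt-dix-neuf as 'four-twenty-nineteen'); the function name and the output notation are hypothetical.

```python
def french_gloss(n):
    """Literal arithmetic gloss of a standard French numeral from 60 to 99."""
    if not 60 <= n <= 99:
        raise ValueError("only 60-99 are covered in this sketch")
    if n < 80:
        head, rest = "60", n - 60      # soixante + 1..19
    else:
        head, rest = "4x20", n - 80    # quatre-vingt + 1..19
    return head if rest == 0 else f"{head}+{rest}"


for n in (62, 70, 79, 80, 90, 99):
    print(n, "=", french_gloss(n))
```

Note that the addend runs from 1 to 19 in both stretches, which is exactly what makes 60 and 20 look like bases here rather than ordinary decades.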
I tried to contact Carol Justus, Director, Numerals Project at the University of Texas at Austin, to request her advice on my own study. I found that she had passed away on 1 August 2007. So I tried to contact Winfred Lehmann, Director of the Linguistics Research Center, University of Texas at Austin, but found, to my astonishment, that he had also died, on the very same day.
1 - ata'uzik – clearly includes a cognate of Austronesian *isa, POc*sa-kai, etc
2 - ma'dro – ditto of *dusa or *rua
3 - pi'ñasun
4 - si'saman
5 - tûdlemût - ditto of *lima (hand)
6 - atautyimiñ akbinigin tudlimût - "one hand and once on the next "
- bog standard Austronesian/Papuan construction
7 - madro'niñ akbi'nigin - "twice on the next"
8 - piñas'uniñ akbi'nigin - "three times on the next"
9 - kodlinotai'la - "that which has not its ten" - not usual, but not very rare
10 - kodlin - derived from kut or kule, "the upper part" - compare *puluq
14 - akimiaxotaityuña - "I have not fifteen."
15 - akimi'a - fifteen (a separate word)- unusual in An
20 - inyui'na - "a man completed" - bog standard An/Pap construction
25 - inyui'na tûdlimûniñ akbini'digin - "twenty and five times on the next"
30 - inyui'na kodliniñ akbini'digin - "twenty & ten times"
35 - inyuilna akim'iaminñ aipâliñ - "twenty & one fifteen times"
40 - madro inyui'na or madrolipi'a - "two twenties"
100 - tûdlimûipi'a - five 'pi'a'
These numbers, though, are spoken by Inuits in Point Barrow, in the extreme north of Alaska. Greenland Eskimos use much the same basic number words, but construct their teens and decades differently.
The original writer* points out: "The expressions in Greenlandic and other Eskimo dialects for these higher numbers are very different, which is pretty strong evidence that they have been developed since the separation of the Eskimo into their different branches."
That is exactly what I am finding in my study of Austronesian/Papuan numerals. At each stage in the development of counting systems, certain groups in the mainstream adopted new words for numbers that they had only expressed by gesture previously, or had expressed as separately countable (and visible) 'chunks' like 10s or 20s. They adopted 'consensus' words for 10, 6-9, the teens, decades, and 100s, roughly in that order.
Some groups still lack those 'consensus' words. The 'archaic' lower numbers, from 1-5, 6-9 and 11-20, are still preserved in many languages that haven't yet adopted the 'consensus' Austronesian number lexicon, and they're mappable.
The higher numbers, like the teens, decades, hundreds, and thousands, developed, worldwide, only quite recently, and the times of their diffusions should be dateable (if only relatively, not absolutely).
So the fact that (some) Polynesians have fully developed decimal systems, including standard "An" words for 6-9, while many Melanesians in Vanuatu and New Caledonia haven't, shows that Vanuatu and New Caledonia were first colonised a lot earlier than Polynesia, and in at least 3 separate waves, where newcomers either pushed their predecessors south, or absorbed them.
The Maori had a system based on 20s, not 10s, so that shows they left central Polynesia before the full decimal system diffused into that area. The fact that Easter Island had a full decimal system, while Maoris didn't, shows that Easter Island was settled later than New Zealand. (Or that Maoris kept strictly to their traditions, of course. People will be human, and upset theories like this one).
Update: April 15 2008 - Since I wrote that, I've found that many Polynesian languages had vigesimal systems in use prior to contact with Europeans, so that many of the decimal systems apparent today are not very old at all. I certainly wouldn't repeat that 'Easter Island was settled later than New Zealand' based solely on my faulty recording of their number-systems.
This dateable number-naming development is still going on. Until only the last decade or so, Americans and the English had different meanings for a 'billion': 1000 million in America, a million million in England and Germany. So the division is dateable (around 1600-1800, before America, isolated, developed its own meaning for the word 'billion'), and so is the adoption of the American 'billion' by the English (1990-1995).
It is only since Anglo-Saxon times that the English 'hundred' came to mean 10x10, not a dozen 10s (12x10). 'Beowulf' mentions 100 warriors coming to a place, then 80 of them leaving, and 40 staying.
So the full decimal system we use now only came to England within the last 1000 years or so. It's very possible that 'primitive' Austronesians adopted their identical decimal system before we did.
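The arithmetic behind the 'Beowulf' example is worth spelling out. A quick check, assuming (as above) that the older 'hundred' meant a dozen tens; the variable names are just for illustration:

```python
LONG_HUNDRED = 12 * 10      # the older 'hundred': a dozen tens
DECIMAL_HUNDRED = 10 * 10   # the modern 'hundred'

left, stayed = 80, 40       # the warriors who left and those who stayed
total = left + stayed

print(total == LONG_HUNDRED)     # the count only balances on this reading
print(total == DECIMAL_HUNDRED)
```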
If this analysis works, it should assist in relative dating of migrations and cross-group influences to a much greater resolution than genetic or linguistic splits and mergers. (Both genetic and linguistic dates are very much estimated on the assumption that things change on a fairly regular and smooth basis. They don't.)
*Notes on Counting and Measuring among the Eskimo of Point Barrow, John Murdoch - American Anthropologist, Vol. 3, No. 1 (Jan., 1890), pp. 37-44. http://links.jstor.org/sici?sici=0002-7294%28189001%291%3A3%3A1%3C37%3ANOCAMA%3E2.0.CO%3B2-5
Eskimos do count like Austronesians, but I'm certainly not claiming that they are recently related. The first few number names, and the actual ways of counting up to 1 hand and beyond, and then verbalising that, are pretty similar, worldwide.
Sunday, January 6, 2008
Erromanga once had at least three languages (Sye, Ura, and Utaha) but suffered very heavy depredations in the 19th century by 'black-birders' - recruiters for plantation labour in New Caledonia and Queensland, Australia. There was a virtual population crash, from an estimated 6000 pre-contact, to only 400 in the 1930s, and about 1300 in 1989. Ura had (in 1989) fewer than half-a-dozen speakers, all elderly, and Utaha disappeared altogether about a century ago.
In doing so, I re-read:
The Efate-Erromango problem in Vanuatu subgrouping, John Lynch,
Oceanic Linguistics 43.2 (Dec 2004): p311(28)
Available via JSTOR.
Lynch is a classical comparativist (the expert on Southern Vanuatu) and has 28 pages of grammatics and phonology to support his theories of grouping/sub-grouping, but precious little about the lexicons of Erromanga, except this, under the heading of 'innovations': -
"(e) POC *saŋapuluq, PNCV *saŋavulu 'ten' is replaced by PEE *rua-lima ('two-five'): e.g., Lewo lua-lima, South Efate ralim. (The same innovation, however, is found to the immediate north of this subgroup, in Paamese h??lualim.) (9)"
"(b) Erromangan languages share innovation ..., the replacement of *saŋapuluq 'ten' by a form composed of 'two' and 'five': cf. Sye narwolem, Ura lurem ~ durem."
ie, the technically more advanced (multiple) word phrase has been 'replaced' by a less-developed construction. This could only make sense to a specialist ruled completely and solely by the limited specialist techniques and jargon of his discipline.
Lynch nearly rescues himself from this, but not quite, by saying, in a sub-note:
It occurred to me that the replacement of a monomorphemic word for "ten" with a transparent bimorphemic one may have been part of a more general simplification of numeral systems, since many SOC languages have quinary systems. However, it turns out that many widely distributed languages that do have compound numerals, based on "five" for 'six' through 'nine', nevertheless retain *saŋapuluq 'ten'.
Much more likely, though, is that many languages borrowed a new word, sa-puluq, meaning 1 x (bunch of) 10, before they got around to changing their old constructions for 6-9.
He kept his nose so close to the phonology and grammatics that he apparently ignored some quite amazing (to me, at least) lexical 'innovations':
Shoulder, which has a perfectly good POc 'ancestral' term, *(qa) para, is 'innovated' in Ura as 'nobun-lenge' = 'head-arm/hand'
Neck ... POc *Ruqa, *liqoR ... (Ura) bo-ri-na Lit. 'X'+ na=breast
Hair ... POc *raun ni qulu ... (Ura) novlingen-nobu- (Sie) novlinompu ... literally feathers/hair-head This is also the literal meaning of the POc construction, but the *POc word has two lines of 'descendants' - one used *raun, alone, and the other used *qulu, alone.
Mouth ... POc *papaq, *qawa ... (Ura) nobun nggivi- = lit. head-tooth
To sleep: ... POc *tiRuR ... (Ura) ahlei-ba = lit. to lie down-ba
Thatch/roof ... POc *qatop ... (Ura) nobun sungai = lit. head-house
To sew: ... POc *saquit ... (Ura) ehli (Sye) ... etri
To stab, pierce ... POc *soka ... (Ura) ehli ... (Sye) satri
Bite ... POc *karat ... (Ura) ahli ... (Sye) elintvi
(This is not wildly exciting, even to an amateur linguist, as sew and stab are very obviously related in POc).
It makes one wonder if these fellows suddenly forgot their 'inherited' vocabulary on an isolated island (hasn't happened elsewhere), or if someone took certain very basic words (body parts, mainly) and deliberately changed them, (ie a genuine invention) or if the people didn't have those 'ancestral proto-Oceanic' words in the first place.
The overall differences in number words/systems between Erromanga and Vanuatu languages further north were also completely missed by Lynch in his paper, although he did propose that '2 hands' was an 'innovation' for '1 x name for 10' (leading on, perhaps, to 2x10=20, which it does in this case: Ura - lurem gelu=20, Sye - narwolem duru=20). That suggests that Ura and Sye both adopted the idea of decimal 10s before they adopted the words.
Erromangan languages (I have numeral data for Ura, Sie, and extinct Utaha) don't even achieve the 'consensus' PAn names for 1-4:
1 - *PAn - *esa, *ias .... Ura - sai - OK
2 - *PAn - *dusa ..... Ura - ge-lu - OK
3 - *PAn - *telu ....... Ura - ge-he-li - is very strange, because it (should have) descended directly from its established 'ancestor word', *telu. Instead it appears to be a 'linguistic innovation' based on a 3rd person possessive, 'ga' and directly on a Trial *-(t,s)ali proposed for PSV - proto-Southern Vanuatu.
A similar construction is found in older relict languages in Tanna and New Caledonia ... kesel, kahar, esech, seen, hejen, etc.
4 - *PAn - *Sepat .... Ura - le-me-lu (2-2) (Sie nd-vat).
Lemelu (2-2) must be dubbed a 'linguistic innovation', using classical Comparative Methodology. But it's clearly not inventive, and it is exactly the same construction I've found in number systems that haven't gone much beyond naming numbers up to 5 in other parts of the world. (Example: aula aula=2-2=4, in Binahara, a Trans-New-Guinea language - it even has a very similar root-word).
Naming numbers from 6-9 is very obviously a later invention or borrowing added to the first 3 to 5 number words, so the sudden appearance of 'vat' in Ura sini-vat (9) is no surprise. It almost proves that the phrases for 6-9 were real inventions in Austronesian languages, at a later date than the first 5 number names were established.
5 - *PAn - *lima .... Ura - su-o-rem (1 hand). This is also a common construction where people mark the 'first hand' and then go on to mark 'hand/hands two' or 'hand-hand' for 10. (Binahara - gena-aulapu = 1-5, gena-aulapu-aulapu = 1-5-5 = 10).
6 - *PAn - *enem .... Ura - mi-sai (+1)
7 - *PAn - *pitu .... Ura - sim-he-lu
8 - *PAn - *walu .... Ura - sim-he-li
9 - *PAn - *Siwa .... Ura - sini-vat
10 - *PAn - *sa-puluq .... Ura - lu-rem (2-hand)
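The Ura list can be checked against an additive reading. In this sketch the glosses for 4 (2-2), 6 (+1) and 10 (2-hand) come from the list above; reading 7, 8 and 9 as 5+2, 5+3 and 5+4 is an assumption based on the resemblance of -he-lu, -he-li and -vat to the words for 2, 3 and 4. The names are hypothetical.

```python
# (value, Ura form, parts that are added together)
URA = [
    (1, "sai", [1]),
    (2, "ge-lu", [2]),
    (3, "ge-he-li", [3]),
    (4, "le-me-lu", [2, 2]),
    (5, "su-o-rem", [5]),
    (6, "mi-sai", [5, 1]),
    (7, "sim-he-lu", [5, 2]),   # assumed 5+2
    (8, "sim-he-li", [5, 3]),   # assumed 5+3
    (9, "sini-vat", [5, 4]),    # assumed 5+4
    (10, "lu-rem", [5, 5]),
]


def is_consistent(entries):
    """True when every numeral's parts add up to its value."""
    return all(sum(parts) == value for value, _form, parts in entries)


def compound_values(entries):
    """Values whose Ura form is built from more than one part."""
    return [value for value, _form, parts in entries if len(parts) > 1]


print(is_consistent(URA))
print(compound_values(URA))
```

On this reading only 1, 2, 3 and 5 are simple words; everything else is built from them.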
Lynch's comparative methods of sub-grouping languages enabled him to propose a settlement hypothesis for Vanuatu:
SOc (Southern Oceanic) groups all the languages of Vanuatu.
- that split into Northern Vanuatu and:
NSO (Nuclear Southern Oceanic - all languages south of Northern Vanuatu)
- that (NSO) split into Central Vanuatu and:
- SMel (Southern Melanesian) - all languages south of Epi and Efate islands.
- SMel split into
Southern Vanuatu (*PSV) and New Caledonian
That translates into a family tree that implies that people (who now speak North Vanuatu languages) first settled North Vanuatu, and stayed there, while another lot going south, split in the middle of Vanuatu, with one lot staying put, and another lot going south, and so on.
It implies that the speakers of languages further south would be the ones that settled their territories most recently.
But, to a non-comparativist (like me) it's 'obvious', from the merest of glances at number systems, that the languages in the south are the oldest, and preserve their older constructions. This thinking reverses the implications of the genetic language tree produced by comparativists.
It would mean that the first major split from Southern Oceanic (SOc) would give one branch leading to the surviving New Caledonian group, with the rest continuing to evolve.
- The next split would be between 'the rest' and surviving Southern Melanesian.
- The very latest split would be between 'the rest' and surviving North Vanuatu.
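The contrast between the two readings can be made concrete by writing Lynch's tree as a nested structure and counting how many splits precede each surviving group. This is only a sketch: the tree follows the outline of his sub-grouping given above, the alternative is recorded simply as the order of branching I propose, and the names in the code are hypothetical.

```python
# Lynch's sub-grouping as (name, children); leaves have no children.
LYNCH_TREE = ("SOc", [
    ("Northern Vanuatu", []),
    ("NSO", [
        ("Central Vanuatu", []),
        ("SMel", [
            ("Southern Vanuatu", []),
            ("New Caledonian", []),
        ]),
    ]),
])

# The alternative order of branching suggested by the number systems.
ALTERNATIVE_ORDER = ["New Caledonian", "Southern Melanesian", "North Vanuatu"]


def leaf_depths(node, depth=0):
    """Number of splits between the root and each leaf group."""
    name, children = node
    if not children:
        return {name: depth}
    depths = {}
    for child in children:
        depths.update(leaf_depths(child, depth + 1))
    return depths


lynch = leaf_depths(LYNCH_TREE)
for group, depth in sorted(lynch.items(), key=lambda item: item[1]):
    print(f"{group}: separates after {depth} split(s)")
print("Alternative order:", " -> ".join(ALTERNATIVE_ORDER))
```

In Lynch's tree the northern group separates first and the southern groups last; the alternative order runs the other way.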