
International Linguists Beer Parlour/Inflexions


This page is about inflexions (plurals, conjugations, declensions etc.)

An obvious deficiency is the inflexions. The forms "wore" and "worn" should be referred to in the article "to wear"! This is even more obvious for conjugations (think of the French verb "aller") and declensions (think of Latin). To my mind, this should be OmegaWiki's priority. It implies of course that there are relations between expressions, recorded in an expressions table, distinct from the relations between defined meanings, recorded in a defined meanings table.
This information would be useful for human users and crucial for automatic translators. --Fiable.biz 01:07, 27 August 2009 (UTC)

This is the priority for many of us... I will see if I can find some time to program something in this direction, or if I can find someone to do it ;-) The problem we have with this is that it is language dependent and, for each language, grammatically dependent. We need a way to know and indicate which conjugations exist for a verb in a given language, or which inflexions (plural, declensions, ...) for a noun... This is hard to program. --Kipcool 09:09, 2 September 2009 (UTC)
See this enhancement proposal in the bugzilla.--Fiable.biz 15:57, 18 July 2010 (UTC)

Fields for inflexions

Once OmegaWiki knows what kind of word the expression in a specific language of a given DM is, it should offer a fixed set of fields to enter inflected forms. Depending on word type and language, the number of these is somewhere between zero and over a dozen. For English nouns, there will be the field "plural" and two checkboxes: "no plural" (Expression:milk) and "plural noun" (Expression:scissors).

The benefit of this is that it will automatically create an entry under the plural: if you enter the plural in Expression:dog under the DM "common four-legged animal...", it will automatically create Expression:dogs with the entry "Plural of dog (A common four-legged animal...".

The same with verbs: Once you entered the past participle in Expression:be, it will create an entry under Expression:been. --Mkill 11:31, 5 August 2006 (CEST)

A DM "Plural of dog (A common four-legged animal…" is imho nonsense. All inflected forms will link to the same DM+language as their base form, and they have a relation to it which names the inflection, i.e. (with the relation written in parentheses):
  • DM("four-legged…"):eng:dog → (nom.pl.) → DM("four-legged…"):eng:dogs,
  • DM("four-legged…"):eng:dogs → (nom.sg.) → DM("four-legged…"):eng:dog,
I think -- Purodha Blissenbach 01:31, 6 August 2006 (CEST)
Exactly what I suggested! (but the DM's in your example don't need to be separate, it can be one DM with two expressions: singular and plural (other languages with more complicated inflections, like German, will list even more expressions)) --Mkill 11:20, 6 August 2006 (CEST)
I started thinking along these lines a while ago, and I have started to formulate it (but it is far from ready...) at User:Sannab/Inflectional groups. Also, to keep terminology straight, I think it has been stated clearly that all forms of the same word will be linked to the same Definition, but that the combination of each Expression (here: word form) with that Definition will be considered a separate DM. That is, Definitions will be reusable just as Expressions are now.--Sannab 11:34, 6 August 2006 (CEST)
I agree with Mkill's proposal, but now we have a separate entry "DefinedMeaning:been_(6477)" instead. This is nonsense, just like the above-mentioned example "Plural of dog ...". For other examples like DefinedMeaning:j%C3%A4%C3%A4_(806624), the "definition" is not even translatable because it is not clear which homonym is meant. --User:Ortografix 12:14, 7 November 2010 (UTC)
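
The relation-based model discussed in this thread (inflected forms attached to a base form and its DM, with the reverse entry created by the software) could be sketched roughly as follows. This is only an illustration: `InflectionStore`, `add_form`, `lookup` and the integer DM ids are hypothetical and are not part of the OmegaWiki software.

```python
from collections import defaultdict


class InflectionStore:
    """Toy store: inflected forms are relations on a base form, not separate DMs."""

    def __init__(self):
        # (language, base form, DM id) -> {relation label: inflected form}
        self.forms = defaultdict(dict)
        # (language, inflected form) -> list of (base form, DM id, relation label)
        self.reverse = defaultdict(list)

    def add_form(self, lang, base, dm_id, relation, form):
        """Record one inflected form and create the reverse entry automatically."""
        self.forms[(lang, base, dm_id)][relation] = form
        self.reverse[(lang, form)].append((base, dm_id, relation))

    def lookup(self, lang, form):
        """Return every (base form, DM id, relation) the given form belongs to."""
        return list(self.reverse.get((lang, form), []))


store = InflectionStore()
store.add_form("eng", "dog", 1, "nom.pl.", "dogs")        # DM id 1 is made up
store.add_form("eng", "be", 2, "past participle", "been")  # DM id 2 is made up
print(store.lookup("eng", "dogs"))
```

Under this sketch, searching for "dogs" leads back to "dog" and its DM without a separate "Plural of dog" definition ever being stored.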

DefinedMeanings and inflexions information need separate structures

Ok, I thought about this again and I think I get where the problem is: "inflection groups" (to use Sannab's term) and DMs don't match well.

A DM takes an expression in any language, links a definition to it and translates them to other languages. So far, so good.

Inflections need a different structure: take one word from one language and list its forms in different tenses, cases, numbers, whatever applies.

Problem 1: A word may have one inflection but several DMs attached to it. In the case of homonyms, several inflection groups will appear under one lemma, and DMs and inflection groups will need to be tied to each other in some way.

Problem 2: A DM is, at the core, one word in one language, and everything else is a translation or a synonym. It can't be used to store inflection information in those translations and synonyms, because these aren't inherent unchangeable parts of the DM.

So, what it boils down to is that inflection information and DMs need to be separate structures. A DM is used to store an expression with its synonyms and translations. An inflection group is used to store a word and its inflected forms.

These need to be tied together in a way that under a lemma, you would find the following structure:

Example: alarm

  • English
  • alarm (noun)
  • inflections
  1. DM 1
  2. DM 2
  3. ...
  • alarm (verb)
  • inflections
  1. DM 1
  2. DM 2
  3. ...

Example 2: alarmed

  • English
  • past participle of: alarm (verb)

Note that when you fill in the entries for the inflections of alarm (verb), the entry above under "alarmed" would be created by the software.

What needs to be done to create this structure is to change the understanding of what is an "expression" of a DM. As of yet, the expressions are mere strings with a language tag attached. DM's need to be made smarter, they need to know whether the entry "alarm" under English is supposed to be the verb or the noun alarm. --Mkill 16:44, 6 August 2006 (CEST)
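
The lemma structure described above could be modelled along these lines. It is a minimal sketch, assuming one inflection group per word / word-type pair; the names `InflectionGroup` and `derived_entries` and the DM ids are invented for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class InflectionGroup:
    lemma: str
    language: str
    word_type: str                               # e.g. "noun" or "verb"
    forms: dict = field(default_factory=dict)    # label -> inflected form
    dm_ids: list = field(default_factory=list)   # DMs attached to this group


def derived_entries(groups):
    """Build the software-generated entries for every inflected form."""
    entries = {}
    for g in groups:
        for label, form in g.forms.items():
            text = f"{label} of: {g.lemma} ({g.word_type})"
            entries.setdefault((g.language, form), []).append(text)
    return entries


# Hypothetical DM ids; the forms are ordinary English.
alarm_noun = InflectionGroup("alarm", "eng", "noun", {"plural": "alarms"}, [101, 102])
alarm_verb = InflectionGroup("alarm", "eng", "verb", {"past participle": "alarmed"}, [201, 202])

entries = derived_entries([alarm_noun, alarm_verb])
print(entries[("eng", "alarmed")])
```

The point of the sketch is that the entry under "alarmed" is derived from the verb's inflection group rather than typed in by hand, and that the noun and the verb keep separate lists of DMs.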

A problem that still needs to be addressed under this model is that of homonyms of the same word type, but with different inflections. In German, there are some of these (die Band vs. das Band, der Modul vs. das Modul). In German, these can be disambiguated by gender. But there could be cases in other languages that are a little more tricky. --Mkill 16:47, 6 August 2006 (CEST)

Yes, e.g. in Swedish "man" (with three different noun senses), all have the same gender but different plurals. \Mike 12:17, 12 August 2006 (CEST)
There are several words like this in English: "hang" (verb) and "fish" (noun) come to mind. But they are obviously by far the exception, probably in any language. The solution I see is something like multiple inheritance: inflection groups normally associate with word/part of speech pairs and inherit down to the DM, but can also be overridden with a local DM value...--Homunq 19:52, 16 November 2006 (CET)
In German, we have yet another case of homonyms, such as "das Wort" (singular) with "die Wörter" (plural) and "die Worte" (plural). Most grammar books do not talk of homonyms in this case, but rather say there is a single word having two plurals. -- This unsigned comment was added by Purodha.
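
Homunq's idea of group-level inflections that a single DM can override might look like the sketch below. `WordGroup`, its methods and the DM identifiers are hypothetical names used only for this example.

```python
class WordGroup:
    """Default inflections for one word / part-of-speech pair."""

    def __init__(self, lemma, word_type, forms):
        self.lemma = lemma
        self.word_type = word_type
        self.forms = forms        # label -> default inflected form
        self.overrides = {}       # DM id -> {label: form}

    def override(self, dm_id, label, form):
        """Record a form that applies to one DM only."""
        self.overrides.setdefault(dm_id, {})[label] = form

    def form(self, dm_id, label):
        """A DM-level value wins; otherwise fall back to the group default."""
        return self.overrides.get(dm_id, {}).get(label, self.forms[label])


hang = WordGroup("hang", "verb", {"past": "hung", "past participle": "hung"})
# "dm-execute" and "dm-suspend" stand in for real DM ids.
hang.override("dm-execute", "past", "hanged")
hang.override("dm-execute", "past participle", "hanged")

print(hang.form("dm-suspend", "past"))   # hung
print(hang.form("dm-execute", "past"))   # hanged
```

Only the exceptional sense carries extra data; every other DM of the word inherits the group's forms.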

Entries for grammatical declensions

Lately I've been seeing quite a few DM's such as DefinedMeaning:superado (1096244). They are simply grammatical declensions of a particular verb and don't define anything. Do these belong on OmegaWiki? If so, what is the benefit of them? – McDutchie 18:22, 5 July 2009 (EDT)

It is planned (if only we had some programmers) to have tables of declensions associated with each word. These tables, however, are language dependent, which makes them complicated to implement.
Thus, I am personally not in favor of creating entries for declensions, but I also don't see any reason to delete them, since they can be useful, at least until we have declension tables (so I'd say it is a temporary workaround).
The benefit is: in a text, you may find a past participle in a foreign language, and sometimes (for irregular verbs in particular), you are not able to find the infinitive of the verb. So we need a way to search verbs by their declensions. --Kipcool 07:47, 6 July 2009 (EDT)
As stated on the Main page, "The initial aim is to provide information on all words of all languages", and a declension is just another word. Its meaning is different from that of the verb (just like nouns defined as, for instance, conjugaison: "Action de conjuguer, réunir, unir."). If there were a way to relate verbs to their conjugated forms, I think it should be that: a relation between meanings. --Ascánder 07:23, 7 July 2009 (EDT)
Declension tables are the better solution. Separate entries for each derived form don't make sense, and neither the definitions nor the expressions are translatable if the original has several meanings (or translations), for example: "Third-person singular indicative present form of the verb 'to stay'." (see also DefinedMeaning_talk:jää (806624)). --Ortografix 16:41, 8 July 2009 (EDT)
Should, for example, Expression:solido include a Spanish entry? As we aim to provide information on all words of all languages, the answer seems to be positive; it could be a simpler entry that only says "See soler" (by means of an appropriate declension table). However, most on-line dictionaries I've consulted today include this kind of relation between participles and verbs, and most on-line translation dictionaries managed to give me an appropriate translation between participles.
Now for the translation, I have no clues on your specific example as I don't speak German. However I'll try to reproduce it with some examples in English and Spanish (hope not to miss the point). The Spanish verb realizar for instance has two meanings, one that translates in English to realize (or realise) and one that translates to carry out. So realizado has two translations to English realized:the past participle of the verb to realize and carried out:the past participle of the phrasal verb to carry out. The verb Soler has one meaning and translates to two synonyms in English, so solido has two translations and two meanings corresponding to these two translations. An example with three translations could be seen here. --Ascánder 19:21, 8 July 2009 (EDT)
The Spanish definitions for "carried out" and "realized" are identical ("Participio del verbo realizar"). So it is not clear if these verbs have different meanings. They could be also duplicate entries.
Another example the German "Plural von Mutter" can be correctly translated into English "plural of mother" (Spa: madre) or "plural of nut" (Spa: tuerca). Someone who understands English but no German does not know if nut means the fruit, a crazy person or the fastener.
The main point is that the homonyms in different languages are not identical. Using the mentioned definitions for translations may cause semantic drift. --Ortografix 15:36, 9 July 2009 (EDT)
So your objection also applies to the example taken from wordreference, similar to your plural example (though less dramatic): to place has a meaning which can be translated to Spanish as colocar, which has a meaning that can be translated to English as to hang. According to wordnet, place and hang have no common meanings in English, so by defining placed as the past participle of the verb to place, and translating it as colocado, placed could be misunderstood as the p. p. of the verb to hang by someone who doesn't speak English. I agree with you. Nevertheless, the benefits outweigh the risks, as seems to be the opinion of sources like the Oxford dictionaries used in word reference. --Ascánder 08:31, 10 July 2009 (EDT)
I agree that the declensions are useful, but not as they are defined now. I propose separate pages like "conjugation:place" for declension tables, possibly in a new namespace "declension". Links to these pages could be added as annotations to the base word (expression), e. g. the infinitive. Language-specific templates can be created easily (and a software change is not required). Of course it must be possible to search for all these word forms. --Ortografix 12:17, 10 July 2009 (EDT)
Could you please make a model of your idea? Is it something like having the page Conjugaison française:parler and a link from Expression:parler to that page? The problem I see is that Expression:parlé would neither list the corresponding french meaning nor its translations... --Ascánder 13:19, 10 July 2009 (EDT)
Yes, I had something like that page in mind. If it contained a link to the expression "parler" or its DM the definition and the translations would be very easy to find. The page Expression:parlé could say "See parler" as proposed above; still better would be if the search function would find the conjugation table when someone searches for "parlé". In that case the page Expression:parlé would not be needed. --Ortografix 07:58, 11 July 2009 (EDT)
I completed the example of the p. p. of the verb to place, mentioned above. Once you add the annotation for the corresponding meaning, the ambiguity is solved. i.e. placed translates to colocado according to a given meaning given as a class annotation, to depositado according to a second one and to ubicado according to a third one. On the other direction, colocado can be translated to either placed or hanged according to different meanings. The declension table that you wanted can then be computed from the annotations of the verb.--Ascánder 22:03, 12 July 2009 (EDT)

Agglutinative languages

I'm going to cure you from trying to record all inflexions. Let's first begin by a few definitions (not every user of OmegaWiki is supposed to be a linguist.):

Definitions

  • A morpheme is the smallest part of a language carrying a meaning. For instance, in "parliaments", I can see 3 morphemes : "parlia-ment-s", the 1st carrying the meaning of speaking (like "parlour"), the second carrying the meaning of making a noun from a verb (as "commencement") and the third one carrying the meaning of the plural. The fact that I can decompose "parliaments" into morphemes doesn't imply that this decomposition is sufficient to fully understand the full word.
  • A "lexical item" (in French: "lexie") is a word or an expression that speakers of a given language memorize. And they memorize it because there is no other sure way to use it correctly. For instance, an English speaker has to memorize "parliament", firstly because there are several morphemes to transform a verb into a noun (such as "-ing", "-tion"), so that it could be "parling" or "parlition", secondly because the decomposition into morphemes is not sufficient to get the complete idea of a parliament. Any conversation, being also the activity of speaking, is not a "parliament". But nobody memorizes "parliaments", because "s" is the normal morpheme for nouns' plural, and "parliaments" doesn't carry any additional meaning beyond the mere combination of "parliament" + "s". Lexical items are in all dictionaries. They can be composed of several words. For instance: "Christian name", "high school". Although "Vatican" is a name and is somehow Christian, it's not a "Christian name". A nursery school can be a very high building, but it's not a "high school". In fact, there are other reasons to memorize words than vocabulary necessity. Some words are memorized by children because they are grammatical models. A 2-year-old child never memorizes the rule "The plural is marked by a final '-s'.", but he memorizes examples and uses them as models. Soon, he will have forgotten the models through which he learnt the rule. There are also non-linguistic reasons to learn words, such as the way you learnt them, repetition, their significance to you, or the conscious effort you made to remember them. For instance I can't see any linguistic reason to memorize "Mind the gap!", but it's difficult to forget it if you use the tube in London. It's also not for linguistic reasons that you may remember the very words of your husband's declaration of love. But these memorized words usually depend more or less on the speaker, his mother, his family, his personal history. Moreover, they are usually not used by the speaker as bricks of his own language. So they are usually not regarded as "lexical items". Note that the key concept of "lexical item" was not in OmegaWiki up to last month, when I added it!
Congratulations for that! But... the way it is implemented (as the base class) is wrong since the lexical items are the several translations/expressions and not the DM. Right now it says: "this or that DM is a lexical item" but in fact the translations are lexical items and not the DM, which is a concept that is associated with the several lexical items. Or do I misunderstand the matter? --dh 07:35, 24 December 2009 (UTC)
  • A lexical morpheme, also called "derivational suffix" (or prefix or infix), is a morpheme in situation inside a word, morpheme whose addition creates a new lexical item. For instance "-ment" in "parliament". In other words, though the derivational suffix carries meaning per itself, this information is not enough and the combination has to be memorised.
  • A grammatical morpheme, or "inflectional suffix" (or prefix or infix), is a morpheme inside a word (in situation) whose addition doesn't create a new lexical item. For instance "-s" in "parliaments". In other words, the meaning carried by the inflectional suffix is just added to the meaning carried by the other part, and the combination is not memorised as such. Words formed by such a combination are not in paper dictionaries because it would be useless and expensive.

The meaning of a word got by derivation cannot be completely inferred from its structure alone, while the meaning of a word got by inflexion can be inferred from its structure. Note that a morpheme can be inflectional in one word but derivational in another. For instance, "-s" is usually inflectional, but is derivational in "scissors". Moreover, the limit between derivation and inflexion is not clear-cut. Is "cup of tea" a lexical item or not? In other words, does "cup of tea" have to be memorized? If I understand well, OmegaWiki's idea of "expression" is either a lexical item or the translation of a lexical item.
I never learned the word "legged", but I understand "legged" very well because I learned "leg" and "-ed". Whether you like it or not, this means that there is an attributive case in English (plus a genitive: "John's trousers", "Beatrix' trousers"), hence a kind of declension for potentially all nouns. Also note that many languages thought to have no declension have one for a few words. For instance: who/whom/whose, I/me//we/us. In French: qui/que, il/le/lui/soi/se//ils/les/leur/eux/soi/se, lequel/auquel/duquel//lesquels/auxquels/desquels.

Too many inflectional forms

Look at this Mongolian "word": "хамтралжуулагдсанаараа", meaning "in that they were caused to be organised into collective [farms]". Let me cut it into morphemes: хам-т-р-(а)л-ж-уул-(а)гд-сан-аар-аа. After the word root, there are derivational suffixes, forming хамтралж-, which is a lexical item (it's in paper dictionaries), but the other 5 are inflexions. Although хамтралжуулагдсанаараа is a Mongolian word, it's in no dictionary with its inflexions. I never memorised it but I can understand it perfectly, if I know "хамтралжих" (to be collectivized). One of my dictionaries decided to record хамтралжуул- (to collectivize) as a lexical item, while my grammar considers the causative "-уул-" as purely inflectional in this case. This example is an extreme case, but Mongolians use participles a lot, and their particularity is to accept both verbal and nominal grammatical functions, and thus inflexions. Cf. "his becoming a Catholic" (the possessive adjective is characteristic of a noun, while the noun complement is characteristic of a verb). In French, on a past participle inflexion, you can add a feminine inflexion and a plural inflexion (as on a noun): fiancer → fiancé → fiancée → fiancées. In French, there are about 100 verbal forms, but only two genders and two numbers, usable on the past participle alone, so the number of combinations is acceptable. In Latin, there are about 140 verbal forms, plus 2 numbers, 3 genders and 6 cases, applicable to the past participle and gerund of active and passive voices, as well as to the future participle and future infinitive of active and passive voices. This is because on the verbal inflexion of a participle, after the gender ending, you can also add the declension ending (which includes the information of the number): fero → latur- → latura → laturas (respectively to carry → carried → "carried" at the feminine form → "carried" at the accusative plural feminine). We're speaking of hundreds of inflectional forms for all Latin verbs, here. In Mongolian, you can add even more, and using 2, 3 or 4 inflectional morphemes in a word is everyday language. See the article Agglutination in English Wikipedia, or the equivalent Langue agglutinante in French Wikipedia (with examples better explained). Trying to record all the combinations would be titanic and nearly as stupid as doing a dictionary of all possible sentences.
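
To make the arithmetic concrete: with the 2 numbers, 3 genders and 6 cases quoted above, each declinable Latin verbal form opens 2 × 3 × 6 = 36 grammatical slots. The sketch below simply enumerates them. The case names are the traditional six, the value passed to `total_slots` is an arbitrary illustration rather than a count taken from a grammar, and several slots share the same spelling in practice, so this counts slots, not distinct written forms.

```python
from itertools import product

NUMBERS = ["singular", "plural"]
GENDERS = ["masculine", "feminine", "neuter"]
CASES = ["nominative", "vocative", "accusative", "genitive", "dative", "ablative"]


def declension_slots():
    """Every (number, gender, case) combination a declinable form can take."""
    return list(product(NUMBERS, GENDERS, CASES))


def total_slots(declinable_forms):
    """Slots to fill if `declinable_forms` verbal forms are each fully declined."""
    return declinable_forms * len(declension_slots())


print(len(declension_slots()))  # 36
print(total_slots(4))           # 144, for an illustrative 4 declinable forms
```

Each additional inflectional layer multiplies the count again, which is why the totals grow so quickly for agglutinative languages.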

Since an expression can be composed of several words, and some of an expression's words can be inflected, recording all inflexional forms of all expressions would be an even huger job. Think for instance of the French lexical item "aller de l'avant" ("to forge ahead") and all its inflexions: "vais de l'avant", "vas de l'avant" etc.

And many inflectional forms have several meanings. For instance, in English, the past participle has both a passive and a past tense meaning ("I've memorized this word.": past, or "The memorized words will then be written down as quickly as possible by the players.": same form but passive meaning). "memorize" may be regarded as one definedMeaning by English speakers, but, from the French point of view, it's at least 5 meanings according to the grammatical person (not to speak of the tense and mood), and is translated by different forms (mémorise, mémorises, mémorisons, mémorisez, mémorisent). If each inflexion is to be recorded as a "word", there will be as many definedMeanings. For every non-inflected definedMeaning, we would have to record more definedMeanings than the most agglutinative language in the world has forms: the number of intersections of meanings of all languages' grammatical morphemes. For instance, although Mongolian has 11 cases (officially 7), they will correspond to at least 12 definedMeanings because one is the dative-locative, clearly including 2 meanings, translated differently in other languages. --Fiable.biz 00:46, 15 December 2009 (UTC)

Trying to record all the combinations would be titanic and nearly as stupid as doing a dictionary of all possible sentences. <-- You are right. But a dictionary that tries to define all words in all languages is already titanic and/or stupid to begin with, so why not extend the folly? ;-)
Anyway, I agree because I don't think it's feasible to create DMs for all inflections. A better solution would be to show inflections as a table within the annotation to a word. The inflections should be searchable, so that someone who looks for "aimerions" is taken to "aimer" (or to a list, in case an inflectional form matches several DMs). But I'm not much of an expert on how inflections can be realized technically. --Tosca 18:05, 15 December 2009 (UTC)
If the inflections are rule-based, one does not have to record them explicitly but can have an algorithm construct them "on-the-fly". This is possible, for example, for the conjugation of regular German verbs. Of course there need to be several different algorithms for each language, and a way to state whether a verb is regular or not and to give the word stem and auxiliary verb in case it is, but that shouldn't be the problem. As for the irregular inflections, I just think that the possibility to record them should be implemented; whether they are actually all recorded is another question. --18:54, 15 December 2009 (UTC)
The algorithm could kick in the first time someone tries to look at the inflections. The user can then either save the inflections or change them (if they're irregular and the algorithm got it wrong). I think saving the inflections would be easier on the server than recalculating them every time someone wants to look at them. But yeah, those functions are wishful thinking at this point. --Tosca 19:13, 15 December 2009 (UTC)
I don't think it is a good idea to store all different forms of all regular verbs in all languages if it is possible to construct them on the fly. Just look at how many different forms there are for the German regular verb 'fragen' and multiply that by the number of all German regular verbs and the number of languages with regular verb conjugation. It's a huge amount of data when all you actually need to construct them all is the word stem 'frag', the auxiliary verb 'haben', the Partizip 'gefragt' and the information that it is regular. And in regard to cost, though I haven't run any benchmark, I doubt that constructing them on-the-fly (it's actually just a simple substitution) takes that many more CPU cycles than looking them all up in the database. I wouldn't even be surprised if it actually took fewer. And we could just present (that is, construct) them if someone explicitly asks for them (that is, if someone clicks on 'inflections' or something), since most of the time one is not interested in the inflections but only in the definition. But storing them in the database probably has its merits as well. --dh 19:28, 15 December 2009 (UTC)
Oh, and in regard to it being wishful thinking: if someone familiar with the OW PHP code is able and willing to bind in an external program that takes an array containing word stem, auxiliary verb and Partizip and returns an array of all conjugations, I'll write a C program that does exactly that. --dh 19:42, 15 December 2009 (UTC)
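
The program dh describes (stem, auxiliary verb and Partizip in, conjugations out) could look roughly like the sketch below. It is written in Python here rather than C, covers only three tenses, and the function name `conjugate_regular` is made up for the illustration. It deliberately ignores stems ending in -t, -d, -s, -ß, -z or -x (e.g. 'arbeiten', 'reisen'), which need adjusted endings.

```python
PRONOUNS = ["ich", "du", "er/sie/es", "wir", "ihr", "sie"]
PRESENT_ENDINGS = ["e", "st", "t", "en", "t", "en"]
PRETERITE_ENDINGS = ["te", "test", "te", "ten", "tet", "ten"]
AUXILIARIES = {
    "haben": ["habe", "hast", "hat", "haben", "habt", "haben"],
    "sein": ["bin", "bist", "ist", "sind", "seid", "sind"],
}


def conjugate_regular(stem, auxiliary, participle):
    """Build a few tenses of a regular (weak) German verb by plain substitution."""
    aux_forms = AUXILIARIES[auxiliary]
    return {
        "Präsens": [f"{p} {stem}{e}" for p, e in zip(PRONOUNS, PRESENT_ENDINGS)],
        "Präteritum": [f"{p} {stem}{e}" for p, e in zip(PRONOUNS, PRETERITE_ENDINGS)],
        "Perfekt": [f"{p} {a} {participle}" for p, a in zip(PRONOUNS, aux_forms)],
    }


table = conjugate_regular("frag", "haben", "gefragt")
for tense, forms in table.items():
    print(tense, forms)
```

As dh notes, the whole table is derived from three stored strings plus the knowledge that the verb is regular.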
For irregular grammatical forms, we have no choice other than storing them as they are and linking them to their base forms (and DMs). For languages that do not have so many inflected forms per word, we could do that, too, for their regular forms, since it is not expensive. So I believe we could get a fair number of languages done, including Latin, since some 140 entries per verb and 12 per noun and 36 per adjective is not too much. Note, by the way, that base forms also have their grammatical properties (such as "+infinitive+imperfective" or "+nominative+singular") which need to be noted. However, I also believe that for some languages, notably polysynthetic ones making wide use of incorporation, recording all possible grammatical forms would be prohibitive, since it might need tens of thousands by tens of thousands of cases per base form. As to the program creating all regular forms for approval or editing: people sufficiently knowledgeable about each language have to create database entries which allow all grammar forms in annotations. When that is done, this language-specific data must be used by the program. For the vast majority of (Indo-European) languages that I am aware of, that generation process would start with a base form and a part of speech, both of which an editor must provide. --Purodha Blissenbach (talk) 01:33, 29 December 2012 (CET)
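
The "generate once, then let an editor approve or correct" workflow discussed in this thread could be sketched as follows. The class `InflectionCache` and the deliberately naive English rule `naive_past` are assumptions made for the example and are not existing OmegaWiki code.

```python
class InflectionCache:
    """Generate forms on first request, keep them, and allow manual corrections."""

    def __init__(self, generator):
        self.generator = generator   # function: lemma -> {label: form}
        self.saved = {}              # lemma -> {label: form}

    def get(self, lemma):
        # The rule runs only the first time; afterwards the saved forms are used.
        if lemma not in self.saved:
            self.saved[lemma] = self.generator(lemma)
        return self.saved[lemma]

    def correct(self, lemma, label, form):
        """An editor replaces a wrongly generated (irregular) form."""
        self.get(lemma)[label] = form


def naive_past(lemma):
    """Toy rule: every English verb is treated as regular."""
    return {"past": lemma + "ed"}


cache = InflectionCache(naive_past)
print(cache.get("walk"))             # the rule is right here
print(cache.get("buy"))              # the rule is wrong here
cache.correct("buy", "past", "bought")
print(cache.get("buy"))
```

Whether saving beats recomputing is exactly the trade-off debated above; the sketch only shows that the two approaches can coexist.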

Partial and temporary conclusion

  • Of course, irregular inflexions have to be recorded. And they are indeed lexical items. For instance "bought" is memorised as such and is in paper dictionaries (maybe not as a main entry).
  • The class of a word according to the inflexions it accepts has to be recorded if the word is regular. For instance a French verb's group, or a Latin noun's declension. But it's not required to record the list of such inflexions. If it's classical, the grammatical name of the inflexion group could go together with a characteristic inflexion (for Latin and Greek nouns: singular genitive). For instance in Latin "pons, pontis, noun, masculine: imparisyllabic 3rd declension", in French: "aimer, verb, 1st group" and some way to get the conjugation/declension needed in one click.
  • Morphemes which are lexical items have to be recorded. For instance in English "-s" (at least 2 meanings: verbs 3rd person, nouns plural), "-ed" (at least 3 meanings: verbs past participle, verbs passive participle — always identical to the former in English as in French or Spanish — and nouns attributive case).
  • For any compound lexical item, it should be clearly indicated which parts can be inflected and which cannot, with a reference to the inflexion group of each part. For instance in "aller de l'avant", the verb can be inflected, but not the object ("aller des avants" is not French), while in "to shake hands", both the verb and the object can be inflected.
  • To my mind, there are 3 solutions for 1-word expression's 1st-level inflectional form: they can be recorded, or produced algorithmically on demand, or ignored. This job can be left to an automatic translator, and is not required in a lexical work. If they are there, they should be marked as inflectional forms, clearly distinct from lexical items. By "1st level", I mean "with only one grammatical morpheme".
  • From the 2nd level of inflexions on, I think they shouldn't be recorded. It may be useful to produce them on demand.
  • Later, a word deconstructor could be built, which would recognise хамтралжуулагдсанаараа as a form of хамтралжих and amaturas as a form of amo, amare. But I think there are many other priorities in OmegaWiki.

--Fiable.biz 10:09, 6 September 2009 (UTC)
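
A very small version of the word deconstructor mentioned in the last point could peel known inflectional suffixes off a word until a recorded lemma is reached. The sketch below is a toy under strong assumptions: suffixes are plain strings with no sound changes or consonant doubling, and the lemma set and rule list are invented for the example.

```python
def deconstruct(word, lemmas, rules):
    """Return (lemma, [suffix labels]) analyses found by stripping suffixes.

    `rules` is a list of (suffix, label) pairs; suffixes may stack, so the
    function recurses on what is left after removing one suffix.
    """
    found = []
    if word in lemmas:
        found.append((word, []))
    for suffix, label in rules:
        if word.endswith(suffix) and len(word) > len(suffix):
            rest = word[: -len(suffix)]
            for lemma, labels in deconstruct(rest, lemmas, rules):
                found.append((lemma, labels + [label]))
    return found


LEMMAS = {"parliament", "walk"}
RULES = [("s", "plural / 3rd person singular"), ("ed", "past / past participle")]

print(deconstruct("parliaments", LEMMAS, RULES))
print(deconstruct("walked", LEMMAS, RULES))
```

A real deconstructor for Mongolian or Latin would need per-language rules for vowel harmony, stem changes and ambiguous segmentations, which is part of why it is listed as a later priority.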

Gender specific words

According to the DefinedMeaning page, words which have a different form depending on the gender of the person they refer to, like de:Deutscher and de:Deutsche (en:German) require an extra DM each. Apart from adding hundreds of words/meanings/translations twice being a lot of extra work, it somehow doesn't feel right since the concept, the meaning of the words does not really differ, a doctor is a doctor is a doctor, the gender doesn't really matter and is more a grammatical thing. Wouldn't it make sense and actually be better to have just one DM and just have an extra annotation field to give the female and/or male form of the word? --dh 20:26, 30 October 2009 (UTC)

While I wonder if this post has been overlooked or if people do not have an opinion about the issue or if it is too ridiculous to even deserve a short answer, it still feels wrong to have three different DMs for every concept describing a person. AFAICS, there are only three options:

  • Leave it as it is now, that is, have one DM for languages that do not have different words for female and male persons, and one DM each for female and male forms of words for languages that do have different forms and additionally add each of them to the neutral DM, but marked as non-identical.
  • Have one DM and have an extra option field to denote the male and female forms.
  • Have two DMs, one for the female and one for the male form and add the neutral form of languages that only have one form to both of them and mark it as identical.

Personally I tend to favor the third as it seems that most languages do indeed distinguish between male and female forms, though I am not sure about that.

--dh 08:02, 9 November 2009 (UTC)
I still believe the current situation is the best one when you want to be exact and fair to all languages. In English, the words are not gender specific, and therefore they need a definition which is not gender specific. "ein Deutscher" is not equivalent to "a German" but is equivalent to "a German male", which is what they typically write in newspapers and books when they want to be gender specific.
The case is more tricky in French: the masculine acts both as gender unspecific and gender specific. When you don't know the gender, you use the masculine form. When there is a mix of males and females, you use the masculine as well, whereas in German the two genders are always mentioned explicitly (advertisements in the German U-Bahn, for example, say "we are looking for a Kaufmann/-frau"). --Kipcool 12:27, 9 November 2009 (UTC)
Well, ok then. Though this means a lot of extra work as there are hundreds, if not thousands of words to consider: teacher, doctor, cashier, bus-driver, partner, nurse etc. (Funnily in English a male head-nurse is called "matron", probably because traditionally nurses were always female). I guess the underlying problem is the imperfection of the languages themselves. I always thought that the German way of dealing with this is not a very fortunate one as it makes things very complicated, like in the example you've mentioned. Though the English way is also not that good as it makes it complicated to explicitly address a certain gender. Well, and French seems to have a problem with that as well, though in a different manner. What would be needed are words that have a female, a male and a neutral form so that it is always very clear what is meant and in case the gender doesn't matter you do not need to mention both the male and the female form but simply use the neutral one. Do you know how artificial languages like Esperanto deal with that? I expect them to address this issue in a sane way if they are well thought out. --dh 23:14, 9 November 2009 (UTC)
So that means that we'd have to make two extra DMs for DefinedMeaning_talk:arrowsmith_(837706) (as well as for all the hundreds of other DMs pointing to a person) as the German translation only points to a man? Right now there are many DMs like this which only have one DM. The one at hand for example has the Spanish male and female translations given without distinguishing explicitly. --dh 00:00, 13 November 2009 (UTC)
And how about (Spanish) adjectives, like the (Spanish) translations for DefinedMeaning:socioeconomic (1109000) for example? The way it is dealt with right now is exactly the way I suggested to deal with (German etc.) nouns. Is this the affirmed standard way to handle this? --dh 15:29, 21 November 2009 (UTC)
No, it is the way the Spanish contributors chose to do, which I think is not exact.
However, I've never raised the point with them, because I don't think this is such an important topic (as long as it is consistent for a given language), and I don't know enough Spanish grammar. If you want, you can ask Ascander and Galeote for their opinions on this.
Furthermore, for the case of adjectives, feminine forms should be considered as inflections. So it will have to be changed when we have ways of indicating inflections in the software (which will hopefully happen before the end of the universe). My opinion (for French) is that I only add masculine form for now, waiting for the add-inflection-feature. The opinion of the Spanish contributor(s) is to add them now, because it is useful to have them, which is also a good argument, but will require extra work later on. --Kipcool 17:58, 21 November 2009 (UTC)

In many cases, the masculine and feminine of nouns are no less inflexions than the ones of adjectives. All this is just a very small sample of the general inflexion problem. In Mongolian, the French problem mentioned by Kipcool for gender exists for number: the same form is used for neutral number and for singular, while there is a specific plural. So are we going to split all personal nouns according to all languages, pretending that the one Mongolian word "багш" (teacher) has 6 DefinedMeanings: gender-neutral number-neutral, gender-neutral singular, male number-neutral, male singular, female number-neutral, female singular? In fact, like many Mongolian words, the word has a very broad meaning. "багш" is used for school teachers from kindergarten to university and for all kinds of monitors, but also for school supervisors (a meaning not included in the word teacher) and is the address term for any lama, a bit like "Brother" in the Catholic world. It's also used in the Bible to translate the address term "Master". So you can multiply the 6 by 4 and get 24 DMs for one word.
(And, of course, people pretending that inflexions create as many new words will multiply 18 of them by 10 cases and 2 numbers, making 360+6 words. The reflexive ending also applies to 324 of them: 648+36+6=690. But the Mongolian plural corresponds to 2 numbers in classic Greek: plural and dual, so that every plural word should be divided into 2, making 1 035 DMs for "багш" and its inflexions. Etc. This is just a noun. For a verb, it would be more fun. Does OmegaWiki have a big hard disk?)
To my mind, the only case where gender-specific words should be regarded as lexical items is the case where they are indeed lexical items in the linguistic meaning of the term: they are memorised by speakers because there is no rule, a case unfortunately frequent in French because of several ways to form the feminine ("marcheur"→"marcheuse", "facteur"→"factrice", the feminine of "docteur" used to be "doctoresse" and tends to become "docteure", rarely used, though), because some feminine forms were traditionally used for the wives of men with formerly male-only jobs ("la générale") and so cannot so far be used for the feminine, and because of the present national debate and confusion about feminisation of job words. An example in English is friar/nun. But even in French the words following a rule (so that their feminine form is just an inflexion) are many more than the other ones. Words ending in "e" have only one form ("ministre", "parlementaire", "vacataire"), "-er"→"-ère" ("caissier", "boulanger", "pâtissier"), "-t"→"-te" ("enseignant", "perdant", "savant"), "-eur"→"-euse" ("coureur", "employeur", "coiffeur") except in "-cteur"→"-ctrice" ("directeur", "conducteur", "acteur") etc. The feminine forms that have to be memorised are exceptions. As a linguistic rule, exceptions concern mainly very frequent words. --Fiable.biz 13:59, 21 December 2009 (UTC)
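
The regular French patterns listed above lend themselves to a small ordered rule table. What follows is only a minimal Python sketch, assuming that memorised exceptions are looked up first and that rules are tried from the most specific ending to the least specific one. The names EXCEPTIONS, FEMININE_RULES and feminine are invented for illustration and are not part of OmegaWiki.

```python
# Hypothetical sketch: regular French feminine formation as ordered suffix rules.
# Lexical (memorised) exceptions are checked before any rule is applied.
EXCEPTIONS = {"docteur": "docteure"}

# (masculine ending, feminine ending), most specific ending first.
FEMININE_RULES = [
    ("cteur", "ctrice"),
    ("eur", "euse"),
    ("er", "ère"),
    ("t", "te"),
    ("e", "e"),  # words ending in "e" have only one form
]

def feminine(noun):
    """Return the feminine form, or None when no rule matches."""
    if noun in EXCEPTIONS:
        return EXCEPTIONS[noun]
    for masculine_ending, feminine_ending in FEMININE_RULES:
        if noun.endswith(masculine_ending):
            stem = noun[: len(noun) - len(masculine_ending)]
            return stem + feminine_ending
    return None

for word in ["directeur", "coureur", "caissier", "enseignant", "ministre", "docteur"]:
    print(word, "->", feminine(word))
```

Only the words caught by EXCEPTIONS would need to be stored as separate lexical items; everything else is derived from the rule table.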

I must contradict:
  • Also English is gender specific, "Expression:king" and "Expression:queen" and "Expression:man" and "Expression:woman" and many more are clearly gender specific.
  • German does not normally list male and female forms when males and females are meant; that only happens occasionally. Advertisements looking for a "Kaufmann/frau" are seen as ridiculous by quite many (but are an indirect legal requirement) and I know many women, specifically from East Germany, who are proud to have certificates that they passed their exams as "Kaufmann" (male form).
  • The male/female dichotomy is only one case in a large array of cases not having exact translations between some language groups, so I think it does not make sense to make too much of this particular case. --Purodha Blissenbach (talk) 01:53, 29 December 2012 (CET)

Producing inflexions[edit]

Bug 24 432 : "Inflection parameters" deals with recording inflection information in OmegaWiki. I'm asked by Niklas Laxström to discuss a way to produce inflexions, which is, to my mind, the last (numbered "6" in my proposition) and non essential step of that process. In order to keep bugzilla for purposes within arm's reach, let's discuss this here.
All the inflexions I know are obtained by adding a prefix, suffix, infix or auxiliary verb(s), by modifying the radical, by using a completely different radical, which is often the case for very frequent verbs produced by merging several older verbs (for instance, "go" has etymologically nothing to do with "went"), or by a combination of these methods. Let's take as an example the English "to go", subjunctive past perfect, 3rd person singular: "(he) would have gone". My idea is that, in most cases, we could have the users define the inflexions of an inflexion group. I think we could have an interface asking of course first the language ("English") and the inflexion group ("to go"), then the parameters ("active voice", "subjunctive", "past perfect", "3rd person", "singular"), depending on the language and the word class (supposed known by OmegaWiki: see the above-mentioned bug 24 432). Then, for such a given form of the word, the user will be asked:

1) "What is the form this form is directly based on?" The user chooses by clicking in a table (or tables) containing all the forms (already filled or not) for that word class, plus "radical", plus a choice "This form is not built from any other.". In our example, "would have gone" can be directly produced from "have gone", i.e. active voice, INDICATIVE past perfect, FIRST person singular. In case the user chose a form not yet filled in, he is asked to first define that form, so that OmegaWiki wouldn't register processes based on undefined processes.

2A) In the case the user chose something other than "This form is not built from any other.", the next question would be: "What steps are needed to produce the desired form from the base form? Please type the letters representing the steps in the right order. A step can be used several times.
modification of the form (M), prefix (P), suffix (S), infix (I), auxiliary verb (A)."
In our example, the user just types "A" and validates.

3A) For each step, the user is then asked for more details. In our case, he is asked to choose the auxiliary verb among the English auxiliary verbs (have, be, shall, will, would), then the form of the chosen verb, as in step 1), then where to insert that auxiliary verb. In our case, the user chooses "would", and then among the 3 possibilities: "would have gone", "have would gone", "have gone would", he chooses the first one. Since, in our example, there is only one step from the base form, the process is finished provided that OmegaWiki knows that, in English, auxiliary verbs are always separated by spaces, which is a reasonable assumption. In a language where auxiliary verbs may agglutinate, there would just be more possibilities, such as "would have gone", "would-have gone", "have-would gone", "have would gone", "have would-gone", "have gone-would" and "have gone would", the hyphen indicating agglutination.
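
The list of candidate positions offered to the user in step 3A) can be generated mechanically. The following is only a rough Python sketch, assuming the base form is split on spaces and that agglutination is written with a hyphen, as in the text; the function name placements and its parameters are hypothetical.

```python
# Hypothetical sketch: enumerate where an auxiliary can be inserted into a base form.
def placements(base_words, auxiliary, agglutinating=False):
    """List every candidate form, from left to right."""
    results = []
    for position in range(len(base_words) + 1):
        left, right = base_words[:position], base_words[position:]
        if agglutinating and left:
            # auxiliary glued to the word on its left
            glued = left[:-1] + [left[-1] + "-" + auxiliary]
            results.append(" ".join(glued + right))
        # auxiliary as a separate word
        results.append(" ".join(left + [auxiliary] + right))
        if agglutinating and right:
            # auxiliary glued to the word on its right
            glued = [auxiliary + "-" + right[0]] + right[1:]
            results.append(" ".join(left + glued))
    return results

print(placements(["have", "gone"], "would"))
print(placements(["have", "gone"], "would", agglutinating=True))
```

With two base words this gives the 3 possibilities of the English case and the 7 possibilities of the agglutinating case.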
Infixes are of course more difficult but all the infixes I know are in fact a prefix or a suffix added to a previous form. In other words, if one has to add an infix, it means he isn't building his form in the right order and should have added the "infix" in a previous stage, possibly beginning with a base form simpler than the one he began with. It's likely there exist real infixes in languages I don't know, in which case I hope that, in a given inflexion group, the infix can be added at the right place by searching a pattern in the word, from the beginning or from the end of the word, and then replacing it by another pattern (containing the infix), or by counting letters or syllables from the beginning or the end of the word, then adding the infix there.
Radical modifications are also difficult to deal with. But I also hope that, in a given inflexion group, the modification can be performed at the right place by searching a pattern in the word, from the beginning or from the end of the word, and then replacing it by another pattern, or by counting letters or syllables from the beginning or the end of the word, then adding or deleting a fixed number of letters. For instance, to get "overcame" from "overcome", in the English verb group "come", one looks for the pattern "come" from the end, then replaces it by "came".
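
The two mechanisms just described (pattern search from the end, and counting letters from the end) could be sketched in Python as follows. This is only an illustration under the assumption that a word is a plain string; the function names are invented here.

```python
# Hypothetical sketch of the two radical-modification mechanisms.
def replace_from_end(word, pattern, replacement):
    """Replace the last occurrence of pattern, searching from the end of the word."""
    index = word.rfind(pattern)
    if index == -1:
        return None  # the pattern does not occur: the rule does not apply
    return word[:index] + replacement + word[index + len(pattern):]

def replace_by_count(word, letters_to_delete, letters_to_add):
    """Delete a fixed number of letters from the end, then add others."""
    if letters_to_delete > len(word):
        return None
    return word[: len(word) - letters_to_delete] + letters_to_add

print(replace_from_end("overcome", "come", "came"))
print(replace_by_count("overcome", 3, "ame"))
```

Both calls give "overcame"; the pattern-based rule is the more robust of the two because it does not depend on the length of the prefix.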

2B) In the case where, at the first question, the user answered "This form is not built from any other.", it means that his word has more than one radical. Then the user is asked "Your word has more than one radical and the form you want to enter uses a not yet registered radical. Can all the forms of this word based on this radical be constructed from the form you're going to enter?". If the user answers "yes", then the form to enter (question 4B) will be recorded by OmegaWiki as another radical.

3B) If the user answered "no" to question 2B), then he is asked to enter the new radical as such (question 4B). The worst case I know is the French verb "aller", using 3 radicals: "all-" (→ infinitive "aller", past simple "alla" etc.), "v-" (→ present "vais") and "ir-" (future "irai").

4B) In both cases (the user answered "yes" or "no" at question 2B), the user has to explain how to find the root to change. This problem is at this point similar to the one of radical modification: finding and replacing a pattern. For instance, in the English inflexion group "to go", the past simple is got from the simple present, by searching from the end the pattern "go" and replacing it by "went". For instance "overgo" → "overwent".
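
A word with several radicals, as in steps 2B) to 4B), could be recorded as a set of labelled radicals plus, for each form, the radical it is built on and the ending to add. The following Python sketch assumes exactly that layout; the class InflexionGroup, the labels R1 to R3 and the few forms listed are illustrative only and cover just enough of "aller" to show the idea.

```python
# Hypothetical sketch: one inflexion group with several radicals.
from dataclasses import dataclass, field

@dataclass
class InflexionGroup:
    name: str
    radicals: dict = field(default_factory=dict)  # radical label -> radical
    forms: dict = field(default_factory=dict)     # form name -> (radical label, ending)

    def build(self, form_name):
        """Build a form from the radical it is based on and its ending."""
        label, ending = self.forms[form_name]
        return self.radicals[label] + ending

aller = InflexionGroup(
    name="aller",
    radicals={"R1": "all", "R2": "v", "R3": "ir"},
    forms={
        "infinitive": ("R1", "er"),
        "past simple, 3rd person singular": ("R1", "a"),
        "present, 1st person singular": ("R2", "ais"),
        "future, 3rd person singular": ("R3", "a"),
    },
)

for form_name in aller.forms:
    print(form_name, "->", aller.build(form_name))
```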

Some languages traditionally don't use the notion of inflection "groups", or use it in combination with the concept of "rule". For instance a Mongolian noun inflects according to 2 things: the presence or not of the so-called "secret n", and the main vowel of the noun (4 are possible). This makes 2 combining rules. Translating them into the "group" paradigm would, in this case, lead to 2 × 4 = 8 groups, which would be very manageable but disconcerting for a Mongolian. In other languages, the combination of rules, or the combination of rules and groups could lead to many inflection groups. For instance, in Malagasy, there are several phonetic rules modifying the first or the last letter(s) of the radical when a prefix or a suffix is added, and these rules, not perceived as making "groups", combine with conjugation groups (ruling the suffix and prefix to add). The combination of the two would lead to quite many inflection groups. In the two examples just mentioned (Mongolian and Malagasy) as well as in ancient Greek, there is a (possibly complex) phonetic rule regarding the radical modification in the presence of a prefix or suffix, and another rule ruling the prefix or the suffix to add. If there is no more complicated case than this (and I can't think of any other for the moment), then the process I've just described can be modified like this: in some languages, any given word is able to belong to 2 groups: a "radical type group" (indicating how the radical will be modified) and a "suffix, infix, auxiliary verb" group. Then, when adding a suffix or an infix according to the 2nd group membership, the process will modify the radical according to the 1st group membership and to the affix to add.
I don't pretend my analysis is complete but I think it's sufficient to prove that, in most cases, it's feasible to ask the users to describe how the words of their language inflect. Of course, there should be an option: "It seems the inflexion process cannot be properly described by any of the ways you propose. I want to contact the programmer.".
As I pointed out in bug 24 432, the urgent matter for me is not to produce all word inflections, but to record properly their inflection group and to provide a couple of fully inflected examples for each group (Steps 1 to 5 of the "bug"). Once that is implemented, we will understand the situation much better and be able to make a better analysis to produce all the inflections of all words if we want to.
--Fiable.biz 16:11, 25 July 2010 (UTC)

Needed notions[edit]

For words, we use, so far, 2 notions: expression and definedMeaning.

For inflexions, we need 3: morpheme, named form, and definedMeaning. For instance: in "overtaken", "-n" is a morpheme, expressing the form named "past participle", which has at least 2 definedMeanings: "achieved", as in "has overtaken" or passive, as in "will be overtaken". The morpheme "-t" of "learnt" expresses the same form "past participle". Moreover, the morpheme should be accompanied by the rules to mount it to the word to inflect or, sometimes, to replace it, as the morpheme "people" replaces the morpheme "person" to express the namedForm "plural".

For words, we also need 3 notions instead of two: expression, word and definedMeaning. The word is the possibly inflected unit, itself regarded as a "word", i.e. a lexical item, whose meaning is modified or lost in an expression. For instance the expression "go and see" has 3 words: "go", "and" and "see", two of which are inflexionable. The definedMeaning is "Move somewhere in order to meet or perceive by one's eyes.". The expression "chair" has only one word: "chair". Since many expressions have only one word, we can go on with 1 table, but each inflexionable word of a compound expression should be linked with the corresponding expression, and a boolean should say if this word is inflexionable or not. For instance, in "rain cats and dogs", "rain" is inflexionable, but the other three words are not. --Fiable.biz 08:30, 4 September 2011 (UTC)
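
To make the three notions concrete, here is a minimal Python sketch of an expression linked to its words, each carrying the proposed boolean. The class and field names (Word, Expression, inflexionable, defined_meaning) are assumptions made for this illustration and do not describe the existing OmegaWiki tables; the definition given for "rain cats and dogs" is illustrative wording only.

```python
# Hypothetical sketch of the expression / word / definedMeaning notions.
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    inflexionable: bool  # the boolean proposed above

@dataclass
class Expression:
    words: list
    defined_meaning: str

    @property
    def text(self):
        return " ".join(word.text for word in self.words)

    def inflexionable_words(self):
        return [word.text for word in self.words if word.inflexionable]

go_and_see = Expression(
    words=[Word("go", True), Word("and", False), Word("see", True)],
    defined_meaning="Move somewhere in order to meet or perceive by one's eyes.",
)
rain_cats_and_dogs = Expression(
    words=[Word("rain", True), Word("cats", False), Word("and", False), Word("dogs", False)],
    defined_meaning="Rain very heavily.",  # illustrative wording, not taken from OmegaWiki
)

print(go_and_see.text, go_and_see.inflexionable_words())
print(rain_cats_and_dogs.text, rain_cats_and_dogs.inflexionable_words())
```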

It has been a while since you wrote the above, but because I am new here, I want to express my support for your suggestions.
I think your suggestion regarding inflections is currently doable with the available fields. For example Expression="-n", Syntrans (eng)="-n", DM (eng)= "suffix to convert a verb to passive voice", SynTrans:Annotations:grammatical property = "past participle".
Your second suggestion regarding words would require the creation of a new table in the database for words within multi-word expressions that are inflectable. This will not be a very big table because it will only contain the words of expressions that contain an inflectable word and at least one additional word. This table would be looked up by the system only if a boolean indicates that the expression has this characteristic. For generic cases, application logic can use a general rule without having to consult this table (for example, in English, in verb + preposition combinations, the verb is inflected exactly like when the verb is alone, as in throw -> threw and throw away -> threw away). Of course Kip has a much better perspective on this matter; I would be interested in his views.
Just a few thoughts to restart this discussion. --InfoCan 17:27, 15 March 2012 (CET)
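
The lookup logic suggested above might look like the following Python sketch. It assumes a generic rule (inflect the first word, as for English verb + preposition) and a small extra table consulted only when the boolean is set; the names PAST, INFLECTABLE_POSITIONS and past_tense are invented, and the tiny past-tense dictionary stands in for whatever inflection data OmegaWiki would hold.

```python
# Hypothetical sketch: generic rule plus a small table for flagged expressions.
PAST = {"throw": "threw", "go": "went", "see": "saw"}

# Extra table: expression -> positions of its inflectable words.
INFLECTABLE_POSITIONS = {"go and see": [0, 2]}

def past_tense(expression, needs_lookup=False):
    """Put a multi-word expression into the past tense."""
    words = expression.split()
    # Generic rule: only the first word (the verb) is inflected.
    positions = INFLECTABLE_POSITIONS[expression] if needs_lookup else [0]
    for position in positions:
        words[position] = PAST[words[position]]
    return " ".join(words)

print(past_tense("throw away"))
print(past_tense("go and see", needs_lookup=True))
```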

Words to inflections and vice versa[edit]

As I was recently playing with writing a conjugation table generator for Turkish, it occurred to me that storing or generating conjugation tables may be unnecessary in general. For English for example, as long as you know the four forms of each verb (e.g., go-went-gone-going) and the grammar rules, you can generate all conjugations. So only the four forms need to be stored as annotations. Also, perhaps a link to a page explaining the conjugation rules in English would be appropriate. One could write a conjugation script but it would be just a nice thing to have, not essential.
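
As a rough illustration of this idea, the Python sketch below derives a few compound forms from the four stored forms. It is deliberately partial: it only covers forms that need nothing beyond the four forms and an auxiliary, and the function name and the labels of the forms are assumptions.

```python
# Hypothetical sketch: some English conjugations generated from the four stored forms.
def conjugate(base, past, past_participle, present_participle):
    return {
        "simple past": past,
        "future": "will " + base,
        "present perfect, 3rd person singular": "has " + past_participle,
        "past perfect": "had " + past_participle,
        "present continuous, 3rd person singular": "is " + present_participle,
        "conditional perfect": "would have " + past_participle,
    }

for name, form in conjugate("go", "went", "gone", "going").items():
    print(name + ":", form)
```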

Similarly for Turkish, if you also know the 3rd person singular forms of just three tenses (past, aorist and present continuous; for example for the verb root et-, these are etti, eder and ediyor), you can generate all other conjugated forms automatically. Similar to English, annotating these three verb forms for each Turkish verb is a compact form to store all conjugation information. An additional link from all Turkish verbs to a page explaining how other conjugations can be generated may be useful.

However, I realize that for French this type of solution will not work for irregular verbs. There, you may simply need a link from each SynTrans to a generic conjugation table of the appropriate verb group, or to a specific conjugation table if it is an irregular verb.

What seems more useful to program is reverse conjugation or, more generally, inflection parser scripts. You would give the program an inflected form of a word, and it would tell you what the root is and what type(s) of inflection it carries.

  • For example, for English:
cleanable:
1) [clean](v.) + [-able](ability suffix)
  • An example in French:
mangerons:
1) [manger](v.)(Futur simple, 1pp)
  • An example for Turkish:
karın
1) [karın](= belly; n.)
2) [karı](= wife; n.) + [n](possessive-2ps)
3) [kar](= snow; n.) + [ın](genitive)
4) [kar](= snow; n.) + [(ı)n](possessive-2ps)

I think I could write a Perl program that can parse Turkish for all possible suffix combinations in an input word that are consistent with the root (non-suffix) part. The output would be displayed only if the root is already in Omegawiki. After getting the parsed form of the input word, the user can then click on the root part to get its definition.
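
To show what such a parser could look like, here is a very small sketch, written in Python rather than Perl and restricted to the "karın" example: at most one suffix per word, three roots and three suffix entries. The tables ROOTS and SUFFIXES, and the simplification that a suffix variant is selected only by whether the root ends in a vowel, are assumptions made for this illustration and are far from a full description of Turkish.

```python
# Hypothetical sketch: list every root + suffix reading of a word (one suffix at most).
VOWELS = set("aeıioöuü")

# Stand-in for the roots already present in OmegaWiki.
ROOTS = {"karın": "belly", "karı": "wife", "kar": "snow"}

# (surface form, label, used after a root ending in a vowel?)
SUFFIXES = [
    ("n", "possessive-2ps", True),
    ("ın", "genitive", False),
    ("ın", "possessive-2ps", False),
]

def parse(word):
    """Return (root, gloss, suffix label) triples, longest root first."""
    parses = []
    for cut in range(len(word), 0, -1):
        root, rest = word[:cut], word[cut:]
        if root not in ROOTS:
            continue  # only roots known to the dictionary are shown
        if rest == "":
            parses.append((root, ROOTS[root], None))
            continue
        ends_in_vowel = root[-1] in VOWELS
        for surface, label, after_vowel in SUFFIXES:
            if rest == surface and after_vowel == ends_in_vowel:
                parses.append((root, ROOTS[root], label))
    return parses

for root, gloss, suffix in parse("karın"):
    print(root, "(= " + gloss + ")", suffix or "")
```

For "karın" this yields the four readings listed above, in the same order.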

If OmegaWiki had conjugation and reverse conjugation scripts for each language, it would be possible to translate inflected forms of words:

  • fre:[mangerons]
→ fre:[ manger ](v.)(Futur simple, 1pp)
→ eng:[ eat ](v.)(Futur simple, 1pp)
→ eng:[(we) will eat]

So, my vision of how inflections should be handled in Omegawiki is to write 1) inflection generators, 2) inflection parsers and 3) mapping tables between parsed elements. --InfoCan 19:05, 31 March 2012 (CEST)
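
The three components can be chained as in the "mangerons" example. The Python sketch below is only a toy under strong assumptions: a one-word French lexicon, a single parsing rule (infinitive + "ons" for the futur simple, 1pp, of a regular verb), a one-entry mapping table and a single English generation rule. All names in it are hypothetical.

```python
# Hypothetical sketch: parser -> mapping table -> generator.
FRENCH_LEXICON = {"manger"}
FRENCH_TO_ENGLISH = {"manger": "eat"}

def parse_french(form):
    """Inflection parser: return (lemma, features) or None."""
    if form.endswith("ons") and form[:-3] in FRENCH_LEXICON:
        return form[:-3], ("Futur simple", "1pp")
    return None

def generate_english(lemma, features):
    """Inflection generator: build the English form for the given features."""
    if features == ("Futur simple", "1pp"):
        return "(we) will " + lemma
    return None

def translate(form):
    parsed = parse_french(form)
    if parsed is None:
        return None
    lemma, features = parsed
    english_lemma = FRENCH_TO_ENGLISH.get(lemma)
    if english_lemma is None:
        return None
    return generate_english(english_lemma, features)

print(translate("mangerons"))
```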

How to record words which are inflectional in one language and lexical in another?[edit]

See Don't verbs and corresponding nouns of action express a same meaning? --Fiable.biz 06:33, 22 August 2012 (CEST)

Inflections and Language History[edit]

Inflections change over time, and they do so rather quickly sometimes. A recent example of language change in German is the elision of an "-e" in some German dative declensions: Konrad Adenauer (1876–1967) and Joseph Beuys (1921–1986) have been recorded saying "nach dem Kriege" (after the war) occasionally in the 1970s, while it is common today to say "nach dem Krieg" without "-e". Even the majority of Beuys' generation does so today. So we have two declension forms in parallel for some dozen years, with one having decreasing and one having increasing likelihood of use. --Purodha Blissenbach (talk) 20:28, 29 December 2012 (CET)

Going the same way Apertium does?[edit]

This is another point I gave up on years ago ...

Apertium is a machine translation tool and it's simply great (IMHO). When a new language for automatic translation is added they do need exactly the same information we need for inflexions etc. here. They then "treat" the words (well sentences, because they also do deal with sentence building) to get correctly inflected forms. We do need the same here. My really big question now is: why should we re-invent the wheel asking people who work for less resourced languages to do double work? If I build stuff for East Franconian, it's only me building it and there is no time to care about three, four, five projects.
No matter if you talk about a spell checker, a translation dictionary, a monolingual dictionary, language lessons, CAT-Tools, machine translation: they all start with words ... and OmegaWiki is at the very beginning of that line. The idea of OmegaWiki (WiktionaryZ) at the beginning was to minimize work, to allow people to do things only once and not on a number of Wiktionaries, to save time. For less resourced languages, doing work once and using it in several places is even more relevant. So why shouldn't we look at other projects and see how we can be the basis for many more usages of the same work? Why not see if we can apply the Apertium system for inflections or at least build it in such a way that data can easily be converted?
Btw. Apertium already uses OmegaWiki data - so eventually work done on it can be re-imported? My 2 cts ... :-) --Sabine (talk) 11:50, 9 October 2013 (CEST)
You also might want to have a look at this: https://svn.code.sf.net/p/apertium/svn/incubator/ - just take a language pair you know well enough and have a look how it is organized. --Sabine (talk) 12:27, 9 October 2013 (CEST)
Awesome!! I'll have a look at it.
I thought they had only an inflexion debuilder (lemmatisator?) but I didn't know about an inflexion generator.
Anyway, the latest thinking on the subject is that we need a generator with rules and not just a lot of fields to fill in manually (or with a bot). So it goes in the direction of what you mention :) --Kip (talk) 21:46, 9 October 2013 (CEST)
I think the human brain works well with languages, and I noticed in reading foreign languages, especially Mongolian, which is an agglutinating language, that it takes me more time to analyse an inflected word (even if I know all the elements) than to recognise the form if I know it. But I don't (and can't) memorise all forms. So, as I said long ago, I think a good approach in terms of memory/time compromise would be to record the easiest forms (1 or 2 inflexions), and to debuild or generate on the fly the more complex ones. And yes, working with Apertium would be great. In order to work with inflexions, we need to fill the word class field of words (which is often not done in Omegawiki), we need word subclasses, and we need a stricter policy concerning the creation of definedMeaning or not, to link the inflexions between languages (To answer questions such as: "Should the French 'chienne' (bitch) be considered an inflexion of 'chien' (dog)?") See Policy to create separate definedMeanings or not. Moreover we need money. And to get money, we need an effective legal entity. Since, whether you like it or not, the adoption by the Wikimedia foundation doesn't work and nobody (except me) seems to be interested any more in founding a separate association, what about merging with Apertium, which already uses Omegawiki? --Fiable.biz (talk) 00:18, 10 October 2013 (CEST)

I for one prefer OmegaWiki to be independent legally, and cooperate as much as possible technically and contentwise and socially. That said, financing the work we do would certainly be helpful, if not direly needed to make some progress.

If you make (semi)automated translations between inflectional languages and you do not go "very fuzzy", then you have to have an inflection decomposer and an inflection composer. :-) So obviously, Apertium would have to have either in one way or another.

I had a look at some of the files in the https://svn.code.sf.net/p/apertium/svn/incubator/ directory. There is a lot of stuff, and it is not at all obvious to me at first glance how it is made up. I would have to read some introductory materials so as to understand what I needed to do to get that for my language, Colognian. Yet I know that a tabular collection of the basic inflectional data of Colognian would have to fill a book of more than 100 pages. Of course not including which word uses which of the 400 to 600 regular conjugations, declensions, etc., and of course noted in cascading style, as Fiable.biz suggests, so that hardly any information is unnecessarily duplicated. Even based on publications existing in print, collecting the data electronically, once I had understood how exactly this is to be done, would likely take many months of full-time work which I cannot afford without pay.

Is there a way to be educated about Apertium? --Purodha Blissenbach (talk) 16:41, 10 October 2013 (CEST)

The easiest way to understand is to try it out. I used it on Catalan/Spanish where it is VERY well programmed and where I know it is used to translate whole newspapers on a daily basis. Then you take a less evolved language combination here: http://xixona.dlsi.ua.es/testing/ - and there you will see how Apertium behaves "wrong" - this gives you a first idea about how it works.
Then let's take an example of a verb:
  • You take one verb and create the whole conjugation - complete I mean, with everything - outlining the stem of the word and the part that change, which auxiliary it is conjugated with etc.
  • Then for other verbs that behave the same, they get the same paradigm associated and that means you only give the stem part of the verb and the rest can be built by the machine translation tool
The same is valid for cases, plurals etc.
Irregulars are treated separately
The information is stored in xml sheets
Example of Neapolitan nouns and how we prepared them: [1]
Btw. it was back in 2007 when I proposed a co-operation with Apertium for the first time - and: I am not going to stop doing it :-D because we need "one work, used VERY often" - and the same information can also be relevant for Grammar checking in a second stage
We don't need all Apertium offers, because it also offers "grammar rules" to be applied from one language to the other. Whereby it is easier to get it programmed for similar languages (e.g. German/Franconian, Neapolitan/Spanish/Catalan etc.) --Sabine (talk) 18:05, 10 October 2013 (CEST)
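
The paradigm idea described above (conjugate one verb completely, then give only the stem for verbs that behave the same) can be pictured with the following Python sketch. It does not reproduce Apertium's actual XML files; the dictionaries PARADIGMS and VERBS and the function conjugate are made up for illustration, using the Spanish present indicative of regular "-ar" verbs.

```python
# Hypothetical sketch: one shared paradigm, one stem per verb.
PARADIGMS = {
    # endings for the six persons of the present indicative
    "-ar, present indicative": ["o", "as", "a", "amos", "áis", "an"],
}

# verb -> (stem, name of the paradigm it follows)
VERBS = {
    "cantar": ("cant", "-ar, present indicative"),
    "hablar": ("habl", "-ar, present indicative"),
}

def conjugate(verb):
    """Build all forms of a verb from its stem and its paradigm."""
    stem, paradigm = VERBS[verb]
    return [stem + ending for ending in PARADIGMS[paradigm]]

print(conjugate("cantar"))
print(conjugate("hablar"))
```

Adding a new regular verb then means adding one line to VERBS; irregular verbs would need their own entries, as noted above.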
Merging with Apertium would not give us money for work, but two projects to maintain. I see supporting each other where possible as more relevant. Sorry, but I spent really a lot of time in OmegaWiki, much of that time before it even went online. It is not what I desired to see, well it was not when I left the project. Nowadays it has functionality that back then was among my wishes and somewhere in the "nowhere" to reach. Still a lot is missing, but people: we are all working in our free time here - so what? We go ahead and when time is due the solution will be right in front of us. --Sabine (talk) 18:12, 10 October 2013 (CEST)