International Beer Parlour/Archive3

From OmegaWiki

DefinedMeanings and forms of words

There has been some discussion on the treatment/separation of various forms (inflections etc) of words. My (current state of) understanding is that since a DefinedMeaning is the combination of a given Expression (Spelling+Language) and a Definition, by default that means that an inflected form (which will constitute a different Expression) will need to be a separate DefinedMeaning. However, this set of DefinedMeanings should all share the same Definition; the grammatical information that separates them should not be part of this Definition, but stated elsewhere (not yet possible...)--Sannab 23:02, 28 July 2006 (CEST)
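
To make the structure described above concrete, here is a minimal sketch in Python. It is not OmegaWiki's actual schema: the class names simply mirror the terms used in this thread, and the `grammar` attribute is a hypothetical stand-in for the grammatical information that would be "stated elsewhere".

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Expression:
    """A spelling in a given language."""
    spelling: str
    language: str


@dataclass(frozen=True)
class Definition:
    """Language-neutral description of a concept; carries no grammar."""
    text: str


@dataclass
class DefinedMeaning:
    """Combination of one Expression and one Definition."""
    expression: Expression
    definition: Definition
    # Hypothetical: grammatical attributes live next to the DM,
    # never inside the Definition text.
    grammar: dict = field(default_factory=dict)


# Two inflected forms are two Expressions, hence two DefinedMeanings,
# but both point at the very same Definition object.
shared = Definition("A common four-legged animal...")
dog = DefinedMeaning(Expression("dog", "eng"), shared, {"number": "singular"})
dogs = DefinedMeaning(Expression("dogs", "eng"), shared, {"number": "plural"})
```

In this sketch, changing the Definition once would affect every inflected form, which is the sharing behaviour asked for above; whether the real software should store grammar this way is left open.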

When clarifying the scope of the definition, it should be noted that some grammar aspects of words only appear to correlate to semantic meaning. As we include more languages, we will face stronger challenges regarding the dividing line between grammar and semantics. Following are some quick observations on how that dividing line affects WZ definitions:
  • Number of nouns: the number of nouns of languages with number often (but not always) has semantic meaning (e.g. "dogs" indicates more than one referent, but "scissors" does not and the number of the referent of "fish" depends on context).
  • Gender of nouns: the gender of nouns of languages with gender often (but not always) has semantic meaning (e.g. the gender of the referent is not specified by the masculine noun "miembro").
  • Case of nouns and pronouns: case rarely indicates much about the semantic role of the referent, but rather the grammatical role of the word within a given sentence (e.g. "I" in I washed the dog and "me" in The dog was washed by me have the same referent).
  • Gender and number of adjectives: most adjectives can usually modify nouns whose gender and number do not match the gender and number of the referent. So, it will usually be best to show such details only in grammatical attributes of the DefinedMeaning and to allow the general definition to be shared across languages.
  • Number, tense, aspect, mood, and valency of verbs: This is very tricky territory. In many cases, however, definitions will translate best by showing such features only through grammatical attributes of the defined meaning and to allow the general definition to be shared across languages.
I'm sure many lexicographers will want to include grammar in definitions, but remember that the WZ software will match the right defined meanings from languages with similar grammatical features. Rod A. Smith 01:46, 29 July 2006 (CEST)
I find it a bit hard to really draw conclusions here at this point of the project's development. Personally, I'd like to wait until we have more functionality before we can really see what the role of the DefinedMeaning is, and how other functions can be complementary to that. I'd also like to see how the UI will handle it. — Vildricianus 11:21, 29 July 2006 (CEST)
I'd say that at the moment, only dictionary forms should be listed. For example, English verbs in the present tense only. Nouns only in singular, unless the plural exists "on its own" as a plural noun (you mentioned scissors).
Adding grammar information will require a huge load of more functionality, and all of it will be language-dependent. Of course, at some point this dictionary will need inflection tables and other information, but all of this must be tied to a specific expression and thus to a DM. --Mkill 20:58, 30 July 2006 (CEST)
(quote) but remember that the WZ software will match the right defined meanings from languages with similar grammatical features -- Oh, I can already see so many hundred ways this will fail. --Mkill 21:01, 30 July 2006 (CEST)

Maybe we should discuss this in Functionality_wanted_..#Grammar functionality. --Mkill 11:40, 5 August 2006 (CEST)

From a structural point of view I can only strongly recommend against the idea of trying to mix anything like /part-of-speech/gender/number/mood/tense/aspect/referent/references/, or remnants of such, into DMs. I guarantee that would create problems somewhere later. We must strictly assume that nothing like these categories will be available in some other language, and there might not even be a concept of such elsewhere → a DM using it would be impossible to translate. All this information - where it exists - needs to be specified as part of a set of grammatical, syntactical, or relational properties of a given DM+expression in a specific language.
If you would be able to build a pun or language joke on something ("You cannot cut rice with scissors. Why? They're just too many for a single item") you can easily see how this does not translate, you usually need to explain it in a foreign language to your audience, or just sadly drop the fun part of it, that depends.
Now I can see how one would argue: It does make a big difference whether I have one president or several. → eng:"president" and eng:"presidents" need two different DMs. Wrong nevertheless. Tell me how many snow(s) you have (or sand, or rice), and is there more than 1 knowledge if more than 1 person is around? What is the difference between the 0 dictionaries on my desk, and all the dictionaries that are not on my desk, and 0 dictionaries not on my desk, and 1 dictionary not on my desk? It is simply the empty set, mathematically speaking (not even dictionaries!), yet grammatical representations vary between languages and with wording. We cannot safely assume that qxy:"president(s)" does not have the grammatical properties of eng:"rice" or eng:"scissors" or both or neither.
Note also that this is not an argument against two DMs for singular/plural most generally, but we must keep in mind that the plural, in English, is also used for 0 instances, whereas the singular is used for missing instances ("There are no pens here, not a single pen is there."), which are conceptually identical, requiring identical DMs. So in many instances it would be futile and incorrect to infer 'grammatical number' from the pl. or sg. declension form. A similar conclusion can be drawn from translating eng:"The presidents shook their hands" to deu:"Die Präsidenten gaben sich die Hand", where eng:"hands" pl. becomes deu:"die Hand" sg. because all such sentences in Standard High German 'contain' an implicit eng:"each" by virtue of German grammar, and cannot otherwise be built. Thus we must concede that, put a bit sloppily, different languages count differently, depending on context, and number is not usually part of a DM.
The very same sort of arguments can be used against any of /part-of-speech/gender/mood/tense/aspect/referent/references/ and most likely all other grammatical and syntactical categories being useful discriminants in DMs.
Ok, this sort of discussion probably should be followed more deeply in the International Linguists Beer Parlour --Purodha Blissenbach 16:55, 7 August 2006 (CEST)
Yes, different languages count differently, but those that inflect nouns with number do include semantic information in that inflection. That information may be ambiguous or translate inconsistently in certain constructions but the noun's number is more than just grammar.
Note the similarity with how a noun's gender can indicate sex of its referent. That prompted us to write 3 definitions for the gender-unspecified (sex-ambivalent) demonym "Expression:German". Likewise, we could include 3 definitions for each noun: one with a single referent, one with a multiple referent, and one with a numerically ambiguous referent. If we did so, then just as "Expression:alemán" has both a male definition and a definition without a specified sex, "Expression:dog" could have both a single and a number-unspecified definition.
Is such an approach necessary or even helpful? I don't know. Rod A. Smith 18:31, 8 August 2006 (CEST)

Where is the "Entity Relation Diagram"?

On #wiktionary I have just been informed of some type of design document for OmegaWiki known as the "ERD" but we've been unable to locate it. I think this is the kind of important thing that should be prominently linked to, possibly from the front page. Could somebody point me to it? — Hippietrail 02:42, 29 July 2006 (CEST)

The currently implemented data design is at OmegaWiki. The more complete vision (not yet implemented) design is at Ultimate Wiktionary. Rod A. Smith 03:04, 29 July 2006 (CEST)

GEMET themes and themes in general

HenkvD just defined food, drinking water as a GEMET theme, which raises the question in my mind what to do about the themes. This theme in particular is not a valid Expression, it is a disambiguated phrase, which makes it stand out all the more. Should the GEMET themes be replaced with the concept of Domain introduced elsewhere, or will this type of Relation still have a place further on? And what should we do with them in the meantime?--Sannab 12:09, 29 July 2006 (CEST)

I just defined two of the missing Insect_room#Bulgarian_themes. I planned to do the others as well, but I had better wait for reactions. Hundreds of expressions lead to these themes. See for example alarm (theme disasters, accidents, risk) or architecture (empty Bulgarian theme) HenkvD 13:26, 29 July 2006 (CEST)

Kana readings for Japanese

I checked a few entries on Japanese words and came across Expression:顳顬. Even if OmegaWiki gave me a translation, it would still be useless, as I would not know how to read that word. The minimum functionality would be a kana reading (usually Hiragana). In fact, as the characters in the example are not Jouyou Kanji, Japanese readers might want Kana readings, too. The reading of a Japanese word is an intra-Japanese function and has nothing to do with translation.

Hepburn and Kunrei Romanization would be nice, too. Note that Hepburn Romanization is not language-dependent, it is used in Japanese subway signs as well as scholarly texts in English, German, French, Italian and a number of other languages.

Note that Kana readings can be ambiguous, and there can be more than one reading for one word with the same meaning, or different readings with different meanings. --Mkill 02:41, 30 July 2006 (CEST)

The software doesn't yet have pronunciations, but note that you're free to enter the kana version of Japanese kanji entries as "synonyms". That should suffice for now, right? Rod A. Smith 06:00, 30 July 2006 (CEST)
This does not work. Japanese has a huge number of homophones. If I add a reading as a synonym to one homophone, all homophones will be assumed to be synonyms. This is awkward and wrong. Example: the 新和英中辞典 lists 11 entries for しよう (shiyō), among them 私用 (private use), 子葉 (seed leaf), 飼養 (breeding) and 資陽 (place name in China). And any readings that would be added as synonyms now would have to be deleted and re-added as readings later, once the function is implemented. No, to avoid chaos the only option is to wait until readings are implemented. --Mkill 12:31, 30 July 2006 (CEST)
Please define what a reading is. Thanks, GerardM 13:30, 30 July 2006 (CEST)
When an expression like しよう has 11 links to 11 defined meanings, they will show 11 different meanings. I would invite you to work on しよう and see for yourself that this is not an expression-based system. It is based on the DefinedMeaning. Thanks, GerardM 13:30, 30 July 2006 (CEST)

Ok, I'll try to define what a reading is. The Japanese writing system has a layer of abstraction that other languages don't have (except for Chinese, but on a smaller scale). "DefinedMeanings" are tied to the Kanji in Japanese, but Kanji can be read in different ways, each time denoting a different defined meaning. 東京 (Tokyo) is a simple case: these characters stand for one word with one reading, which in this case is とうきょう (Tōkyō in Hepburn romanization). BUT. There are cases when the same characters can mean different words, each with a different reading. Example: 大勢 can mean おおぜい (ōzei), "lots of people", or たいせい (taisei), "general situation". 際, read as "sai", means "if, in case"; 際, read as "kiwa", means "edge, brink".

Thus, for Japanese, it is necessary to have an entry with each defined meaning that gives the reading of the character for that specific meaning.

This should be implemented for Chinese, too, because in Chinese a few characters also have different defined meanings based on reading (in Hanyu pinyin). --Mkill 15:25, 30 July 2006 (CEST)

Aren't kana forms valid "alternative, not formally used spellings" of the proper kanji forms but valid in certain contexts (e.g. in gradeschool readers and as furigana)? If so, adding them as "alternative spelling" synonyms would merely result in having 11 definitions for the informal expression "しよう" (shiyō). We will eventually need to mark them as "alternative, not formally used spellings" or some such, but do we really want to delete them later? Rod A. Smith 19:40, 30 July 2006 (CEST)
Yes and No. This is in fact a complicated matter. Most nouns will have to be properly spelled in Kanji in any proper Japanese texts (let's forget about children's books, this is not a children's dictionary). Only if a word would have to be written in Kanji not in the Jōyō list can you replace them with Hiragana. Animal and plant names can be written in Katakana (in that case, the Katakana form is a proper synonym, the Hiragana form is not!) A third category are functional words like particles, conjunctions and auxiliary verbs. These are usually written in Hiragana, even if Kanji forms exist.
So yes, we would want to delete any Hiragana form of a word which is not written in Hiragana in usual Japanese texts.
Just accept that you can't just give the Kana reading as a synonym, it needs a separate entry marked "reading" like in any serious Japanese dictionary. This project is supposed to be a serious dictionary, right?--Mkill 20:14, 30 July 2006 (CEST)
NO let's keep the children's books! This is an all languages, all varieties, all forms, etc. … dictionary, and children's books spellings have to be included, of course. Up to now, I do not see any difference between the various ambiguous Kanji, or Kana Expressions, you mention, and any other homographs in any other language, such as in eng:"blow your head off" and eng:"blow out the candle" - only your assumption that, somehow, jap:"際", read as "sai" (eng:"if, in case"), and jap:"際", read as "kiwa" (eng:"edge, brink") could somehow be confused, is wrong. Their DMs should be different, prohibiting any kind of confusion.
I think you missed the point about 際. There will be two DMs linked to 際, of course, one saying "edge, brink" and the other "if, in case". But without any additional features, there would not be any hint which meaning is tied to the reading "sai" and which to the reading "kiwa". Note that we can't write the reading into the definition, as this one is supposed to be independent from language and synonyms. If I were to add 角 to the list of synonyms of 際, and the definition read "edge, brink" (read as "kiwa"), people would be confused about what Japanese expression this relates to... --Mkill 01:27, 9 August 2006 (CEST)
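
As an illustration of the point about 際, the sketch below attaches a reading to the pair (Expression, Definition) rather than to either one alone. This is only one possible shape for the requested feature, not something the software provides: the `Reading` type, the `readings` table and `readings_for` are hypothetical names, and the kana for 際 are filled in from the romanized readings quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    kana: str
    hepburn: str


# Hypothetical table: a reading belongs to one spelling *in one meaning*,
# so the key is the (expression, definition) pair.
readings = {
    ("際", "edge, brink"): Reading("きわ", "kiwa"),
    ("際", "if, in case"): Reading("さい", "sai"),
    ("大勢", "lots of people"): Reading("おおぜい", "ōzei"),
    ("大勢", "general situation"): Reading("たいせい", "taisei"),
}


def readings_for(expression):
    """Map each definition of `expression` to its reading."""
    return {
        definition: reading
        for (expr, definition), reading in readings.items()
        if expr == expression
    }


for definition, reading in readings_for("際").items():
    print(f"際 [{reading.kana} / {reading.hepburn}]: {definition}")
```

Because the reading hangs off the pair, the Definition text itself stays free of Japanese-specific information, and adding 角 as a synonym for "edge, brink" would not inherit the reading "kiwa".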

Italian spelling, pronunciation and meaning

I add this here (instead of the end) because it is somewhat related to the Japanese readings question. Italian standard spelling does not specify accents (except for very few cases, only at the end of words), with the result that there are words like pesca or nocciolo that have two separate pronunciations (pésca and pèsca, nòcciolo and nocciòlo) with completely separate sets of meanings (resp. peach and fishery, stone and hazel). Also in this case, the versions with accents are not appropriate spelling. For the moment, I've made the defined meanings using the "phonetic" spelling, and linked both to the proper spelling, but a "reading" would be useful for Italian too, in these rare cases, to clarify (especially to non-Italians) which one is the proper spelling and which ones are not. -- Sergio.ballestrero 15:43, 5 August 2006 (CEST)

Would that be properly handled once we have the (planned) fields, such as IPA, SAMPA, …, and Sound Sample, available?

Variant spellings

There are words which can be written in different ways in the same language, with the same meaning. Example: German "Geographie" = "Geografie", or Japanese サザエ = 栄螺 = 榮螺 = 蠑螺. In both cases, all readings need to be tied to the same DefinedMeaning, as it is the same word in different legitimate spellings, as new spellings were introduced in language reforms but old spellings are still valid. Note that this is contrary to British / American English, which is a regional variant. How does / will OmegaWiki handle this? --Mkill 02:41, 30 July 2006 (CEST)

For clarity, please note that alternative spellings need not be tied to a single "defined meaning", but rather to a single "definition" and an identical pronunciation because each alternative spelling can have its own features. The example you gave doesn't inflect, but the analogous examples (e.g. for most German nouns with alternative spellings and for Japanese verbs or adjectives written in various scripts) would each need to show its own inflected forms. Rod A. Smith 05:58, 30 July 2006 (CEST)
Actually this is not clear at all. The DefinedMeaning is what you can link to. You do not link to Definitions. Not at all. An alternate spelling exists on the same level as a synonym and a translation. When we get to the stage where we can add attributes to a SynTrans record, it will become important to identify what such an alternate spelling can be classified as. GerardM 07:48, 30 July 2006 (CEST)
Well, an alternate spelling needs to be classified as just this: an alternate spelling. It should not have a different DefinedMeaning entry because there is no difference in meaning (and hence, no difference in translations, synonyms, antonyms, and so on). If you open Expression:Geographie, it should show the same entry (for German) as Expression:Geografie. If you add a synonym to one (Expression:Erdkunde), it should appear in the other. The only difference should be the entry for "alternate spelling".
So, what you need is a field "alternate spellings", which makes the defined meaning appear under a different Lemma. --Mkill 12:12, 30 July 2006 (CEST)
Indeed. GerardM 12:22, 30 July 2006 (CEST)
I added Expression:Geografie as a synonym to Expression:Geographie. It's not a perfect solution but it works for the moment. Maybe the solution is to make the "identical meaning" checkbox more intelligent. An option would be to have a dropdown box after each word instead, where you could choose "direct translation", "approximate translation", "synonym", "variant spelling", "identical meaning" etc. --Mkill 13:21, 30 July 2006 (CEST)
Imho the idea of having a list of, or set of fields of, 'alternate spellings' is somewhat mistaken as far as database representation is concerned. We cannot distinguish between the various possible spellings 'of an expression', since there is no decent way to make one stand out and call the others 'the alternates'. It is rather to be seen as several expressions relating to each other as 'being an alternate spelling of each other in area during time of measure officially being used under constraints by percentage of the writers in publishing contexts', if put in a more generalized form. Of course only a limited set of such relations exists, and they probably need to be developed and refined over time, as need be.
I do, at first quick sight, agree with the idea to have a flag distinguishing 'synonym' from 'variant spelling', while synonym/translation is already implied by the two 'language' fields being identical or not - i.e. a translation to the same language is a synonym, in a strict sense with 'identical meaning' set, else in a weak or wider sense. -- Purodha Blissenbach 17:55, 7 August 2006 (CEST)
Side note: At the moment, "time of measure" would always be "contemporary", as we have no way of marking historic spellings and expressions yet (here comes another feature request!) --Mkill 01:21, 9 August 2006 (CEST)
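
The flag idea discussed in this thread could be sketched roughly as follows. This is a hedged illustration, not the SynTrans implementation: `Kind`, `classify` and the `variant_spelling` flag are hypothetical names. The only rule taken from the discussion is that synonym versus translation follows from comparing the two language fields, while "variant spelling" needs an explicit marker.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    TRANSLATION = "translation"
    SYNONYM = "synonym"
    VARIANT_SPELLING = "variant spelling"


@dataclass(frozen=True)
class Expression:
    spelling: str
    language: str


def classify(a, b, variant_spelling=False):
    """Classify the relation between two Expressions sharing one DM.

    Different languages always give a translation; within one language
    the explicit (hypothetical) flag separates variant spellings from
    ordinary synonyms.
    """
    if a.language != b.language:
        return Kind.TRANSLATION
    return Kind.VARIANT_SPELLING if variant_spelling else Kind.SYNONYM


geographie = Expression("Geographie", "deu")
geografie = Expression("Geografie", "deu")
erdkunde = Expression("Erdkunde", "deu")

print(classify(geographie, geografie, variant_spelling=True).value)
print(classify(geographie, erdkunde).value)
```

Note that the relation is symmetric in this sketch: neither spelling is "the" main form, which matches the objection that there is no decent way to make one spelling stand out. The area, time and usage qualifiers mentioned above are deliberately left out.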


Does the "Norwegian" entry indicate Nynorsk or Bokmål? Actually, there should be two entries, Norwegian (Nynorsk) and Norwegian (Bokmål). --Mkill 17:47, 30 July 2006 (CEST)

There need to be 4 entries at least: Norwegian, Rijksmål, Bokmål, Nynorsk. Additionally, Norway has a large collection of individual local and regional languages. These are also needed here … -- Purodha Blissenbach 18:02, 7 August 2006 (CEST)

The term "Norwegian" refers to Bokmål and Nynorsk, which are both official forms. Riksmål (without j) is a third form. Even though it is widely used, it is not an official form. So, only 2 entries are needed: Norwegian (nynorsk) and Norwegian (bokmål).
Both forms are pretty much the same, but many words have different spellings. An average Norwegian that has bokmål as their native language should understand most of Nynorsk even though he has never learned about it. An example would be the sentence "I am from Norway":
Bokmål: Jeg er fra Norge.
Nynorsk: Eg er frå Noreg.
Riksmål is very close to Bokmål, and the example would be spelled exactly the same way as in Bokmål.
However, the sentence "You wrote in English", "Du skreiv på engelsk", is the same in both Nynorsk and Bokmål. The sentence "Du skrev på engelsk" (skrev without i) is the Riksmål version, but the word skrev without i can also be used in Bokmål, but not Nynorsk.
As a side note, the Norwegian (and Danish) letters Æ, Ø and Å are NOT pronounced like E, O or A, but are completely different vowels. If there are any questions, feel free to ask. :-) Mathias-S 18:54, 11 August 2006 (CEST)
Mathias is (of course) absolutely right. About variants of Norwegian, one could mention – besides the standard Bokmål and Nynorsk – Riksmål, Høgnorsk and Samnorsk. Riksmål is conservative Bokmål, Høgnorsk is conservative Nynorsk (dating back to before 1917), while Samnorsk is a failed effort to combine Bokmål and Nynorsk into one language. Bokmål and Nynorsk are the only official forms, and I don't think the others should be included here. But quite right, the language on OmegaWiki called "Norwegian" should be split up into "Norwegian Bokmål" and "Norwegian Nynorsk". Having an additional "Norwegian" here would only result in a mess. I don't think Riksmål should be added as a separate language – it has no ISO 639 code, and 99.9 % of Riksmål (my calculation, surely it is not far from the truth) is also perfectly valid Bokmål. Jon Harald Søby 17:52, 4 September 2006 (CEST)
The question is which of these languages the imported "Norwegian" words from GEMET/EIONET have. See also a list of these words via Google. HenkvD 19:51, 4 September 2006 (CEST)

Lemmata with brackets

I noticed there is a number of expressions with a comment in brackets after the word, such as Expression:resolution (act). Isn't this against the idea of having all defined meanings in one place? Shouldn't this DM be found under just Expression:resolution? Or is there already a plan to fix this which I missed?

A related problem are expressions which do not consist of one word but of several synonyms, such as Expression:sklep, odločitev, resolucija, which should rather be Expression:sklep, Expression:odločitev and Expression:resolucija. --Mkill 20:41, 30 July 2006 (CEST)

Third variant: Acronyms with the explanation given. I think that expressions like Expression:ASEAN (Ένωση Κρατών της Νοτιοανατολικής Ασίας) should be given as Expression:Ένωση Κρατών της Νοτιοανατολικής Ασίας and Expression:ASEAN (with Greek entry) instead, linking to each other as synonyms. --Mkill 20:53, 30 July 2006 (CEST)

A fairly large number of entries was imported from GEMET and some of them do not have the correct format. You can fix that by first disconnecting the wrong expression from the Defined Meaning and then deleting it. And of course adding the correct expressions. --Tosca 20:54, 30 July 2006 (CEST)
Ok, the main reason I was asking is to find out whether this was intentional or a problem to be fixed :) --Mkill 21:03, 30 July 2006 (CEST)
Huh, disconnecting a word and a DM doesn't delete the word? If not, then I may have created a lot of orphans until now... Lo siento ( el pollo, desde España. )
Disconnecting it does not delete it, it might still be a valid Expression, even if it does not express the Definition. So, yes Kipcool, you have most likely created a lot of orphans, which you should only worry about if they are not in fact valid Expressions *smile*--Sannab 17:23, 3 August 2006 (CEST)
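
For anyone cleaning up the imported GEMET entries in bulk, the three malformed patterns listed in this thread (bracketed comment, comma-separated synonyms, acronym plus expansion) could be detected with something like the sketch below. It is only a helper for finding candidates, not an OmegaWiki tool; `split_lemma` is a hypothetical name, and a human still has to decide whether the bracketed part is a comment to drop or an expansion that deserves its own Expression.

```python
import re

# A trailing "(...)" with no nested brackets, e.g. "resolution (act)".
_BRACKETED = re.compile(r"^(?P<head>.*?)\s*\((?P<note>[^()]*)\)\s*$")


def split_lemma(raw):
    """Split an imported lemma into (expressions, bracket_note).

    Naive on purpose: every comma is treated as a separator, so a
    legitimate phrase containing a comma would be split as well.
    """
    note = None
    match = _BRACKETED.match(raw)
    if match:
        raw, note = match.group("head"), match.group("note")
    expressions = [part.strip() for part in raw.split(",") if part.strip()]
    return expressions, note


for lemma in (
    "resolution (act)",
    "sklep, odločitev, resolucija",
    "ASEAN (Ένωση Κρατών της Νοτιοανατολικής Ασίας)",
):
    print(split_lemma(lemma))
```

The output only proposes the corrected expressions; the wrong one would still have to be disconnected from the Defined Meaning and deleted by hand, as described above.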

Definitions describe words or referents

For most words, it is convenient for their definitions to describe a concept represented by the word (the referent). For example, the concept represented by "dog" is easily described:

  • Good definition: "A common four-legged animal...."
  • Bad definition: "A word that represents a common four-legged animal...."

For some words, however, it is easier to describe the word itself. For example, it is difficult or impossible to avoid talking about the word itself in the definition of "the":

  • Definition: "The definite article."

Sometimes, it's not clear whether to describe the referent or the word. For example, some of the definitions given at "I" describe the referent and some describe the words:

  • Describing the referent: "The speaker or writer referring to himself or herself alone."
  • Describing the word: "Pronom personnel désignant celui qui parle." ("Personal pronoun designating he or she who is talking.")

In this particular case, the definitions are out of sync, but more importantly, should we standardize whether

  1. Definitions may describe words themselves (possibly with some attribute to allow a different display format);
  2. Definitions should describe referents wherever possible; or
  3. It doesn't matter (-: there are more important matters to think about :-).

Rod A. Smith 04:13, 31 July 2006 (CEST)

I suggest making this depend on word type. For nouns, verbs and adjectives, describe the referent. For auxiliary verbs, conjunctions, pronouns, particles, articles etc. describe the word itself. (Note that this matches your examples, as "dog" is a noun, "the" is an article and "I" is a pronoun.) --Mkill 11:07, 31 July 2006 (CEST)
I insist on not taking 'word type' or 'part of speech' into consideration when wording DMs. Actually, consider, but then leave out of the DM ;-) These types of information are language-dependent, and should remain such. I know of the difficulty of finding DMs for certain types of words under that constraint (e.g. eng:"Hey!", eng:"so", eng:"No!", sve:"Hey", deu:"Autsch", ksh:"Bah!", ksh:"Däh!")
If we really find that we cannot get along without somehow referring to the usage of an expression, i.e. word type or similar, then we must introduce different types of DMs, using different layers of metalanguage, since we must not ever confuse or mix language and metalanguage under any circumstances. For reasons much too complex to be explained here, we would otherwise jeopardise project success. While that may seem a bit esoteric, practical consequences are pretty workable, though.
Imho both the abovementioned types of DM ("…common four-legged animal…", "…Word designating who is talking…") are Ok and valid, yet they exist on different layers, and need to be flagged as such so as not to create confusion when they are translated to another language. So far, we have two kinds of DMs:
  • 1. DM on the object layer: a dog is an animal that …
  • 2. DM on the abstraction layer of word usage: the word 'we' is being used to replace …
(note the use of quotes with 'we' but not with dog) As far as I can see, there is no problem, when, instead of a translation, a DM of another type is used. That would usually be, because a translation (of the DM) seems not possible. We should, whenever possible, use the object layer type of DM.
Theoretically: I am not certain about details here, it might possibly be required, to include even another layer of abstraction, which could then be similar to:
  • 3. DM talking about language: the Inuit language does not have a concept of this kind. It is usually …
not to be confused with not-to-a-word translations of the type requires use respectful form!.
-- Purodha Blissenbach 19:07, 7 August 2006 (CEST)

How specific should Definitions be?

There seems to be a tendency to define Expressions rather vaguely. This of course helps in finding synonyms and translations, but does imo not do justice to the first Expression. We have cases where there are several synonyms of the first Expression marked as Identical.

Perhaps these rather vague Definitions are needed if we do wish OmegaWiki to work also as a multilingual dictionary, but if we leave it at that, then we will surely fail the monolingual users.

If it is possible to add a synonym (leaving the translations apart for now) with an identical match marker, that is imo a clear warning sign that perhaps the Definition is not adequately bringing out what characterizes the Expression. Of course there will be cases of true synonymy, but perhaps not as many as the current trend seems to imply.

So can we do both? Is there room in OmegaWiki for Alternative DM:s (note Alternative Definitions will not do in this case!) separated only by how specific they are? Or should we go hardline and require more specific Definitions, and a lavish use of Not identical? If we choose the double DM way, then imo we need a way to mark the general Definitions.

I must say that I am leaning towards the hardline way; specify each Expression as thoroughly as possible, and live with the fact that Identical matches will then be rare.--Sannab 09:49, 4 August 2006 (CEST)

Could you please give an example? Ciao, --Sabine 10:22, 4 August 2006 (CEST)
I don't think vague definitions help a multilingual dictionary at all. Words like the English Expression:go need some 100 defined meanings, because equivalents like the German Expression:gehen and French Expression:aller will probably share a lot of meanings and usages, but won't work for many others and you would need a different verb. A good multilingual dictionary should be as precise as possible. --Mkill 13:49, 4 August 2006 (CEST)
  • eng: I go to school - deu: ich gehe zur Schule
  • eng: go by plane - deu: mit dem Flugzeug fliegen
  • deu: das geht nicht - eng: this doesn't work --Mkill 13:51, 4 August 2006 (CEST)
Not being a native speaker of English (and the English first DM:s still being the most plentiful) I cannot be sure that they are not in fact true synonyms, but the DM that set this train of thought rolling was:
  • at/near : With very little distance to or in a particular place or location.
Right now I cannot recall what other DM:s built up this feeling in me, will update when I encounter them again. *smile*
The other issue that MKill raised is imo related but not identical to the question at hand. go should of course first and foremost be associated with Definitions actually expressing what the word means in English. That it happens to fulfill a lot of quite vague verbal senses will most likely mean that very few of the translations will be identical, and also that it will be chosen as a non-identical translation for a great many DM:s started from other languages. However, when writing Definitions to create DM:s for go, these other languages should not imo be taken into account. The first and primary task for editors of any language is to adequately cover/define the Expressions of their chosen language. The matching of Expressions between Languages should not influence the writing of the Definition.
This also highlights the fact that we as yet do not mark non-identical Definitions apart from identical Definitions (or even first Definitions) for a given Expression.--Sannab 14:16, 4 August 2006 (CEST)
Ok, I see (or at least I suppose I understood) - as for the bits by Mkill above: these are expressions and do not represent go - it is about the usage of the word go and that will be dealt with separately.
There is also the thingie to be considered (maybe this is the answer to Sanna): now, if you click on the Italian word and the first defined meaning was in English, you get that one - in future that will be different: there will be the defined meaning in English of course, but, if the Italian word does not have an identical meaning its defined meaning will be different - so this does not need consideration right now - we will have it once we get that feature. Therefore the not tagging the identical meaning box is very important and maybe we should consider that one better.
I noted that many translations are added and the identical meaning tag is on even if it should be off ... maybe, as for a rule, we should ask it to remain off ... but in the end: it will be the same work to be done: check entries and there will be loads of work to merge things that are identical ...
Just don't want to think about it now ;-) I'd get mad ;-) --Sabine 14:43, 4 August 2006 (CEST)
Note that the examples I gave are both different usages and different DMs.
  • go to school - to attend something regularly
  • go by plane - to use a means of transportation
  • to work - to function
That you would need a different word to translate an expression in two different sentences is the best sign there is that these are in fact different meanings. --Mkill 15:31, 4 August 2006 (CEST)

Verb + preposition combinations[edit]

How does / will OmegaWiki handle combinations of verb and prepositions, such as "go by", "go to", "go for", "go through", "go down", "go about" ... These are all fixed expressions that need their own DM. I see three different ways to do that:

  1. Put them under the verb and make a note in the DM that this meaning only applies to the combination with a specific preposition (this will add a lot of definitions to verbs like "go").
  2. Put them under the expression verb + preposition Expression:go by
  3. Put them under verb + preposition + sth. Expression:go by sth.

--Mkill 15:31, 4 August 2006 (CEST)

Personally I'd use the second possibility and add it as related term to go. --Sabine 16:27, 4 August 2006 (CEST)
I seem to tend to do both - differently though.
Last example first: (Expression:go by sth. is outright nonsense, as 'go by sth.' is not an expression in the sense implied here, it is only an expression in the realm of language science).
Expression:go by has various uses and DMs, such as in Let us go by and change it!, or in We can go by train, etc. with different grammar, and different DMs, too.
Entering a notion of 'that this meaning only applies to …' in a DM is again conceptually wrong. This has to go to the grammatical properties section of the appropriate DM+eng:"go" where the DM(s) simply reflect the concept(s) covered by these use(s) - there would be quite many grammatical relations linking eng:"go" to various prepositions that can go with eng:"go", each likely associated with different DMs - such as I go to the house and I go into the house do represent different concepts. -- Purodha Blissenbach 19:31, 7 August 2006 (CEST)

All German terms imported from GEMET should be deleted[edit]

I've been trying to fix this, but it's not worth it. We'd save a lot of time by deleting them all and translating ourselves. The imported GEMET expressions (not only GEMET, other sources too) contain too many errors, when it comes to German. So far I have found:

  • Wrong capitalization: in adjective + noun combination the adjective is always capitalized in the data, which it shouldn't be. There are over 20 of these in the letter A alone.
  • Plural / singular don't match between translations
  • Missing umlauts

Note that all given examples are for the letter A only. I can only guess what other errors await in B to Z.

Is that only German? Are other languages ok? --Mkill 02:57, 5 August 2006 (CEST)

(I'm sorry if I sound a little drastic with this one. But I'm really angry because the imported data was supposed to be created by professional translators. And I just can't believe I'm fixing a job that professional translators screwed up. German is the native tongue of one fifth of the EU population, the largest share of any language. It's unbelievable they didn't have any German native speaker to counter-check. If they actually had one, he screwed up enormously.)

Yes, that sounds really really bad. But I think we should go through and keep the correct things (hope there are some...). Grtx, --Thogo (Talk) 11:43, 5 August 2006 (CEST)
Well yes, GEMET has its problems. I don't know how many entries were created by professional translators - some of them yes, others not. We really don't know who did what here (isn't a wiki a great thing? you can find out who did what there :-)
Italian has some issues (mainly plural/singular if I remember well) and all Bulgarian entries should be checked - they have a capital letter (sigh) and should be lower case. Well being on a wiki many people can help to improve this. I'd say that the majority of the GEMET contents is OK, but since we are human beings the negative things remain more impressed than the positive ones (who knows why ...).
Let's just make the best out of it I'd say. And: thanks for doing so. --Sabine 14:06, 5 August 2006 (CEST)
Well, that would be a little extreme. ;-) If you go through the German entries starting here Special:Allpages/Expression:A you'll find that there are only about 10 pages (all German expressions from GEMET are spelled with a capital letter) and that most entries are ok. I'm pretty sure that we can fix the wrong entries. The wrong capitalization can't really be considered an error by GEMET, it's just how their system works. Wikipedia capitalizes everything as well.
It's not only German, other languages have problems as well. All Bulgarian entries are capitalized and many entries are lists of expressions. GEMET has around 70,000 entries in 22 languages. So every language community has to check about 3,000 entries and that's certainly doable.--Tosca 15:53, 5 August 2006 (CEST)

OmegaWiki needs its own IRC channel[edit]

Using the general #wiktionary channel has sufficed so far (and it is usually good fun to be there *smile*), but I think we now have the need for our own channel, where we can discuss OmegaWiki without having to qualify each statement with on wiktz. So is it not time to create #omegawiki? --Sannab 18:27, 5 August 2006 (CEST)

Oh yes. I often got confused by others and I often confused others speaking about DE-WT or EN-WT while others thought we were speaking about WZ (or the other direction). So maybe a #omegawiki channel would be a great advance, I guess, even if there are the same people around as on #wiktionary. One could more easily distinguish between Z and non-Z Wiktionaries. Grtx, --Thogo (Talk) 19:06, 5 August 2006 (CEST)
It's there. Have fun. Siebrand 20:51, 6 August 2006 (CEST)

Adding terms while translating[edit]

Now I am doing a translation - with quite a bunch of interesting words. But: I cannot afford the time to add these words with the definition, because otherwise I would not be able to finish my job in time. This means that a) I cannot add these terms (which would be a pity) b) I have to require higher prices from my customers (who would not be happy ...) c) I need a possibility to add source + target term - and then I or someone else can care about the definition. This will be a continuous problem not only for me, but for all translators who would even like to contribute. And the solution is ??????? --Sabine 09:09, 7 August 2006 (CEST)

I suggest allowing the use of ~~~~ in this fashion in Functionality wanted ..#unique placeholder DM content as a software extension -- Purodha Blissenbach 19:46, 7 August 2006 (CEST)
Another possibility would simply be a checkbox to tag that says "no defined meaning yet" or something like this - then one can do a sql query for all words without defined meaning and add it. At a certain stage we will anyway get into that situation since we have huge amounts of vocabulary lists that are just bilingual and without definition ... what comes after that is another work: merge existing content ... and that will be the funny bit for the multilingual people, those who can understand if one entry and the other should be linked or not ... step by step we will reach all that :-) Thanks for your comments - they would be a good way to do things. --Sabine 23:36, 7 August 2006 (CEST)
I second the request for a "no definition yet" checkbox. There should be a category that collects all expressions with the tag. --Mkill 01:18, 9 August 2006 (CEST)
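
The checkbox-plus-query idea from this thread could look roughly like the sketch below. It is only an illustration: the table and column names (`defined_meaning`, `syntrans`, `definition`) are made up for the example and are not the actual Wikidata/OmegaWiki schema, and a NULL definition stands in for the ticked "no definition yet" box.

```python
import sqlite3

# Hypothetical schema, NOT the real OmegaWiki tables:
# a NULL `definition` plays the role of the "no definition yet" checkbox.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE defined_meaning (
        dm_id      INTEGER PRIMARY KEY,
        definition TEXT
    );
    CREATE TABLE syntrans (
        dm_id    INTEGER REFERENCES defined_meaning(dm_id),
        language TEXT,
        spelling TEXT
    );
""")
conn.execute("INSERT INTO defined_meaning VALUES (1, 'A social insect kept for honey.')")
conn.execute("INSERT INTO defined_meaning VALUES (2, NULL)")  # added by a translator in a hurry
conn.executemany(
    "INSERT INTO syntrans VALUES (?, ?, ?)",
    [(1, "eng", "bee"), (1, "deu", "Biene"),
     (2, "ita", "albicocca"), (2, "deu", "Aprikose")],
)


def pending_definitions(conn):
    """Return (dm_id, language, spelling) for every expression whose DM has no definition yet."""
    return conn.execute("""
        SELECT s.dm_id, s.language, s.spelling
        FROM syntrans AS s
        JOIN defined_meaning AS d ON d.dm_id = s.dm_id
        WHERE d.definition IS NULL
        ORDER BY s.language, s.spelling
    """).fetchall()


pending = pending_definitions(conn)
```

A category page collecting all tagged expressions, as requested above, would simply be a rendering of this result list.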

Archived sections[edit]

Is it permitted to add to sections in archived portions of 'International_Beer_Parlour', or should we rather re-start the subject here? -- Purodha Blissenbach 17:19, 8 August 2006 (CEST)

Borrowed words[edit]

When thinking of spelling variants (see above), borrowed words come to mind. They have a quite interesting set of aspects.

  • Often, borrowed words keep their spelling in spite of spelling rules of a borrowing language.
    • e.g. the English word is spelt 'kindergarten' despite being commonly pronounced 'keendergarden'; in many languages you have 'computer', not 'kompjuter', 'kom(m)pjoeter', 'camp(i)utor', etc.
    • An implication of this is that it may be pretty hard for screen readers and speech synthesizers to deal with them.
      • Unless an installation chooses to mark them as foreign words, e.g. <span lang="sv">ombudsman</span> (the validity of which can be questioned), software has basically to identify them based on dictionary information.
        • If OmegaWiki provides such info, speech input and output software can be assisted in choosing the correct processors or filters based on ordered language pairs. I am pretty certain, there is only a very limited choice of filters per language pair, if any, and their selection could easily be based on a (qualified, directed) relation between two expressions of two languages.
  • Often, borrowed words do not follow all inflection rules of the borrowing language, or they are in inflection classes on their own.
    • → language specific data entry modules could be saving editors labour.
  • Eventually, borrowed words lose their recognition of being borrowed.
    • e.g. German borrowed 'bureau' from French. In the course of the 20th century, spelling was gradually changed from 'Bureau' over to 'Büro'.
      • in order to correctly reflect which spelling was in use when, we need to identify such changes in the relations linking the spelling variants (e.g. deu:"Bureau" and deu:"Büro") to one another.

-- Purodha Blissenbach 17:19, 8 August 2006 (CEST)
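
The two kinds of relation described in the list above (a directed "borrowed from" link between expressions of two languages, and dated links between spelling variants) might be modelled along these lines. This is a sketch only: the class names, field names and both helper functions are invented for illustration, and the years attached to 'Bureau' and 'Büro' are placeholders, not historical data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Expression:
    language: str   # language code, e.g. "deu"
    spelling: str


@dataclass(frozen=True)
class Loan:
    """Directed relation: `borrowed` was taken over from `source`."""
    borrowed: Expression
    source: Expression


@dataclass(frozen=True)
class SpellingPeriod:
    """A spelling variant and the (possibly open-ended) years it was in use."""
    expression: Expression
    first_year: Optional[int]   # None = unknown / open start
    last_year: Optional[int]    # None = still in use


def language_pair(expr, loans):
    """Return (source language, borrowing language) for a known loan, else None.

    A speech synthesizer could use this ordered pair to pick a pronunciation filter.
    """
    for loan in loans:
        if loan.borrowed == expr:
            return (loan.source.language, loan.borrowed.language)
    return None


def spellings_in_use(periods, year):
    """List the spellings whose period covers the given year."""
    return [p.expression.spelling for p in periods
            if (p.first_year is None or p.first_year <= year)
            and (p.last_year is None or year <= p.last_year)]


LOANS = [
    Loan(Expression("eng", "ombudsman"), Expression("swe", "ombudsman")),
    Loan(Expression("deu", "Bureau"), Expression("fra", "bureau")),
]
# Placeholder years, chosen only to show an overlapping transition.
PERIODS = [
    SpellingPeriod(Expression("deu", "Bureau"), None, 1950),
    SpellingPeriod(Expression("deu", "Büro"), 1900, None),
]
```

Keeping the loan relation directed matters here: the pair (swe, eng) for "ombudsman" calls for a different filter than (eng, swe) would.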

How do I start a Swadesh List for a new language?[edit]

Can I simply open a new page with a new Swadesh table, or should I handle each word separately? DrorK 06:47, 10 August 2006 (CEST)

Hoi, the easiest way is to copy an existing table like the Neapolitan and do some "surgery" on it. GerardM 07:02, 10 August 2006 (CEST)

What do I do now?[edit]

Hi. I just created my account and would like to start contributing. The question is what is the format for definitions, etc. and page titles? Do you have a FAQ or something? Cheers, Malafaya 16:27, 10 August 2006 (CEST)

Hi, great to see you here :-) I just put you a note on your user page. If this does not help, please tell me. --Sabine 16:28, 10 August 2006 (CEST)

Alternative definitions[edit]

There is currently in place basic support for adding rewrites of the Definitions intended for other audiences, such as the scientific community, perhaps children or second language learners. We do not really know what we can use this for yet.

I was wondering if the use of this feature could possibly be even further enhanced by adding a tag to the respective Definition stating what its intended audience is. A user could then in his preferences check which audience he considers himself to be.

This of course also helps us to clarify what the intended audience for the primary Definition is; my short version would be: An adult, native speaker of the Language, with no special knowledge of the field in question.

My thoughts on this are far too vague for Functionality wanted .. yet, some comments would be very appreciated. *smile* --Sannab 12:51, 11 August 2006 (CEST)

I think it is the way we should go. Like you I am thinking on how this needs to be done.. GerardM 16:26, 11 August 2006 (CEST)
"An adult, native speaker of the language, with no special knowledge of the field in question". I would agree with that. We also need alternative definitions for legal terms, for example murder has a very specific legal definition which may also vary from country to country. --Tosca 03:04, 12 August 2006 (CEST)
What you say is that we can have several kinds of definition ? Something like :
  1. (Simple) A social insect, domesticated and kept by humans for the creation of beeswax and honey.
  2. (Scientific) An insect, hymenopteran, from superfamily Apoidea, Apis mellifera ; domesticated by humans for the honey and the beeswax it produces.
? The first is enough for everyone to understand what it is about, but the second is more precise and useful for someone who works on the subject. - Dakdada (discuter-talk) 17:06, 12 August 2006 (CEST)

When we import an ontology or a thesaurus, we often get a definition for something that is already defined. In that case we do not need an additional DM, and we would consequently have to remove the Definition if we had no room to store an alternate Definition. The way I understand things, we would end up with the "operational definitions" for a language and all the rest either is an alternate definition or is a DM where the "Identical Meaning" flag is turned off. GerardM 17:57, 12 August 2006 (CEST)

For terms from certain domains, where citable agreed-upon definitions exist, such as eng:murder (legal) or lat:ulcus molle (medical) or lat:apis mellifera (biological taxonomy), it might be feasible to keep true references as (alternate) DMs such as 'Section 815 civil code of England of 1812' or '1st described in "journal of applied medicine", 1867 XIV/13 pp.256-259, July 1867' or 'Linné 1678' etc. - Of course the definitions of murder in various legal systems may become numerous :-). -- Purodha Blissenbach 14:16, 18 August 2006 (CEST)
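
The audience tag proposed in this section could work roughly as sketched below, using the two bee definitions quoted above. The audience labels ("general", "scientific", "legal"), the class and the fallback rule are assumptions made for the example; nothing like this exists in the software yet.

```python
from dataclasses import dataclass, field

# Assumed label for the primary Definition: an adult native speaker
# with no special knowledge of the field in question.
DEFAULT_AUDIENCE = "general"


@dataclass
class DefinedMeaning:
    # Maps an audience tag to the Definition text written for that audience.
    definitions: dict = field(default_factory=dict)

    def definition_for(self, audience=DEFAULT_AUDIENCE):
        """Return the Definition for the user's audience, falling back to the primary one."""
        return self.definitions.get(audience, self.definitions[DEFAULT_AUDIENCE])


bee = DefinedMeaning({
    "general": "A social insect, domesticated and kept by humans "
               "for the creation of beeswax and honey.",
    "scientific": "An insect, hymenopteran, from superfamily Apoidea, Apis mellifera ; "
                  "domesticated by humans for the honey and the beeswax it produces.",
})
```

A user who ticked "scientific" in the preferences would get the second text; a user who ticked an audience for which no rewrite exists (say "legal") would still see the primary Definition.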


Is there a way to know how many words are translated into a specific language and how many definitions/percentage have their translation into that language? Malafaya 02:29, 12 August 2006 (CEST)

At this moment there is no way to know how many words are present in a certain language: From Google I compiled this list:
German 39.300
English 18.100
Dutch 11.800
Greek 929
French 875
Italian 855
Polish 854
Spanish 800
Swedish 689
etc. etc.
Hebrew 81
This list is not complete. I find it strange that there are more German expressions than English. HenkvD 14:50, 12 August 2006 (CEST)
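
If the raw data were accessible as rows of (DefinedMeaning id, language, spelling), the question above could be answered directly instead of via Google. The following is a sketch under that assumption; the row layout, the function name and the sample rows are invented for illustration and the sample is not real OmegaWiki data.

```python
from collections import defaultdict


def coverage_by_language(syntrans):
    """Map language -> (number of DMs with an expression in it, share of all DMs)."""
    dms_per_language = defaultdict(set)
    all_dms = set()
    for dm_id, language, _spelling in syntrans:
        dms_per_language[language].add(dm_id)
        all_dms.add(dm_id)
    return {language: (len(dms), len(dms) / len(all_dms))
            for language, dms in dms_per_language.items()}


# Tiny made-up sample: four DMs, unevenly translated.
SAMPLE = [
    (1, "eng", "bee"), (1, "deu", "Biene"),
    (2, "eng", "apricot"), (2, "deu", "Aprikose"), (2, "deu", "Marille"),
    (3, "eng", "school"), (3, "deu", "Schule"), (3, "por", "escola"),
    (4, "eng", "swamp"),
]

coverage = coverage_by_language(SAMPLE)
```

Counting distinct DMs rather than rows keeps synonyms such as Aprikose/Marille from inflating the percentage.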

Interface translations and double licensing[edit]

I'm not sure where the interface can be translated. Currently the French interface still mentions a single GFDL licensing under the edit box of a "normal" (ie not DM) page. Ce`dric 18:55, 12 August 2006 (CEST)

marsh, bog, wetland, swamp, fen... professional definitions / everyday ones?[edit]

This is very long, too long. Not sure if I am making a good reasoning here or mainly a rant, but please bear with me since I do believe I have a point in the end. It all started with the page / article Expression:marsh which, when I started looking into it, had one Swedish Expression added to it: "våtland".

Besides "våtland", which is a very wide concept, there are several Swedish synonyms: "myr", "kärr", "sumpmark", "moras", "träsk", "mosse". You seldom hear those defined; I guess the words originally were defined as "wet mossy land that looks in a specific way", then came the scientific people and found out why they looked that way, what were the specifics that gave them the special vegetation etc. At least from an environmental point of view, "våtmark" is the preferred term so you don't have to be specific... and average people hardly have a clue about the difference. To them, it is all synonyms. It is a very common mistake to believe that the definitions of the scientists and experts are right, and those of normal people are wrong... but remember, the definitions of ordinary people came first!

When I check the words for these wet mossy lands in my sv-en and en-sv translation dictionary, it mentions tons of synonyms, in both directions - and there is no real possibility to understand the difference between them. I guess that is because most of the time, the words are used in the imprecise way of normal people so the translation dictionary mainly - but not only - implies the average meaning, not the professional (biologist/geologist) meaning of the term.

These are the English DefinedMeanings, consisting of Expression and Definition, here in OmegaWiki.

  • marsh - An periodically inundated area of low ground having shrubs and trees, with or without the formation of peat.
  • bog - A commonly used term in Scotland and Ireland for a stretch waterlogged, spongy ground, chiefly composed of decaying vegetable matter, especially of rushes, cotton grass, and sphagnum moss.
  • wetland - Areas that are inundated by surface or ground water with frequency sufficient to support a prevalence of vegetative or aquatic life that requires saturated or seasonally saturated soil conditions for growth or reproduction.
  • swamp - A permanently waterlogged area in which there is often associated tree growth, e.g. mangroves in hot climates.
  • fen - Waterlogged, spongy ground containing alkaline decaying vegetation, characterized by reeds, that may develop into peat. It sometimes occurs in the sinkholes of karst region.

How do they look to native English speakers? Is this how you would define these terms, or at least agree that these are reasonable definitions to distinguish between marshes, bogs, wetlands, swamps and fens? If so, what are the chances that the Expressions in the other languages (and those are long long lists!) will match these definitions? What if the different words - in common man's mouth - are defined according to the kind of wet mossy lands you have in the countries where the language is spoken?

The Swedish words, according to Nationalencyklopedin (the major modern encyclopaedia, with accompanying dictionary):

  • "myr" is wet land where peat is created. (Peat is basically old bog moss, sphagnum to talk latin, which forms peat mainly since there is not enough oxygen for it to rot. Hope that makes it clearer to someone.)
  • "kärr" means something like a "myr" where the water runs to it from the land around it.
  • "mosse" is a "myr" where the water comes only from the rain.
  • "träsk" is wet mossy land, that is difficult to walk through since it might have open water at places and also is loose so you sink into the moss.
  • "sumpmark" and "moras" do not have articles in the encyclopaedia, only in the dictionary, and are defined as "wet and mossy land" and "an area similar to 'träsk'" respectively.

In OmegaWiki, all the English terms have long lists of matching expressions in different languages. Do they come from the wiktionaries? It is very often that a word has one meaning to professionals, and another to ordinary people. My theory is the situation is like this: we are taking the Expressions from common people's vocabulary, mostly - the ones you find in normal translation dictionaries - while the Definitions come from professional people's definitions. If so, we will naturally have some huge discrepancies and confusion.

(Then of course is the problem what to do with the huge lists of Expressions in different languages regarding all these wet mossy lands. That is minor compared to the overall thing, but in this case I'd suggest delete it all and start all over - all the way from the DefinedMeaning. I can not see a way to work this out, and the more translations get added the more difficult it will become. That is not my main point, however.)

// habj 22:44, 12 August 2006 (CEST)

Example: find all German words with missing Italian translation[edit]

Yesterday Gerard pasted me a link to find all German words ... well I played around a bit and now I have a link that finds you all German words without Italian translation. In that way you can easily find words you want to work on. Just click on the following link and adapt the search to your needs:

Have fun :-) --Sabine 14:55, 13 August 2006 (CEST)

    • I'd really like Google to update more quickly... Running out of not already added Dutch DM translations and syntranses... Siebrand 03:07, 16 August 2006 (CEST)
      • Do you know Google Sitemaps? I'll look into integrating support for it into MediaWiki and Wikidata in the near future. -- Purodha Blissenbach 14:30, 18 August 2006 (CEST)
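
The search described in this section relies on Google's index, which is why it lags behind. With direct access to the data, the same question ("which German words sit in a DefinedMeaning that has no Italian expression?") reduces to a simple filter. The sketch below assumes rows of (DefinedMeaning id, language code, spelling); that layout, the function name and the sample rows are invented for illustration.

```python
def missing_translations(syntrans, source, target):
    """Return the source-language spellings whose DM has no expression in the target language."""
    by_dm = {}
    for dm_id, language, spelling in syntrans:
        by_dm.setdefault(dm_id, {}).setdefault(language, []).append(spelling)
    return sorted(
        spelling
        for languages in by_dm.values()
        if source in languages and target not in languages
        for spelling in languages[source]
    )


# Made-up sample rows.
SAMPLE = [
    (1, "deu", "Tomate"), (1, "ita", "pomodoro"),
    (2, "deu", "Aprikose"), (2, "eng", "apricot"),
    (3, "ita", "scuola"), (3, "eng", "school"),
]

todo = missing_translations(SAMPLE, "deu", "ita")
```

Swapping the two language codes gives the reverse work list, so the same function serves any language pair.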

Double words? Capital?[edit]

Hi, I'm translating some of the entries into Spanish, and I came across some words that are entered both in small letters and with initial capital, like:

(just to clarify: it's not my intention at all to enter in the discussion about Expressions with initial capital or not...) : )

Shouldn't there be some "coherence" there? I would propose:

  • to use capitals for the entries in singular, since they are part of this "meta-expressions" (so called?) or "meta-language" (?), that is used in the OmegaWiki to name some very specific things (and with the specific meaning stated here)... almost as proper nouns (like: Definition, Expression, DefinedMeaning... then Language, Script, Semantic Drift...). Therefore, to rename "script" as "Script" (and the same with similar cases).
  • to merge the contents of "Language" and "language" into the first one (and the same with similar cases).
  • maybe to change the name of the entry for the plurals ("Languages", "Scripts") into something like "list of languages", "list of scripts" (maybe with small letters since it's more a general reference to those terms and a list of them... like say, "Bible" / "list of bibles" or "Government" / "list of governments"...).

What do you think??? --Enboifre 19:07, 15 August 2006 (CEST)

Well, maybe it would make indeed sense to merge the contents of Language and language and then create a redirect - as for Languages that is a list of language codes with the language names you then find in OmegaWiki with their translations ... maybe it should be led to "list of languages" ... another possibility would be disambiguation. Let's see what the others think :-) --Sabine 16:13, 16 August 2006 (CEST)

User level 5[edit]

Well, Level-5 has nothing to do with proficiency, but with usage of the language for work. Now this can be misunderstood looking at level 1-4 and therefore I would propose to change level 5 into level "P" in order to distinguish better between proficiency levels and just an indication on where one uses the language. Besides that I would like to propose to adopt the translator's templates like used on Meta - in this way it becomes easier to find people that can help with communication and translations here on one hand and on the other: we can link the categories with meta and other projects - so if we should not be able to find the right person here we might be able to find him/her on one of the other projects. Thanks for your thoughts! --Sabine 14:11, 16 August 2006 (CEST)

I agree on both points. Laszlo 16:40, 16 August 2006 (CEST)


I noticed you don't have a favicon. If you need one, I will be happy to give you one.--Mac Lover Talk 20:08, 16 August 2006 (CEST)

Could you define Expression:favicon ? Kipcool 09:51, 17 August 2006 (CEST)

French Swiss[edit]

Today, I wanted to add a French Swiss expression. Should it be added to "French", or should we add a "French (Switzerland)" language, like the "English (United States)" language? If the second solution is right, should I file a bug on bugzilla to ask for "French (Swiss)" and "French (Canada)"? Thanks Kipcool 09:51, 17 August 2006 (CEST)

At this moment we only have "French". For English we have English and English (American). Our practice is that we would use the US version when it is distinctly US. In this way we should also have a UK version of English. So your suggestion is appropriate.. :) GerardM 10:35, 17 August 2006 (CEST)
I do not really agree. I did not mind about English but (given it is possible that I do not understand well the role of languages in the software) I think that we should not vary from the ethnologue list of languages unless it is really important and after some discussion. luna 03:27, 19 August 2006 (CEST)
Ethnologue is nice to have but they have about half of the languages they should list, and they hardly recognize variants beyond the prominent ones of ISO 639-1 which means their data is not suited for fine granularities. A good translational dictionary needs this fine-grained stuff desperately. So we must have it at least at some time in the future. Why not now, then? Or as soon as (valid) demand is visible ? -- Purodha Blissenbach 17:21, 19 August 2006 (CEST)
From a translator's point of view making these differences is very important. People ask for Swiss/Austrian/German when they need to localise a text - I mean really localise and not just translate in order to have it understood. It is absolutely needed for the official languages and it is very important also for many regional languages. Often we have one language code for a whole group of languages and that definitely is not enough - this is the reason why we build the swadesh lists - there some differences can be seen. Example: German:Tomate = Austrian:Paradeis - German:Aprikose = Austrian:Marille. --Sabine 17:33, 19 August 2006 (CEST)
And another couple of dozens more (so 'Austrian' is clearly more distinct from 'German' than Hindi from Urdu :-) ) There is another list type, the Wenker List (initially designed and used by Georg Wenker since 1876, several enhancements til after WWII) which as well as Swadesh Lists can be used to correlate words. Wenker used more & different words and put them into sentences, so that certain basic 'which word goes with which' questions are answered, and basic grammar and word order issues are documented as well. There are huge collections of Wenker Lists from all of Northern, Western, and South Western Germany and about at least, likely many more which I'm not so much aware of. When available, I am going to get or locate electronic versions of them.
I know that there is a magnitude of (at least) some 6000 distinct dialectal variations of German auxiliary verbs 'sein' and 'werden' alone in N+W Germany (to be, and passive to be/get/become)
There are tons of peculiarities to know with local German languages. While these do not show up so often in standard translations, they play a role e.g. in advertizing. Advertizing is a regional business in the German speaking areas to quite some extent and uses nonstandard or 'wrong' German all the time, which needs to be regionally targeted. Else you might experience flops like the German outdoors supplies company having their combined back pack/sleeping bag proudly advertized as 'body bag' in the US. --- Purodha Blissenbach 10:31, 19 August 2006 (CEST)
  • (Edit conflict solved)
My concern with adding new languages is that we want to translate all the DefinedMeanings in all the languages. I do not know linguistics in any way but I think that some variants are so similar that we may not add them as languages in the software especially if there are only vocabulary differences between the two languages. I want differences underlined but I just wonder if there is no other way of seeing it, by the means of domains for example.
But perhaps I just do not understand what are exactly language variants of the current software ? luna 20:06, 19 August 2006 (CEST)
(It's not I *want* to stick to ISO 639-1, but I think that as we chose this reference list we must do modifications with parsimony)
Domains of use is a good and useful concept, also regions of use, which could likely be handled similarly. When it comes to what German politics calls dialects today, we're talking of different languages, having own grammars, own syntaxes, hundreds or thousands of own words, and/or similar words with deviating meanings and/or little overlap, etc. As far as dictionary is concerned, they need to be treated as own languages, with some specialized vocabularies (e.g. scientific terms) mostly borrowed from standard German (and often pronounced & written differently). As a rule of thumb, one can say that most regional/local languages of the Regions where some sort of German is spoken are incomprehensible to most German speakers. I'm not very knowledgeable about Romance language areas, yet assume a comparable situation there.
We are using ISO 639-3 (not -1) here. Yet it is obvious (and not disputed by ISO, btw.) that this is by far not covering all languages yet. So, as need be, we'll have to find ways to overcome the (current) limitations. I would not assume it to be unlikely, that we might provide sufficient documentation for this and that language currently not in ISO 639-3, and thus be supporting their addition to the standard in the future. -- Purodha Blissenbach 21:03, 19 August 2006 (CEST)

domain of use[edit]

Related to the question about French Swiss : More, would it be possible (e.g. with the domains) to specify some geographical, historical, field or register precision on the usage of one word ? and how ? by the means of relations ? of domains ? luna 10:28, 17 August 2006 (CEST)

I think there will be a "domain" box where you can enter such information, someday... From what I've understood, it'll be different to the "relation" field, because it must be attached to "expression+definition" (unique to a given word with a given meaning), whereas the relations that we have for now, when added, are available for all synonyms and translations.
Then, I'm wondering if French Swiss should be a specific geographical use of the word, or just a second language, as is the English (USA) for now. Kipcool 12:03, 18 August 2006 (CEST)


Question: If I delete this page: Expression:thisisatest, will it delete the DM and the expression, or just the expression? (Then, the DM would be lost I guess). Kipcool 11:51, 18 August 2006 (CEST)

Deletion of DM's does not work yet .. you can remove all definitions from it and you can remove the Syntrans (but one). GerardM 12:34, 18 August 2006 (CEST)
Should we create a dummy SynTrans (f ex Expression:ToBeDeleted) and leave that as the only remaining SynTrans for such cases? --Sannab 12:51, 18 August 2006 (CEST)
Sannab: I agree, that's what I was testing with Expression:thisisatest
GerardM: so, if I click delete, the DM will be orphaned? If yes, I think it would be easy to run an sql command that removes orphaned DMs from time to time. Kipcool 13:22, 18 August 2006 (CEST)
Expression:ToBeDeleted created! Kipcool 22:02, 26 August 2006 (CEST)
Now, it has been deleted and can't be used anymore (database error) :-( Kipcool 21:30, 4 September 2006 (CEST)
I created Expression:tobedeleted to be used instead. Please DO NOT DELETE THIS EXPRESSION until the database bug is fixed. HenkvD 23:13, 4 September 2006 (CEST)

At this moment it is possible to delete a DM direct from the expression (without the workaround of Expression:tobedeleted). This is possible because the expression itself is listed in the translations. This is probably because it is also added on the history. Bug or feature? I think it is very useful. HenkvD 21:41, 13 September 2006 (CEST)

I'm wondering if the DM is really deleted this way. I fear that it just becomes not linked to any expression, but still present in the database, somewhere (and inaccessible then). So, I'm still using Expression:tobedeleted for now... Kipcool 11:07, 14 September 2006 (CEST)
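
The "sql command that removes orphaned DMs" suggested earlier in this thread could look something like the sketch below. The two tables and their columns are a made-up miniature, not the real database layout, so treat this as an outline of the idea only: a DM counts as orphaned when no SynTrans row points at it any more.

```python
import sqlite3

# Hypothetical miniature schema, NOT the real OmegaWiki tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE defined_meaning (dm_id INTEGER PRIMARY KEY, definition TEXT);
    CREATE TABLE syntrans (dm_id INTEGER, language TEXT, spelling TEXT);
    INSERT INTO defined_meaning VALUES (1, 'To function.');
    INSERT INTO defined_meaning VALUES (2, 'A test entry.');
    INSERT INTO syntrans VALUES (1, 'eng', 'work');
""")


def orphaned_dms(conn):
    """Return the ids of DMs that are no longer linked to any expression."""
    rows = conn.execute("""
        SELECT d.dm_id
        FROM defined_meaning AS d
        LEFT JOIN syntrans AS s ON s.dm_id = d.dm_id
        WHERE s.dm_id IS NULL
        ORDER BY d.dm_id
    """)
    return [row[0] for row in rows]


def purge_orphans(conn):
    """Delete orphaned DMs and return how many were removed."""
    cursor = conn.execute("""
        DELETE FROM defined_meaning
        WHERE NOT EXISTS (
            SELECT 1 FROM syntrans WHERE syntrans.dm_id = defined_meaning.dm_id
        )
    """)
    return cursor.rowcount
```

Running `orphaned_dms` first, as a read-only check, would also answer the open question above of whether a DM deleted from the expression page is really gone or merely unlinked.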

Balkan Languages Repository

There is a Balkan Languages Repository (Bulgarian, Greek, Romanian, Serbian, Turkish and Czech), see -- Purodha Blissenbach 14:55, 18 August 2006 (CEST)

Adverbial forms of adjectives

This came up after Celestianpower added profusely and an associated DM. Since profusely is the adverbial form of the adjective profuse, and we don't do inflections yet, should "profuse" be a synonym at the existing DM? And what should we do with "profusely": leave it there, remove it, or make a new DM? László 17:06, 18 August 2006 (CEST)

I do not clearly see a reason, why adjectives and adverbial forms should not share a DM. Grammar alone is of course not a reason for another DM. -- Purodha Blissenbach 17:23, 19 August 2006 (CEST)
Hmm, but the definitions are different for an adjective and an adverb. Typically, profusely = "in a profuse manner", so it's not a synonym.
Also, I have some problems considering adverbs as inflexions (not all adverbs are derived from adjectives). Kipcool 19:47, 19 August 2006 (CEST)
I was uncertain whether or not "in … manner" would have to be there with adjectives. Yes, OK, if that's a conceptual difference, there has to be a different DM. If I could reword an adjective's DM into the "in … manner" form, I'd be lazily using only one DM :) Of course I don't generally assume Adjective/Adverb to be simple inflection forms of each other - but that is me, an 'Indogermanic grammar victim'; possibly an east asian, american indian, or polar speaker, who does not have POS concepts like we do, might view that differently. -- Purodha Blissenbach 21:12, 19 August 2006 (CEST)
Purodha, you don't have to look that far for different ways of handling adverbs. German, for example, a language you might be familiar with, does not have a strict adjective / adverb distinction; most adjectives can be used as adverbs without any inflections.
As for English, the best solution so far is to create two entries, one for the adjective and one for the adverb, making clear in the definition that one describes a state of an object and the other a manner of doing something. The translations should be chosen accordingly so as not to mix this up.
Once OmegaWiki has some grammar abilities, pairs of adjective / adverb could be tied to each other. So far, they should be added as related terms. --Mkill 14:49, 20 August 2006 (CEST)
Agreed. -- Purodha Blissenbach 19:06, 20 August 2006 (CEST)

Hierarchy of defined meanings Adding expressions can lead to bad defined meanings

Among my first contributions I've added the Finnish expressions for eng:"he" and eng:"she". In both cases the word is the same (fin:"hän") since the Finnish pronoun does not distinguish between genders. The Finnish expression now has two DMs associated with it, although I think it should only have one with a definition like "the previously mentioned person". This DM could then be related to the other two with the "broader terms" classification.

Is the model I suggested correct or is there some reason to leave the situation in its current state? What would happen if I deleted the DM+fin:"hän"? --Mikalaari 19:12, 20 August 2006 (CEST)

In this situation there should be three DMs: one for he, one for she and one for hän. The flag for identical meaning needs to be turned off in many cases. GerardM 19:25, 20 August 2006 (CEST)
Well, I'm not one of the experts here, but I think that the Finnish Expression should be linked with "he" and "she" PLUS add a new DM for the Finnish Expression (only) which would be something like "3rd person form of personal pronoun, which refers to a person (male or female) that has already been mentioned" (or whatever the equivalent would be for the Finnish concept of the Expression), then as Gerard says, the Finnish expression won't be an "identical meaning" for "he", nor for "she". I hope that helps also! :D --Enboifre 19:31, 20 August 2006 (CEST)
Thanks to both. In fact I already came to the same conclusion (all three DMs are needed) while exploring another similar case. However, it seems unnecessary to me to show all three DMs for the Finnish expression hän, since the most general (genderless) one would suffice: it's a kind of superclass for the others, a broader term in other words. Probably it depends only on the way the interface is currently implemented, and the underlying data should be good.
For now I've defined "hän" as "the previously mentioned persons", but maybe a more explanatory definition would make things clearer. --Mikalaari 21:07, 20 August 2006 (CEST)

The similar case I mentioned above is the expression he, which means "they (persons)" in Finnish. In this case "they" has a broader meaning as it refers also to other animals and things as well. In Finnish the non-human part is expressed with ne, so the three expressions (or DMs, really) could be written as a formula:

they = he + ne.

I've created the three DMs and marked all matches as non identical like GerardM suggested (I think). I've also used the relations "broader terms" and "narrower terms" to classify the DMs in respect to each other. This classification seems to have no effect on how the defined meanings are shown on their pages. I don't have time to study the relations in depth at the moment, but I wanted to tell what I've done in the hope that someone could point out possible errors I've made or give me hints how to proceed. --Mikalaari 21:07, 20 August 2006 (CEST)
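
A minimal sketch of the relation "they = he + ne" as broader/narrower links between DMs. The class, the function and the labels such as "fin:he" are hypothetical names chosen for illustration; nothing here reflects how OmegaWiki actually stores relations.

```python
from dataclasses import dataclass, field


@dataclass
class DefinedMeaning:
    """One DM: a display label plus the labels of its related DMs."""
    label: str
    broader: set = field(default_factory=set)
    narrower: set = field(default_factory=set)


def relate(broad: DefinedMeaning, narrow: DefinedMeaning) -> None:
    """Record the relation in both directions so neither side is forgotten."""
    broad.narrower.add(narrow.label)
    narrow.broader.add(broad.label)


they = DefinedMeaning("eng:they")  # persons, animals and things
he = DefinedMeaning("fin:he")      # persons only
ne = DefinedMeaning("fin:ne")      # animals and things only

relate(they, he)
relate(they, ne)
```

Storing both directions in one step avoids the situation where "broader terms" is set on one DM but the matching "narrower terms" entry is missing on the other.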

I think it should be:
For Finnish hän:
  • Definition for "male" other (Eng. he) - non identical
  • Definition for "female" other (Eng. she) - non identical
  • Definition for other (male or female) (Eng. he + Eng. she) - identical
For Finnish he:
  • Definition for other (Eng. they) - non identical
  • Definition for "person" other (Eng. they - Fin. ne) - identical
For Finnish ne:
  • Definition for other (Eng. they) - non identical
  • Definition for "animal" other (Eng. they - Fin. he) - identical
Does it make sense? : ) (I have a similar case with They (male AND female plural other) and Ellos/Ellas (male / female plural other).) --Enboifre 02:08, 21 August 2006 (CEST)
I think I found a similar problem of multiple definitions:
Expression: you
Definition: The person addressed.
Spanish Translation: tú
Catalan Translation: tu
Expression: tú
Definition: The person addressed.
Definition: The person addressed as the subject.
English Translation: you
Catalan Translation: tu
Expression: tu
Definition: The person addressed.
Definition: The person addressed as the subject.
Definition: The person addressed as a complement after preposition. (not added yet)
English Translation: you
Spanish Translation: tú
I think that the Original Definition was the English one, which can be very general since YOU can work as a Subject (YOU do), and as an Object, after preposition or not (I see YOU / for YOU, to YOU). Then in Spanish (TÚ), it was "redefined" adding "as the subject" since TÚ can only be Subject. But in Catalan (TU), it can be a Subject and a Complement after preposition, so a new Definition needs to be added... Which finally makes the original one, in my opinion, redundant. However, the general definition would still be good for YOU (not really necessary to narrow it, it does explain what it is, since YOU has a broad meaning and use)... if we didn't have three more definitions added (because of the translations):
*The person addressed.
*The person addressed as the subject.
*The person addressed as a complement after preposition.
* And we could add: The person addressed as an Object, with or without preposition. (this one not linked to TU nor TÚ)
I don't see how to "solve this" (if you agree that it's a problem), but just wanted to point it out. --Enboifre 05:58, 21 August 2006 (CEST)

Please consider that OmegaWiki is not feature complete - this means that at the moment we cannot link correctly from let's say Spanish or Finnish to an existing defined meaning in English. It is hard to explain this ... what you describe here as wrong is not wrong because some functionality is still missing. I only can assure you that in future relations will work differently and will avoid having to write several meanings for English if the additional ones are meant to be for other languages. --Sabine 09:47, 21 August 2006 (CEST)
In Armenian language, we have a similar problem. It's the word նա that means both 'he' and 'she', just letting you know that there are more examples of a genderless word. Togaed 10:12, 21 August 2006 (CEST)
Re: Definition: The person addressed as a complement after preposition. (not added yet) Do not consider grammar as a part of a DM - anything about grammatical or usage context does not go into a DM. It has to be entered elsewhere, once the required functionality is there. It is fine to have different DMs for different uses or grammatical constructs, but it is not (always) necessary from a technical viewpoint. -- Purodha Blissenbach 12:58, 21 August 2006 (CEST)

The thing that originally puzzled me was the DMs that don't completely describe a given expression associated with them. Or rather, not that they are associated with it, as that's the very essence of this database model, but seeing them on the page of that expression together with the DM that already describes the expression adequately.

In my opinion the narrower terms (or DMs) that have been appropriately related to a broader term should not be shown (by default at least) if the broader term is also associated with the expression. So things are working well as they are, although the user interface could be changed.

As for the case pointed out by Enboifre, it should work fine as well. After finding all necessary DMs for "you", "tu" and "tú", they should just be related to each other, if possible, with the narrower terms and broader terms classifications. With an improved interface they would then eventually be shown in a more natural way. --Mikalaari 20:50, 21 August 2006 (CEST)
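
The display rule proposed in this thread (hide a narrower DM by default when a broader DM is attached to the same expression) could be sketched as follows. The function name, the labels and the BROADER_OF mapping are hypothetical; this is only an illustration of the rule, not a description of the OmegaWiki interface.

```python
def visible_dms(attached, broader_of):
    """Return the DMs to show by default on an expression page.

    attached   -- labels of all DMs linked to the expression, in page order
    broader_of -- maps a DM label to the set of labels of its broader DMs
    """
    attached_set = set(attached)
    # A DM is hidden when at least one of its broader DMs is attached too.
    return [dm for dm in attached
            if not (broader_of.get(dm, set()) & attached_set)]


# The genderless Finnish DM is the broader term for both English DMs.
BROADER_OF = {
    "eng:he": {"fin:hän"},
    "eng:she": {"fin:hän"},
}
```

On the page for hän, where all three DMs are attached, only the genderless one would be shown; on the page for the English "he", where the broader DM is not attached, nothing is hidden.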

problems with solutions? (Definitions)

Hi there! Maybe someone is already working on it, or keeping it in mind for future development, but I see a couple of problems, so I just point them out here:

  • you go to a word and you don't really see marked which the original definition is (the order seems to change once you save each definition... at least sometimes!).
  • if I'm not wrong, you don't see if the Expression on the screen is actually identical meaning or not, to each of the Different definitions of it.
  • I think it would be handy to have the synonyms and translations ordered alphabetically, it will be easier to look up, and you'll have together the different words for the same language.
  • This one I find more complicated to deal with...
  • When I go to father, I get a definition saying: "A male parent".
  • I would have expected to find in mother: "A female parent"... but of course I get: "A woman who has at least one child".

In my opinion there are two matters:

  1. Lack of consistency in the definitions of related terms (male - female, different persons of pronouns, etc.). I guess it's unavoidable, since each person enters what they think is the definition, without necessarily comparing the Definition with any other related terms.
  2. "Impossibility" to make changes. I would change the definition for mother as "A female parent", to make it consistent with the "male counterpart" AND because "a girl" can also be a mother... So if I was to make it "consistent" and make some (small or not small) changes in the definitions, that would cause a big problem, because I could of course ONLY change it for the language(s) I know. Of course the "solution" would be to enter a new Definition... and then we could get something like:
A woman who has at least one child
A female parent
A female person who has at least one child

But actually a female animal can also be called a "mother"... so in my opinion it should be: father - male person or animal who has at least one child // mother - female person or animal who has at least one child (one more definition to be added then...). And then someone else comes up with yet another Definition for the same word or for a translation of it: person or animal (male or female) who has at least one child...; and yet another one: person only -not animal- who...; and yet another one: animal only -not person- who...; mammal who...; etc... (hypothetically, for some Expression in the different languages). Then the different translations are linked as identical AND not identical... So in the end you can have one Expression with lots of Definitions, which are not really helping to know what the word you are looking for actually means (well, that's what I think). And if you put them all, except for the original one, as Alternative Definitions, it could be that the original is not really a good definition (like "mother" as "a woman who...").

Ok, I'm not sure if what I just said is somehow clear...! :D --Enboifre 04:36, 21 August 2006 (CEST)

See above - these things have been considered but are not yet programmed. We are working with a subset of the final ERD. --Sabine 09:49, 21 August 2006 (CEST)
I agree that getting the definitions consistent will prove a hard thing to do. More important to me is the notion that we need "operational definitions"; definitions that are easy to identify a concept by when you get to see an expression in a corpus.
Sabine is right where she says that we do not have the full functionality yet. The fact that we do not have an indication what the DM is (expression and definition) does not help. Then again, I have seen some preliminary things to do with recent changes .. It is crucial that we get this done asap. GerardM 13:12, 21 August 2006 (CEST)

Translation pitfalls avoided

I know, I am light-years ahead of current functionality offerings or needs. Still, I want to let you know. Also, imho there is a good chance that this class of problems can be successfully addressed with specific relations. Examples:

  1. If you translate eng:'virginity' to deu:'Unschuld' and negate the word or the sentence, you may arrive at deu:'Schuld', literally, or by readers associative minds, which is eng:'guilt' and not at all related to eng:'non-virginity', so it would be either wrong or funny.
  2. Australian pop group AC/DC made an intentional word play with eng:'ball's being social dancing events, something used to play with in sports, and parts of male bodies. In a serious translation, however, you would probably want to avoid the use of certain words, such as eng:'bouncing', in a context which makes meaning uncertain or slippery.
  3. If you pinch a pin across a rope, cord or thread, eng:'thru the rope', and translate it to ksh:'dorsch de Kood', you may find your audience bursting into laughter since you accidentally hit an idiomatic expression, saying that a living being eng:'escaped, ran away, and got lost' from a hold or out of custody.
    • If you have a mouse that bites her way through a rope and subsequently escapes, that translates as ksh:'dorsch de Kood' twice. That is fine only if your text is not too serious.

Ways to deal with that:

  1. in an actual translation, as usual, check possible backwards translations with meanings deliberately bent away from the original;
  2. for known problem cases, create a relation from the expression to → the unwanted associative or semantic neighbor, → the different DM, → the idiomatic expression using it, etc. so as to support individual checks against possible pitfalls. Here, it might be nice to have links to good sample translations avoiding the problem.

-- Purodha Blissenbach 15:45, 21 August 2006 (CEST)
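
A rough sketch of the second suggestion above: a table of known problem cases that a translator's tool could consult. The Pitfall class, the PITFALLS table and check_translation are hypothetical names; the two entries only restate the examples from this post, and no such relation type exists in the software yet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pitfall:
    """A relation from an expression to something a translator should check."""
    kind: str    # e.g. "associative neighbor", "idiomatic expression"
    target: str  # the unwanted neighbor, DM or idiom
    note: str


# Keys are (language code, expression); entries restate the examples above.
PITFALLS = {
    ("deu", "Unschuld"): [
        Pitfall("associative neighbor", "deu:Schuld",
                "negation may be read as eng:'guilt'"),
    ],
    ("ksh", "dorsch de Kood"): [
        Pitfall("idiomatic expression", "ksh:dorsch de Kood",
                "idiom for eng:'escaped, ran away, and got lost'"),
    ],
}


def check_translation(candidates):
    """Return the known pitfalls for each candidate (language, expression)."""
    return {c: PITFALLS[c] for c in candidates if c in PITFALLS}
```

A candidate that is not in the table simply produces no warning, so the check can only ever support a human review, not replace it.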

Followup: [1] tells of a possible application of the suggested relations within a language (British English in this instance). The project of the Universities of Dundee, Aberdeen and Edinburgh, among other things, generates puns from dictionary data. The text mentions computer scientist Dr Annalu Waller of the University of Dundee as one of the project researchers. -- Purodha Blissenbach 11:48, 23 August 2006 (CEST)

GEMET expressions without definitions

See here a list of 154 expressions imported from GEMET that have no definition. See for example sanction on GEMET. Feel free to add your own definition. HenkvD 13:55, 22 August 2006 (CEST)

The same in other languages like German (350 DM's found with Google). Furthermore I found a few standard definitions originating from the GEMET database:
Does anybody know other standard GEMET definitions like this? HenkvD 23:17, 22 August 2006 (CEST)
I found a few that just say NULL -- 07:28, 24 August 2006 (CEST)
When there is a definition and there is a translation that says "NULL" you can remove that "translation". GerardM 08:02, 24 August 2006 (CEST)
I replaced most of the missing definitions by <Definition needed.>, and removed NULL, No defintion needed, blanks etc. There are probably some more that Google cannot find at this moment. I will check those again in a week or so. HenkvD 14:47, 28 August 2006 (CEST)
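
The cleanup described in this thread could be sketched like this, assuming the definitions of one DM are available as a mapping from language code to text. The function name, the placeholder set and the choice of "eng" as the fallback language are assumptions made for the example; the cleanup itself was done by hand through the wiki.

```python
# Values treated as "no real definition", compared case-insensitively.
PLACEHOLDERS = {"", "null", "no definition needed"}
MARKER = "<Definition needed.>"


def clean_definitions(definitions, fallback_lang="eng"):
    """Drop placeholder texts; if nothing is left, insert the marker."""
    kept = {
        lang: text.strip()
        for lang, text in definitions.items()
        if text is not None and text.strip().lower() not in PLACEHOLDERS
    }
    if not kept:
        kept[fallback_lang] = MARKER
    return kept
```

A "NULL" translation next to a real definition is simply removed, while a DM left with no usable text at all gets the marker so that it stays easy to find.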

Automated transliteration

Just had a thought: what if we had automatic transliteration functionality? A lot of work has been done in this area, and it shouldn't be too hard to implement. What this would mean is that you could have an arbitrary word written in an arbitrary writing system represented in another, arbitrary, writing system. For example, I understand the Latin and Greek alphabets, but Arabic is like Chinese to me :) With an automated transliteration system, I could see how to pronounce the words, even if it's only an approximation. Of course, this shouldn't be limited to transliteration into Latin; it would be interesting to have Japanese-to-Aramaic transliterations.

If there is any interest in this, I'm willing to do some further research into the subject. Thanks for reading! László 16:27, 25 August 2006 (CEST)

I take it you mean transliteration into IPA. Because any other transliteration does NOT really help you pronounce things. GerardM 08:47, 27 August 2006 (CEST)
Good point. However, if we had transliteration INTO IPA, it would be an interesting step to have transliteration FROM IPA. But I think transliteration into IPA would be a good idea. You can hardly expect everyone who adds a translation or synonym to also (manually) add the pronunciation in IPA; however, you could provide a generated IPA pronunciation, which could be reviewed and possibly corrected before submitting it. László 02:08, 3 September 2006 (CEST)
Transliteration only replaces one grapheme with another, it has nothing to do with the indication, how to pronounce the transliterated word. That is done by transcription, which is what IPA is for. --AtonX 15:33, 16 October 2006 (CEST)
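
To illustrate the distinction drawn here, a minimal sketch of transliteration as pure grapheme replacement. The mapping is a tiny, hypothetical subset of Greek letters picked for the example and does not follow any particular standard; note that it carries no pronunciation information at all.

```python
# Each source grapheme maps to one or more target graphemes.
GREEK_TO_LATIN = str.maketrans({
    "α": "a",
    "γ": "g",
    "λ": "l",
    "ο": "o",
    "σ": "s",
    "ς": "s",   # word-final sigma maps to the same Latin letter
    "χ": "ch",  # one grapheme may become two
})


def transliterate(text: str) -> str:
    """Replace mapped graphemes; unmapped characters pass through unchanged."""
    return text.translate(GREEK_TO_LATIN)
```

For example, transliterate("λογος") yields "logos". Producing IPA would instead need language-specific pronunciation rules or a lexicon, which is the transcription task mentioned above.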

Some more languages

I have asked Erik to add a few more languages: Serbian (both Cyrillic and Latin script), Thai and Ido. We now have some more languages that we can support. We have 485 users at the moment. No idea how many articles in the Expression namespace. GerardM 08:45, 27 August 2006 (CEST)

Special:Statistics finds 6.160 articles, Google [2] finds at least 42.300 articles in the Expression namespace, of which at least 4680 in one language [3]. But none of these figures are 100% accurate. HenkvD 14:52, 27 August 2006 (CEST)
Please ask him to add Kölsch, too. Since is fully back online, the UI data should be available quickly, soon. -- Purodha Blissenbach 15:23, 27 August 2006 (CEST)
Can we have Latin, too ? -- Purodha Blissenbach 04:28, 28 August 2006 (CEST)
Some numbers have been entered, such as 6 or XIV, in several languages. I suggest not to do that, and rather define an own (private) 'language', such as "Natural Numbers" or the like, with language code qnn-AL ('arabic latin' , i.e. decimal notation in modern latin script numerals), or qnn-RN (Roman Numerals), etc. for those. -- Purodha Blissenbach 04:28, 28 August 2006 (CEST)
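
A small sketch of how numeral expressions might be sorted into the private codes proposed above. The function name and the regular expressions are assumptions for illustration; only the codes qnn-AL and qnn-RN come from the proposal.

```python
import re

# Decimal notation in modern Latin-script digits.
DECIMAL = re.compile(r"[0-9]+")
# Standard-form Roman numerals from I to MMMCMXCIX, upper case only.
ROMAN = re.compile(
    r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"
)


def numeral_language(expression):
    """Return the proposed private language code, or None for non-numerals."""
    if DECIMAL.fullmatch(expression):
        return "qnn-AL"
    # The Roman pattern also matches the empty string, so guard against it.
    if expression and ROMAN.fullmatch(expression):
        return "qnn-RN"
    return None
```

Strings such as "I" or "MIX" are valid Roman numerals as well as ordinary words, so a result of qnn-RN is only a hint and would still need a human decision.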

Languages spoken in a country

On the portal for Australia I have changed the way the information about the languages that are/were spoken is presented. I think this is a nice way of having this information for other countries as well. I have used the information of Ethnologue for this. What can be added are the languages from the immigrant communities. GerardM 15:16, 28 August 2006 (CEST)

Synonyms and hyponyms

We are still learning how to write good Definitions, and I think we have a way to go. For now, I would like to propose a guideline for adding Expressions to the SynTrans table:

Add the broadest Expression from your language that fits the Definition. If your language contains hyponyms that also fit the Definition, do not add them to the SynTrans table. They are hyponyms, and should be related to the Definition as such with the help of the Relations section.

The Definition should then be marked as needing further specification, possibly with an Attention-template. --Sannab 17:48, 28 August 2006 (CEST)

new collection

There is a new collection for the ISO-639-3 codes. These are the codes that we use with languages and they will not be inserted when we import the ISO-639-3 languages for the languages we already have. It is therefore appreciated when you add the code with the collection.

One of the functions that can be expected with collections is a browser function for entries associated with a collection. GerardM 09:41, 29 August 2006 (CEST)

I was confused on how to enter this, and consulted Gerard on IRC. He means to add the language name Expression to the collection for ISO-639-3 and add the language code as 'source identifier'. --Sannab 11:55, 29 August 2006 (CEST)
Can you give an example to clarify this further? HenkvD 12:04, 29 August 2006 (CEST)
Asturian: expand the Collection section. --Sannab 12:17, 29 August 2006 (CEST)
Thanks. HenkvD 12:20, 29 August 2006 (CEST)
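
As a rough illustration of the procedure explained above (add the language name Expression to the ISO-639-3 collection and enter the code as its 'source identifier'), here is a minimal sketch. The Collection class and its methods are hypothetical and say nothing about how OmegaWiki stores collections internally; "ast" is the ISO 639-3 code for Asturian.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """A named collection mapping member expressions to source identifiers."""
    name: str
    members: dict = field(default_factory=dict)

    def add(self, expression, source_identifier):
        """Add a member; refuse to silently overwrite a different identifier."""
        existing = self.members.get(expression)
        if existing is not None and existing != source_identifier:
            raise ValueError(f"{expression} already has identifier {existing}")
        self.members[expression] = source_identifier

    def browse(self):
        """List members alphabetically, as a simple browser function might."""
        return sorted(self.members.items())


iso_codes = Collection("ISO-639-3")
iso_codes.add("Asturian", "ast")
```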