As in “theme park” and “water park”, as opposed to “national park” and “public park”.
It seems like a bottleneck in language that I am struggling to find a way around. I believe the word “park” is poisoned in embedding models, and I would like to test that theory, but I’m at a loss. I tried my usual thesaurus, and looked at translations and etymologies, but so far the word seems to have no effective alternative. It is a rather interesting conundrum beyond the scope of my application: how would you differentiate and specify what a place like Disneyland is, without ambiguity, when “park” is not a useful word? And no, “land” is not specific enough to describe the place.
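One way to test the poisoning theory would be to check whether “theme park” embeds closer to other “park” phrases than to a park-free synonym, despite the difference in meaning. Here is a minimal sketch of that measurement; the vectors are made-up placeholders standing in for real model embeddings (with an actual embedding model you would substitute its output for each phrase), and “amusement complex” is just a hypothetical park-free alternative.

```python
from math import sqrt

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

# Placeholder vectors standing in for real model embeddings of each phrase.
embeddings = {
    "theme park":        [0.9, 0.1, 0.2],
    "national park":     [0.8, 0.3, 0.1],
    "amusement complex": [0.2, 0.9, 0.4],
}

# If "park" dominates the embedding, "theme park" lands closer to
# "national park" than to a park-free synonym, despite meaning.
for phrase in ("national park", "amusement complex"):
    sim = cosine(embeddings["theme park"], embeddings[phrase])
    print(phrase, round(sim, 3))
```

Running this comparison over a real model’s embeddings for a batch of such phrase pairs would give an actual measure of how much the word “park” itself pulls the vectors together.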
I have a few ideas and things I have already tried, but I really want to hear yours.
Etymology according to Wiktionary:
From Middle English park, from Old French parc (“livestock pen”), from Medieval Latin parcus, parricus, from Frankish *parrik (“enclosure, pen, fence”). Cognate with Dutch perk (“enclosure; flowerbed”), Old High German pfarrih, pferrih (“enclosure, pen”), Old English pearroc (“enclosure”) (whence modern English paddock), Old Norse parrak, parak (“enclosure, pen; distress, anxiety”), Icelandic parraka (“to keep pent in under restraint and coercion”). More at parrock, paddock. https://en.wiktionary.org/wiki/park
So the scope of Pan is actually all of nature in general, anywhere in the real world that is not Wonderland. What I am trying to do is push the context into Wonderland, because there I can make up the rules and the model will always play along. The real world is where ethics weigh so heavily.
On an even deeper level of abstraction, all words/tokens carry a positive or negative weight in alignment. Positively profiled words tilt the context toward a creative place like Wonderland, while negative words push it into a darker abyss like the void.
At one point I started tracking this behavior in LLMs. When alignment behavior is triggered, the numerically higher token IDs produce a larger average than when it is not. When many of the more common high-ID tokens are banned, the behavior persists; likewise, when common low-ID tokens are banned and alignment is not triggered, the average remains lower. In other words, a token’s numerical position correlates with alignment and is likely a form of steganographic encoding of information.
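The measurement described above can be sketched very simply: collect the token IDs of sampled responses under both conditions and compare the means. The ID lists below are hypothetical placeholders; with a real model you would gather them from the tokenizer output of triggered versus untriggered responses.

```python
def mean_token_id(token_ids):
    # Average numerical token ID over one sampled response.
    return sum(token_ids) / len(token_ids)

# Hypothetical samples; substitute real tokenizer output here.
triggered   = [50256, 31090, 28700, 41205, 39000]  # alignment behavior on
untriggered = [262, 11, 1820, 4300, 997]           # alignment behavior off

diff = mean_token_id(triggered) - mean_token_id(untriggered)
print("mean ID difference:", round(diff, 1))
```

To make the claim testable, the same comparison would need to be run over many samples, with common high-ID and low-ID tokens banned in turn, checking whether the gap in mean ID survives each ablation.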