NOMN: Microtiming Enhancer
What is NOMN actually doing to audio?
Digital playback runs on a crystal-locked clock whose timing stability is orders of magnitude tighter than any natural acoustic source. Crystals have measurable phase noise and jitter. We're not claiming they don't, but those deviations are vanishingly small and statistically structureless compared to the rich temporal variation any physical sound source produces. There has never, in the natural history of hearing, been a sound source so temporally rigid.
NOMN puts back the kind of variation that grid-locked playback removed. Not as random noise, not as a recognizable effect, but as structured temporal patterning that the auditory system reads as natural rather than mechanical.
Isn't this just an advanced tremolo or a fancy chorus?
What's new is what's driving it and what it does to the human:machine relation for audio.
A tremolo's control signal is a 2-parameter LFO. A chorus's is a 4-6 parameter LFO. A humanizer plugin's is filtered random noise. Tape emulation uses noise shaped to match wow/flutter spectra measured from vintage gear. All of these are content-blind, and none of them is modeled on the body. They're modeled on nostalgia for older technology.
Music cognition research says the smallest perceptible timing difference is something like 10-50ms. Doesn't that mean NOMN's microsecond-scale modulation falls below the audibility threshold (the "just-noticeable difference," or JND), which would make this handwavy audiophile nonsense on a par with wildly expensive speaker cable?
First, on what the JND literature actually measures. JND (just-noticeable-difference) thresholds for musical timing, the ones in the 10-50ms range, measure how much one note has to move relative to another before a listener can consciously identify the shift in a forced-choice cognitive task. That tells you when timing becomes *labelable* as different. It does not tell you the resolution at which the auditory system processes time or what we sense.
The auditory system's actual temporal resolution is roughly three to four orders of magnitude finer than musical JND. The two most established lines of evidence:
The binaural pathway resolves interaural time differences (ITDs) down to about 10 microseconds. Klumpp & Eady (1956, J. Acoust. Soc. Am. 28: 859-860) measured average ITD discrimination thresholds of 9μs for band-limited noise and 11μs for a 1000-Hz tone across ten listeners. Thresholds in this range have been independently reproduced for nearly seventy years. Brughera, Dunai & Hartmann (2013, J. Acoust. Soc. Am. 133: 2839-2855) confirmed thresholds just above 10μs at 700-1000 Hz using modern methods. The lowest measured thresholds approach the single-microsecond range under optimal conditions. The mechanism is well understood: neurons in the medial superior olive perform coincidence detection on phase-locked spikes from each ear. The largest ITD anyone normally encounters, for a sound directly to one side, is around 600-700μs, set by the distance between the ears (Mills 1958, J. Acoust. Soc. Am. 30: 237-246). Listeners reliably resolve angular differences of about 1 degree near the midline. Note that the foundational measurements here are roughly 70 years old!
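As a rough cross-check on the figures above, the classic Woodworth spherical-head approximation relates source azimuth to ITD. This is a minimal sketch and not part of the cited studies: the choice of formula, the 8.75 cm head radius, and the 343 m/s speed of sound are assumptions introduced here for illustration.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed average head radius (not from the cited papers)
SPEED_OF_SOUND = 343.0   # m/s in air at about 20 °C (assumed)

def woodworth_itd_seconds(azimuth_deg):
    """Woodworth spherical-head ITD for a distant source.

    azimuth_deg is measured from straight ahead (0) toward one ear (90).
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND) * (theta + math.sin(theta))

itd_side = woodworth_itd_seconds(90.0)      # source directly to one side
itd_one_degree = woodworth_itd_seconds(1.0)  # source 1 degree off the midline

print(f"ITD at 90 degrees: {itd_side * 1e6:.0f} microseconds")
print(f"ITD at 1 degree:   {itd_one_degree * 1e6:.1f} microseconds")
```

Under these assumptions a source at 90 degrees lands in the 600-700μs range quoted above, and a 1-degree offset from the midline corresponds to an ITD a little under 10μs, the same order as the measured thresholds.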
The monaural pathway encodes the sub-millisecond structure of sounds through what auditory neuroscience calls **temporal fine structure (TFS)**, the rapid waveform oscillations within each cochlear frequency band, as distinct from the slower envelope (ENV) modulations superimposed on them (Moore 2008, J. Assoc. Res. Otolaryngol. 9: 399-406, the canonical review). TFS information is carried in the timing of auditory-nerve-fiber spikes that phase-lock to individual cycles of the stimulus waveform for low-frequency components up to several kilohertz. This isn't a hypothesis or a contested claim; it is the standard model of how the auditory periphery encodes time, reviewed comprehensively in Joris, Schreiner & Rees (2004, Physiological Reviews 84: 541-577).
TFS is what the auditory system uses for pitch perception of complex tones, for the perception of speech in fluctuating background noise, and for source separation in complex acoustic environments. Smith, Delgutte & Oxenham (2002, Nature 416: 87-90) demonstrated this directly by constructing "chimaeric" sounds in which the envelope of one signal was combined with the TFS of another. Listeners reliably perceived pitch and source location based on the TFS, not the envelope. TFS isn't specific to live sound, binaural listening, or any particular playback situation. It operates on whatever the cochlea receives, including the output of headphones and speakers playing recorded music. When you listen to a recording, the temporal fine structure of the audio is encoded into the spike timing of your auditory nerve at sub-millisecond resolution. This processing happens continuously, below the threshold of conscious awareness, which is exactly why musical JND studies don't measure it. JND measures what listeners can report. It doesn't measure what their auditory systems are doing.
The more important point. **The right question isn't whether listeners can A/B-distinguish two audio files in a controlled trial. The right question is whether the technology that generates audio for human consumption should operate at the resolution of the sensory system it's serving.**
The audio industry has answered this question consistently for decades. Studios record at 96kHz or 192kHz not because listeners can reliably A/B-distinguish those rates from 48kHz on every track, but because the production chain shouldn't introduce artifacts at the limit of the system's resolution. Mastering engineers obsess over word-clock jitter specifications that sit well below classical audibility thresholds, because they don't want the clock to be the bottleneck. Professional audio interfaces compete on sub-millisecond round-trip latency. The principle is consistent: human-facing audio technology should operate above the sensory floor, not below it.
NOMN sits in this lineage. Crystal-locked playback timing is acoustically unprecedented in the natural history of hearing. There has never been a sound source with this little temporal variation. The question isn't whether listeners can articulate the difference in a forced-choice test on a per-track basis. The question is whether AI-generated audio at scale, intended for billions of hours of human listening, should match the temporal resolution the sensory system actually uses. We think it should. The audio industry has historically agreed with that principle for every other dimension of the playback chain: sample rate, bit depth, jitter, latency, frequency response, distortion. Treating the temporal microstructure dimension as the lone exception, just because the relevant variation sits below conscious labelling threshold, is inconsistent.
If the audibility critique held, if anything below conscious JND were perceptually irrelevant, listeners couldn't localize sound sources, couldn't separate voices in a crowd, couldn't tell a real violin from a sampled violin played through the same speaker. All of those judgments depend on temporal resolution far finer than musical JND.
OK so this is all pretty interesting, but what's temporal fine structure, exactly, and where does NOMN sit relative to the established TFS literature?
The TFS framework has been extensively developed in the auditory science literature over the past two decades. Moore (2008, J. Assoc. Res. Otolaryngol. 9: 399-406) is the standard review of TFS's role in pitch perception, masking, and speech perception. Smith, Delgutte & Oxenham (2002, Nature 416: 87-90) used "chimaeric" sounds, constructed by combining the envelope of one signal with the TFS of another, to demonstrate that listeners rely on TFS for pitch and source localization while relying on ENV for speech recognition in quiet. Subsequent work (Lorenzi et al. 2006, PNAS 103: 18866-18869; Hopkins & Moore 2009, J. Acoust. Soc. Am. 125: 442-446) has shown that TFS sensitivity is critical for speech perception in noisy environments, and that hearing-impaired listeners' reduced sensitivity to TFS is a major factor in their difficulty understanding speech in noise.
This matters for NOMN in two ways.
First, TFS is the established technical vocabulary for what NOMN operates on. The temporal microstructure NOMN restores to digital playback is, in the technical language of the field, modulation in the temporal fine structure of the audio signal. We aren't making up a new perceptual category. We're operating in a well-mapped region of the auditory science literature.
Second, the existing TFS research focuses primarily on what's *lost*. How hearing-impaired listeners lose TFS sensitivity, how cochlear implants struggle to deliver TFS information, how aging degrades TFS processing. NOMN approaches the question from the other direction: what kind of TFS structure should well-engineered playback technology preserve and present to listeners whose TFS processing is intact? The auditory science community has spent two decades documenting how much TFS matters for normal hearing. The audio industry has not yet drawn the corresponding conclusion about playback technology design. NOMN is one application of that conclusion.
A note on scope. The "fine structure" in TFS refers to the rapid carrier oscillation within auditory filter bands, which is encoded at sub-millisecond resolution via phase-locking up to several kilohertz. NOMN's modulation operates across a range from microsecond to millisecond scales, modulating the temporal structure of the audio content itself. Both sit in the temporal regime where the auditory system does fine-grained timing work. We use the broader phrase "temporal microstructure" in marketing copy to avoid claiming we directly manipulate the specific signal-processing quantity that TFS researchers technically measure with the Hilbert decomposition, but the perceptual mechanism we're targeting is the same one that TFS research has been documenting since the early 2000s.
If sub-JND timing differences don't matter, why does the audio industry spend so much effort minimizing latency?
Every working musician who records with a DAW tunes their audio buffer size to keep round-trip latency as low as possible. Professional audio interfaces compete on sub-millisecond round-trip latency. The Bela platform was specifically built to achieve sub-millisecond action-to-sound latency for digital musical instruments (McPherson, Jack & Moro 2016, Proc. NIME) because most common platforms fail to meet the targets professional musicians need.
The peer-reviewed evidence on what musicians actually feel is clear. Jack, Mehrabi, Stockman & McPherson (2018, Music Perception 36: 109-128) tested professional percussionists and amateur musicians on a digital percussion instrument with controlled latency conditions of 0ms, 10ms, 10ms ± 3ms jitter, and 20ms. Both groups rated zero-latency as significantly higher quality than the 10ms-with-jitter and 20ms conditions. Professional percussionists were more sensitive to latency than amateurs and showed measurable changes in timing performance under added latency. Schmid et al. (2024, Proc. Mensch und Computer, ACM) measured just-noticeable-difference for added audio latency across 37 listeners and found a mean JND of 27ms at 64ms base latency, with musically sophisticated participants reliably detecting smaller margins. Earlier ensemble work documented that asynchronies up to 50ms occur in real performances (Rasch 1979, Acustica 43: 121-131) and that professional percussionists exhibit timing jitter of 10-40ms even when synchronizing to a metronome (Dahl 2011, Music Perception 28: 491-503).
Acoustic drums have a natural latency of about 2-3ms from stick contact to sound reaching the drummer's ears, a value set by the speed of sound across the distance from the drum to the head. This is the baseline the drummer's nervous system has calibrated to over years of practice. When an electronic drum module introduces an extra 5-10ms on top of this, professional drummers describe the kit as "sluggish," "disconnected," "laggy."
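The 2-3ms figure follows directly from distance divided by the speed of sound. The sketch below works through that arithmetic; the drum-to-ear distances and the 343 m/s speed of sound are illustrative assumptions, not measurements from the cited studies.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # air at about 20 °C (assumed)

def acoustic_latency_ms(distance_m):
    """Time for sound to travel distance_m through air, in milliseconds."""
    return distance_m / SPEED_OF_SOUND_M_PER_S * 1000.0

# Hypothetical drum-head-to-ear distances for a seated drummer.
for distance_m in (0.7, 0.85, 1.0):
    print(f"{distance_m:.2f} m -> {acoustic_latency_ms(distance_m):.2f} ms")
```

With these assumed distances the acoustic path alone accounts for roughly 2 to 3ms, so an added 5-10ms from a drum module would double or triple the delay the player is used to.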
Notice what's happening here. The audio industry has, for decades, accepted the principle that **playback technology should operate at the temporal resolution the sensory system actually uses, not at the resolution of conscious A/B detection**. Nobody argues that audio interfaces should target 50ms latency because that's the conscious JND. The industry targets sub-millisecond because that's where the human:machine interaction breaks down. Studios record at high sample rates so that the production chain isn't the bottleneck. Word clocks are spec'd at jitter levels below classical audibility for the same reason. You don't want the clock to be the lowest-resolution element in the system.
This is exactly the principle NOMN applies. Crystal-locked playback has temporal stability orders of magnitude tighter than any natural acoustic source. The sensory system that consumes the audio resolves timing at microsecond scales. The fact that listeners can't always consciously label what they're hearing in an A/B test doesn't mean the technology should operate below the sensory floor. It means the audio industry should treat temporal microstructure with the same engineering discipline it already applies to sample rate, bit depth, latency, and jitter.
But the speaker cone and the room introduce way more temporal modification than NOMN does. Doesn't that swamp the effect?
The relevant difference isn't magnitude. It's structure.
Room and speaker convolution is content-blind and stationary. The room's impulse response is fixed for a given listening position. The reverb tail of a snare hit and the reverb tail of a sustained vocal note get the same room treatment. This is convolution with a fixed kernel: large in magnitude, but time-invariant and indifferent to content.
The auditory system has well-documented machinery for separating direct-path source signals from reverberant reflections. The foundational finding is the precedence effect, first systematically described by Wallach, Newman & Rosenzweig (1949, American Journal of Psychology 62: 315-336). When two identical sounds arrive at the ears within a few milliseconds of each other, the listener perceives a single fused sound localized at the position of the first-arriving wavefront, with the later-arriving reflections strongly suppressed in their contribution to perceived location. This is why you can localize a speaker in a reverberant room. The brain attributes the spatial cue to the direct sound and treats the reflections as environment. The mechanism extends into the broader framework of auditory scene analysis (Bregman, 1990, MIT Press), in which the auditory system uses primitive grouping cues to organize incoming sound into source representations distinct from environmental context. Subsequent reviews (Litovsky et al. 1999, J. Acoust. Soc. Am. 106: 1633-1654; Brown et al. 2015, J. Acoust. Soc. Am. 137: 776-790) document that this is a continuous, automatic process operating below conscious awareness.
What the auditory system *can't* factor out, and uses heavily for source identification and naturalness judgment, is the underlying source's intrinsic timing structure. The room can smear what's there. It can't add what isn't, and it can't subtract what is.
Put simply: a real violin and a sampled violin played through the same speaker in the same room are typically distinguished by listeners on extended listening. The acoustic chain is identical. The difference is in source-level temporal structure that survives the chain because it's encoded in the signal before it ever reaches the speaker.
Doesn't the DAC's reconstruction filter smooth out fast timing modulation anyway?
A general principle worth stating clearly: NOMN's modulation is content, not metadata. Anything that processes the audio processes the modulation along with it. Anything that doesn't process the audio can't touch the modulation. There's no separate channel to attack. The same logic applies to the speaker, the room, the listener's HRTF, and the ear canal. All are linear time-invariant operations applied to the audio content, and none of them selectively erases the modulation.
Couldn't you accomplish the same thing with a low-depth chorus or some filtered noise driving varispeed?
The difference is in what the auditory system does with different kinds of variation. LFO-driven modulation is periodic, and the auditory system detects periodicity below conscious awareness. Subtle periodic modulation reads as "wobbly" or "effected" even when listeners can't say why. Filtered noise modulation is aperiodic but content-blind, which the auditory system also reads as foreign to natural sources, since natural sources don't produce statistically white timing variation. Natural timing variation has specific structure: long-range correlations and content correlation that have been measured directly in human performance. Hennig (2014, PNAS 111: 12974-12979) documented that timing deviations in professional drum performances exhibit long-range (1/f-type) correlations rather than white-noise statistics, a finding consistent with broader work on temporal structure in human motor performance (Gilden, Thornton & Mallon 1995, Science 267: 1837-1839). The closer your modulation matches this structure, the less the auditory system flags it as alien.
NOMN's modulation matches that structure. A low-depth chorus or content-blind filtered noise doesn't.
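To make "long-range correlations" concrete, the sketch below compares white-noise timing deviations with an approximately 1/f sequence and measures how strongly each one correlates with itself a few steps later. It illustrates the statistical distinction only and is not NOMN's model: the Voss-McCartney generator, the sequence length, the lag, and the seed are all assumptions chosen for the example.

```python
import random

def white_deviations(n, rng):
    """Independent Gaussian deviations: no memory from one event to the next."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def pink_deviations(n, rng, rows=8):
    """Approximate 1/f noise (Voss-McCartney): row i is redrawn every 2**i steps."""
    state = [0.0] * rows
    out = []
    for t in range(n):
        for i in range(rows):
            if t % (1 << i) == 0:
                state[i] = rng.gauss(0.0, 1.0)
        out.append(sum(state))
    return out

def autocorrelation(x, lag):
    """Normalized sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return cov / var

rng = random.Random(0)
white_ac = autocorrelation(white_deviations(8192, rng), 4)
pink_ac = autocorrelation(pink_deviations(8192, rng), 4)

print(f"white noise, lag 4: {white_ac:+.3f}")
print(f"approx. 1/f, lag 4: {pink_ac:+.3f}")
```

The white sequence shows essentially no correlation at lag 4, while the 1/f-type sequence stays clearly correlated. That persistence across events is the property the performance studies cited above report for human timing.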
Hasn't this been tried before? Isn't NOMN just like MQA or C Wave or one of those audiophile dead ends?
C Wave argues that PCM is "non-continuous" and that the brain detects this discontinuity. Their solution is, in effect, a kind of reverb meant to "fill in gaps." We don't share that diagnosis. A reverb algorithm running on PCM is still PCM, and the Nyquist-Shannon sampling theorem guarantees that properly bandlimited PCM is mathematically equivalent to a continuous waveform up to the Nyquist frequency. There are no gaps to fill in the digital signal. We're not claiming to fix something inside PCM. We're claiming that natural acoustic sources have temporal microstructure that crystal-locked playback lacks, which is a different claim, one grounded in the physical properties of natural sound sources rather than in disputed claims about sampling theory.
The single biggest lesson from those efforts: don't pick fights with sampling theory, don't claim what you can't measure, and don't treat independent measurement as an enemy.
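The sampling-theorem point in the C Wave answer can be checked numerically. The sketch below samples a 1 kHz sine at 48 kHz and then uses Whittaker-Shannon (sinc) interpolation to recover its value at an instant that falls between two samples. The tone, sample rate, window length, and evaluation time are illustrative assumptions; the finite window means the result is a close approximation rather than an exact reconstruction.

```python
import math

FS = 48_000.0   # sample rate in Hz (assumed)
F0 = 1_000.0    # test tone in Hz, far below the 24 kHz Nyquist frequency
N = 2001        # number of samples in the finite window

def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

samples = [math.sin(2 * math.pi * F0 * n / FS) for n in range(N)]

def reconstruct(t_seconds):
    """Whittaker-Shannon interpolation of the sampled signal at time t."""
    pos = t_seconds * FS
    return sum(s * sinc(pos - n) for n, s in enumerate(samples))

t = 1000.37 / FS                      # an instant between samples 1000 and 1001
estimate = reconstruct(t)
exact = math.sin(2 * math.pi * F0 * t)
error = abs(estimate - exact)

print(f"reconstructed: {estimate:+.6f}")
print(f"exact:         {exact:+.6f}")
```

The between-sample value comes back from the samples alone, which is the sense in which bandlimited PCM has no gaps to fill.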
How is this different from a humanizer plugin?
Two differences. First, humanizers add stochastic variation. NOMN adds structured variation matched to natural source statistics. Random isn't the same as natural. The long-range correlation structure documented in human motor timing (Gilden et al. 1995; Hennig 2014) is categorically different from the white-noise distribution most humanizers produce, and the auditory system responds to that distinction.
Second, humanizers operate on MIDI event timing before audio rendering. NOMN operates on audio at the signal level. A humanizer on a quantized MIDI snare moves the hit. NOMN modulates the playback of the audio itself. Different operations, different signal-chain positions, different effects. A humanizer can't humanize a finished audio file. NOMN can.
Is the temporal modulation audible?
If you mean "can a listener identify NOMN as a recognizable effect," generally no, and that's the design intent. A flanger that wasn't audible would be failing at its purpose. NOMN that was audible as processing would be failing at its purpose. They're aiming at opposite outcomes.
If you mean "would a listener succeed at A/B-distinguishing NOMN-processed audio from unprocessed audio in a controlled trial," that's an empirical question we'd love to investigate with proper perceptual research, and when we can fund that study and publish the results, we will. It's also not the question that decides whether the technology matters or is worth pursuing or supporting.
The relevant question is the one the audio industry has been answering for decades on every other dimension of the playback chain: does the technology operate at the temporal resolution the sensory system actually uses? For sample rate, bit depth, latency, jitter, and frequency response, the industry has consistently answered yes. The production chain should match the sensory floor, not the conscious A/B detection threshold. We're applying the same engineering discipline to temporal microstructure. Whether a listener can articulate the difference in a forced-choice test on a per-track basis is a different question from whether the technology serving billions of hours of human listening should match the sensory resolution.
Why is it called NOMN? Is the last N a silent N?
Where can I read more about the auditory science you're citing?
INTERAURAL TIME DIFFERENCE THRESHOLDS
— Klumpp, R.G. & Eady, H.R. (1956). "Some Measurements of Interaural Time Difference Thresholds." Journal of the Acoustical Society of America 28(5): 859-860. The original measurement: 9μs threshold for band-limited noise, 11μs for 1000-Hz tone, 28μs for clicks (75% correct discrimination, ten listeners).
— Mills, A.W. (1958). "On the Minimum Audible Angle." Journal of the Acoustical Society of America 30(4): 237-246. Foundational measurement of angular acuity in sound localization (~1° near midline).
— Brughera, A., Dunai, L. & Hartmann, W.M. (2013). "Human interaural time difference thresholds for sine tones: The high-frequency limit." Journal of the Acoustical Society of America 133(5): 2839-2855. Modern confirmation of ~10μs thresholds for pure tones at mid-frequencies, with high-frequency cutoff around 1.4 kHz.
NEURAL CODING OF TEMPORAL STRUCTURE
— Joris, P.X., Schreiner, C.E. & Rees, A. (2004). "Neural Processing of Amplitude-Modulated Sounds." Physiological Reviews 84(2): 541-577. The standard review on how the auditory system encodes temporal modulation for source localization, identification, and parsing.
— Moore, B.C.J. (2008). "The role of temporal fine structure processing in pitch perception, masking, and speech perception for normal-hearing and hearing-impaired people." Journal of the Association for Research in Otolaryngology 9(4): 399-406. The canonical review of temporal fine structure (TFS) and its perceptual role.
— Smith, Z.M., Delgutte, B. & Oxenham, A.J. (2002). "Chimaeric sounds reveal dichotomies in auditory perception." Nature 416: 87-90. The foundational experimental demonstration that listeners rely on TFS for pitch and localization while ENV dominates speech recognition in quiet.
— Lorenzi, C., Gilbert, G., Carn, H., Garnier, S. & Moore, B.C.J. (2006). "Speech perception problems of the hearing impaired reflect inability to use temporal fine structure." Proceedings of the National Academy of Sciences 103: 18866-18869. Direct evidence for TFS's role in speech-in-noise perception.
SOURCE/ENVIRONMENT SEPARATION
— Wallach, H., Newman, E.B. & Rosenzweig, M.R. (1949). "The Precedence Effect in Sound Localization." American Journal of Psychology 62(3): 315-336. The foundational paper showing that listeners localize sounds based on first-arriving wavefront, suppressing reverberant reflections.
— Bregman, A.S. (1990). Auditory Scene Analysis: The Perceptual Organization of Sound. MIT Press. The standard reference text on how the auditory system organizes complex sound mixtures into source representations.
— Litovsky, R.Y., Colburn, H.S., Yost, W.A. & Guzman, S.J. (1999). "The Precedence Effect." Journal of the Acoustical Society of America 106(4): 1633-1654. Comprehensive review of the precedence effect and echo suppression literature.
LATENCY PERCEPTION AND MUSICAL PERFORMANCE
— Jack, R.H., Mehrabi, A., Stockman, T. & McPherson, A. (2018). "Action-sound Latency and the Perceived Quality of Digital Musical Instruments." Music Perception 36(1): 109-128. Professional percussionists rated 10ms±3ms jitter and 20ms latency conditions as significantly lower quality than zero latency.
— McPherson, A., Jack, R. & Moro, G. (2016). "Action-Sound Latency: Are Our Tools Fast Enough?" Proc. NIME 2016. Survey demonstrating most digital musical instrument platforms fail to meet sub-millisecond latency targets; motivates the Bela platform.
— Schmid, A., et al. (2024). "Measuring the Just Noticeable Difference for Audio Latency." Proc. Mensch und Computer 2024 (ACM). Mean JND of 27ms at 64ms base latency, with musically sophisticated listeners detecting smaller margins.
— Dahl, S. (2011). "Striking Movements: A Survey of Motion Analysis of Percussionists." Music Perception 28(5): 491-503. Documentation of percussionist timing variability.
NATURAL TIMING STATISTICS
— Hennig, H. (2014). "Synchronization in human musical rhythms and mutually interacting complex systems." Proceedings of the National Academy of Sciences 111(36): 12974-12979. Direct measurement of 1/f long-range correlations in professional drum performance timing.
— Gilden, D.L., Thornton, T. & Mallon, M.W. (1995). "1/f noise in human cognition." Science 267: 1837-1839. Broader finding of 1/f temporal structure across human cognitive and motor performance.
We cite this work because we want NOMN's perceptual claims to rest on the same foundation as the rest of auditory science. Independent measurement and verification are how this field moves forward, and we don't want to be exempt from that.
Hearing is by far the fastest human sense, by more than a factor of 10. Humans can detect time differences of ten microseconds. If the monitor you are reading this on refreshes at 60 Hz, each refresh takes more than 1,600 times longer than the interval your ears can resolve.
Every digital audio source in the world shares one property: timing that is almost mathematically perfect. DAWs, digital synthesizers, drum machines, samplers, streaming audio: all of it is temporally rigid by design. Audiophiles chase maximum stability with external 10 MHz clocks. The definition of "fidelity" has been zero frequency instability. Zero timing variation.
In parallel, the industry has spent fifty years optimizing spectral fidelity and building a digital infrastructure for music production and playback that operates orders of magnitude below the temporal sensitivity of the system it was supposed to serve: the listener.
Sound in nature is never temporally perfect. Every acoustic instrument, every voice, every breath of wind through an environment exhibits continuous timing variation at the microsecond scale, arising from the physics of how it is produced. These variations are not imperfections. They are what the auditory system recognizes as aliveness. The critical sub-technology at the foundation of every audio technology is an underlying periodicity: a clock. Whether it is a modulated electrical frequency, a rotating wax cylinder, a record groove, or a digital-to-analog converter, there is always some method for maintaining the logical structure of the newly generated quanta throughout the system. When that clock degrades, the illusion collapses, like a flip book thumbed too slowly: the perceptual hack fails.
Turntables and analog tape machines don't sound better; they feel better. They are microtiming enhancers. The mechanical instabilities of a platter or tape transport introduce variation in the time domain, coupled with frequency instability. This is a quality people spend enormous sums on, through vinyl pressings, tube amplifiers, and analog signal chains, often without being able to name what they are hearing. Because what they are hearing is not spectral. It is temporal.
NOMN gives digital audio its temporal life back. It is a microtiming enhancement system that introduces human-structured, non-repeating timing variation into any audio stream, at the resolution of the human perceptual system.
--
## How It Works
NOMN is trained on the temporal microstructure of human speech in more than 80 languages. Not phonemes, not words, not meaning, not voice quality. Only the microscopic timing patterns that make biological communication feel alive. Patterns from diverse linguistic traditions are distilled into a generative model of organic temporal behavior.
At runtime, the system generates a continuous stream of timing variations, more than 1,000 updates per second, and applies them to incoming audio. The original content is fully preserved. Nothing is added to or removed from the signal. Only the temporal microstructure is enriched, at resolutions below the threshold of swing or groove but within the threshold of perceptible effect.
The variations are not random and cannot be replicated with jitter. They are not periodic. They do not repeat. They are contextually structured and non-repeating, generated live for every moment of the audio passing through.
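For readers who want a feel for what "applying a stream of timing variations to audio" can mean at the signal level, here is a minimal sketch of a time-varying fractional delay. It is not NOMN's algorithm. The control signal is a placeholder random walk (exactly the kind of content-blind source the text says NOMN avoids), and the sample rate, the 20-microsecond depth, and all function names are assumptions made up for this example. Linear interpolation is used for brevity and slightly softens high frequencies; a production system would need a higher-quality interpolator.

```python
import math
import random

FS = 48_000             # audio sample rate in Hz (assumed)
CONTROL_RATE = 1_000    # control updates per second (assumed round figure)
MAX_DELAY_S = 20e-6     # hypothetical modulation depth: 20 microseconds

def placeholder_control(n_updates, rng):
    """Stand-in control signal: a bounded random walk, NOT NOMN's model."""
    value = MAX_DELAY_S / 2
    track = []
    for _ in range(n_updates):
        value += rng.uniform(-1e-6, 1e-6)
        value = min(MAX_DELAY_S, max(0.0, value))
        track.append(value)
    return track

def apply_variable_delay(signal, track):
    """Read the input at a slowly varying fractional delay."""
    hop = FS // CONTROL_RATE            # audio samples per control update
    last = len(track) - 1
    out = []
    for n in range(len(signal)):
        # Interpolate the control track to get a per-sample delay.
        pos = n / hop
        k = min(int(pos), last)
        k_next = min(k + 1, last)
        delay_s = track[k] + (track[k_next] - track[k]) * (pos - k)
        # Fractional read position, resolved by linear interpolation.
        src = n - delay_s * FS
        i = math.floor(src)
        frac = src - i
        a = signal[i] if 0 <= i < len(signal) else 0.0
        b = signal[i + 1] if 0 <= i + 1 < len(signal) else 0.0
        out.append(a + (b - a) * frac)
    return out

rng = random.Random(1)
signal = [math.sin(2 * math.pi * 440 * n / FS) for n in range(4800)]  # 0.1 s tone
track = placeholder_control(len(signal) // (FS // CONTROL_RATE) + 1, rng)
out = apply_variable_delay(signal, track)
max_diff = max(abs(a - b) for a, b in zip(signal, out))
print(f"largest sample difference: {max_diff:.4f}")
```

The output has the same length as the input and differs from it only by small, slowly drifting time shifts, which is the general class of operation the paragraph above describes.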
--
## The API
As its first release, NOMN is available as a cloud processing service. Send audio in, get temporally enhanced audio back.
The API accepts audio in standard formats and returns processed output. Control parameters are optional. When supplied, they let you navigate the system's internal space of timing behaviors. When omitted, the system automatically determines the optimal enhancement for the input material, adapting in real time to maximize perceptible effect while maintaining full transparency.
Processing runs at high sample rates with sub-millisecond time resolution. Latency depends on configuration and is suitable for mastering, post-production, and batch-processing workflows. Near-real-time configurations are available for streaming applications.
--
## Use Cases
Mastering & Post-Production
A new dimension of audio enhancement, orthogonal to EQ, compression, spatial processing, and loudness. Applicable to any master, any genre, any era of recording technology.
Streaming & Playback
Deployable as a real-time processing layer in streaming infrastructure or playback devices. Enhances any audio passing through (music, podcasts, film audio) without altering the content.
Hardware Integration
The system's compute footprint is small enough to run on audio DSP chips: small enough for earbuds, automotive head units, and portable players. Licensable for integration into consumer audio hardware, automotive audio systems, and professional equipment.
--
## What It Is Not
NOMN is not an equalizer, a compressor, a room processor, or an effect. It changes neither frequency content nor dynamic range, neither stereo image nor loudness. It adds no harmonics, no noise, and no saturation.
It operates in a dimension of audio that no existing tool addresses: the temporal microstructure that allows audio to work as a perceptual hack in the first place.
--
## Technical Notes
NOMN's timing variations operate at the microsecond scale, the same order of magnitude as the timing instabilities of analog playback systems, but structured rather than mechanical and non-repeating rather than periodic.
The system includes continuous quality validation that monitors the relationship between intended and rendered timing and ensures that the enhancement survives the entire signal chain from processing to output. Null-test analysis confirms that the enhancement is spectrally transparent: the only measurable difference between input and output lies in the time domain.
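A null test, in its generic form, subtracts the processed signal from the original and reports how much is left. The sketch below shows that measurement on a synthetic example. It is not NOMN's validation pipeline; the function names, the 1 kHz test tone, and the fixed 10-microsecond shift are assumptions chosen only to show how a pure timing change registers in the residual.

```python
import math

def rms(x):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def null_test_db(original, processed):
    """Residual level of (original - processed) relative to the original, in dB."""
    residual = [a - b for a, b in zip(original, processed)]
    level = rms(residual)
    if level == 0.0:
        return float("-inf")   # perfect null: the signals are identical
    return 20.0 * math.log10(level / rms(original))

FS = 48_000          # sample rate in Hz (assumed)
F0 = 1_000.0         # test tone in Hz (assumed)
SHIFT_S = 10e-6      # hypothetical constant timing shift of 10 microseconds

original = [math.sin(2 * math.pi * F0 * n / FS) for n in range(FS)]
shifted = [math.sin(2 * math.pi * F0 * (n / FS - SHIFT_S)) for n in range(FS)]

identical_db = null_test_db(original, original)
shifted_db = null_test_db(original, shifted)
print(f"identical signals: {identical_db} dB")
print(f"10 us shift:       {shifted_db:.1f} dB")
```

For this tone, a 10-microsecond shift leaves a residual of roughly -24 dB even though the magnitude spectrum is unchanged, which is why a time-domain difference has to be read alongside a spectral comparison rather than from the null depth alone.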
--
## Formats & Access
API: RESTful HTTP endpoint. Send audio, receive processed audio. Optional control parameters. Automatic mode available.
Licensing: Available for integration into hardware, software, and streaming infrastructure. License models per device, per track, or as an enterprise license.
Patent status: Patent pending (Japan, 2026). POLYTOPE KK.
--
## On Subtlety
The effect is deliberately subtle. It is not a discrete change you hear the way you hear an EQ. It is a qualitative shift in how audio feels as a temporal experience. Audio has always worked by exploiting the temporal resolution of the ear: a clock fast enough to exceed perceptual discrimination creates the illusion of continuity. NOMN operates at that same threshold, not by degrading the clock but by giving it the kind of structured instability that acoustic and mechanical systems have always had and that digital systems eliminated.
Whether this matters for a particular listener, a particular recording, or a particular playback chain is an empirical question, not a rhetorical one. We make no claims about what you will feel. But we feel it, and we hope you will too.