NOMN: Micro-Timing Enhancer
What is NOMN actually doing to audio?
Digital playback runs on a crystal-locked clock whose timing stability is orders of magnitude tighter than any natural acoustic source. Crystals have measurable phase noise and jitter. We're not claiming they don't, but those deviations are vanishingly small and statistically structureless compared to the rich temporal variation any physical sound source produces. There has never, in the natural history of hearing, been a sound source so temporally rigid.
NOMN puts back the kind of variation that grid-locked playback removed. Not as random noise, not as a recognizable effect, but as structured temporal patterning that the auditory system reads as natural rather than mechanical.
Isn't this just an advanced tremolo or a fancy chorus?
What's new is what's driving it and what it does to the human:machine relation for audio.
A tremolo's control signal is a 2-parameter LFO. A chorus's is a 4-6 parameter LFO. A humanizer plugin's is filtered random noise. Tape emulation uses noise shaped to match wow/flutter spectra measured from vintage gear. All of these are content-blind, and none of them is modeled on the body; they're modeled on affective technological nostalgia.
Music cognition research says the smallest perceptible timing difference is something like 10-50ms. Doesn't that mean NOMN's microsecond-scale modulation is below audibility / "Just Noticeable Difference" (JND) threshold and thus handwavy audiophile nonsense like wildly expensive speaker cable or something?
First, on what the JND literature actually measures. JND (just-noticeable-difference) thresholds for musical timing, the ones in the 10-50ms range, measure how much one note has to move relative to another before a listener can consciously identify the shift in a forced-choice cognitive task. That tells you when timing becomes *labelable* as different. It does not tell you the resolution at which the auditory system processes time or what we sense.
The auditory system's actual temporal resolution is roughly three to four orders of magnitude finer than musical JND. The two most established lines of evidence:
The binaural pathway resolves interaural time differences (ITDs) down to about 10 microseconds. Klumpp & Eady (1956, J. Acoust. Soc. Am. 28: 859-860) measured average ITD discrimination thresholds of 9μs for band-limited noise and 11μs for a 1000-Hz tone across ten listeners. These thresholds have been reproduced independently many times in the nearly seventy years since. Brughera, Dunai & Hartmann (2013, J. Acoust. Soc. Am. 133: 2839-2855) confirmed thresholds just above 10μs at 700-1000 Hz using modern methods. The lowest measured thresholds approach the single-microsecond range under optimal conditions. The mechanism is well understood: neurons in the medial superior olive perform coincidence detection on phase-locked spikes from each ear. The largest ITD anyone normally encounters, for a sound directly to one side, is around 600-700μs, set by the distance between the ears (Mills 1958, J. Acoust. Soc. Am. 30: 237-246). Listeners reliably resolve angular differences of about 1 degree near the midline. Note that much of this research is nearly 70 years old!
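As a cross-check on the 600-700μs and 1-degree figures, Woodworth's spherical-head approximation gives ITD = (r/c)(θ + sin θ) for a distant source at azimuth θ. The head radius of 8.75 cm and speed of sound of 343 m/s below are assumed typical values, and a real head is not a sphere, so treat the numbers as estimates.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, dry air at about 20 °C (assumed)
HEAD_RADIUS = 0.0875    # m, a commonly used average head radius (assumed)

def woodworth_itd(azimuth_deg: float, radius: float = HEAD_RADIUS,
                  c: float = SPEED_OF_SOUND) -> float:
    """Interaural time difference in seconds for a far-field source.

    Woodworth's spherical-head approximation: ITD = (r / c) * (theta + sin(theta)),
    for azimuths from 0 (straight ahead) to 90 degrees (directly to one side).
    """
    theta = math.radians(azimuth_deg)
    return (radius / c) * (theta + math.sin(theta))

max_itd_us = woodworth_itd(90.0) * 1e6      # source directly to one side
one_degree_us = woodworth_itd(1.0) * 1e6    # 1 degree away from straight ahead
print(f"ITD at 90 degrees: {max_itd_us:.0f} μs")
print(f"ITD change from 0 to 1 degree: {one_degree_us:.1f} μs")
```

With these assumptions the model gives about 656μs at 90° and about 8.9μs for a 1° shift near the midline, so a roughly 1° angular acuity and a roughly 10μs ITD threshold are two views of the same resolution.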
The monaural pathway encodes the sub-millisecond structure of sounds through what auditory neuroscience calls **temporal fine structure (TFS)**: the rapid waveform oscillations within each cochlear frequency band, as distinct from the slower envelope (ENV) modulations superimposed on them (Moore 2008, J. Assoc. Res. Otolaryngol. 9: 399-406, the canonical review). TFS information is carried in the timing of auditory-nerve spikes, which phase-lock to individual cycles of the stimulus waveform for components up to several kilohertz. This isn't a hypothesis or a contested claim; it is the standard model of how the auditory periphery encodes time. Neural processing of temporal structure more broadly, including envelope modulation, is reviewed comprehensively in Joris, Schreiner & Rees (2004, Physiological Reviews 84: 541-577).
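The ENV/TFS split is conventionally computed with the Hilbert transform of a band-filtered signal: the magnitude of the analytic signal is the envelope, and the cosine of its phase is the fine structure. This minimal sketch uses `scipy.signal.hilbert` on a single synthetic band (a 1 kHz carrier with a 4 Hz amplitude modulation); real analyses apply it to each output of a cochlear-like filterbank, which is omitted here.

```python
import numpy as np
from scipy.signal import hilbert

def env_tfs(band_signal: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split a narrowband signal into Hilbert envelope (ENV) and fine structure (TFS).

    The analytic signal a(t) = x(t) + j*H{x}(t) gives ENV = |a(t)| and
    TFS = cos(angle(a(t))), so that x(t) = ENV(t) * TFS(t).
    """
    analytic = hilbert(band_signal)
    envelope = np.abs(analytic)
    fine_structure = np.cos(np.angle(analytic))
    return envelope, fine_structure

# A 1 kHz carrier (fast TFS) with a slow 4 Hz amplitude modulation (ENV).
fs = 48_000
t = np.arange(fs) / fs  # one second
x = (1.0 + 0.5 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)

env, tfs = env_tfs(x)
print(np.max(np.abs(env * tfs - x)))  # reconstruction error, near zero
```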
TFS is what the auditory system uses for pitch perception of complex tones, for the perception of speech in fluctuating background noise, and for source separation in complex acoustic environments. Smith, Delgutte & Oxenham (2002, Nature 416: 87-90) demonstrated this directly by constructing "chimaeric" sounds in which the envelope of one signal was combined with the TFS of another. Listeners reliably perceived pitch and source location based on the TFS, not the envelope. TFS isn't specific to live sound, binaural listening, or any particular playback situation. It operates on whatever the cochlea receives, including the output of headphones and speakers playing recorded music. When you listen to a recording, the temporal fine structure of the audio is encoded into the spike timing of your auditory nerve at sub-millisecond resolution. This processing happens continuously, below the threshold of conscious awareness, which is exactly why musical JND studies don't measure it. JND measures what listeners can report. It doesn't measure what their auditory systems are doing.
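The chimaera construction can be sketched in a few lines: split two signals into matching frequency bands, then in each band multiply the Hilbert envelope of one by the Hilbert fine structure of the other. The band count, log spacing, 80 Hz-5 kHz range, and Butterworth filters here are assumptions for illustration rather than the exact parameters of Smith, Delgutte & Oxenham, and the two inputs are synthetic stand-ins.

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt

def make_chimaera(env_source, tfs_source, fs, n_bands=8, f_lo=80.0, f_hi=5000.0):
    """Combine the envelope (ENV) of env_source with the TFS of tfs_source.

    Both signals are split into log-spaced bands; in each band the Hilbert
    envelope of one is multiplied by the Hilbert fine structure of the other.
    """
    env_source = np.asarray(env_source, dtype=float)
    tfs_source = np.asarray(tfs_source, dtype=float)
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)
    out = np.zeros_like(env_source)
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Butterworth bandpass run forward and backward (zero phase).
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        env = np.abs(hilbert(sosfiltfilt(sos, env_source)))
        tfs = np.cos(np.angle(hilbert(sosfiltfilt(sos, tfs_source))))
        out += env * tfs
    return out

fs = 16_000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
# Stand-ins: noise with a 4 Hz envelope, and a 200 Hz harmonic complex.
noise_am = (1 + np.sin(2 * np.pi * 4 * t)) * rng.standard_normal(fs)
harmonic = sum(np.sin(2 * np.pi * 200 * h * t) for h in range(1, 11))
chimaera = make_chimaera(noise_am, harmonic, fs)
```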
Now the more important point: **The right question isn't whether listeners can A/B-distinguish two audio files in a controlled trial. The right question is whether the technology that generates audio for human consumption should operate at the resolution of the sensory system it's serving.**
The audio industry has answered this question consistently for decades. Studios record at 96kHz or 192kHz not because listeners can reliably A/B-distinguish those rates from 48kHz on every track, but because the production chain shouldn't be the stage that limits resolution. Mastering engineers obsess over word-clock jitter specifications well below classical audibility thresholds, because they don't want the clock to be the bottleneck. Professional audio interfaces compete on sub-millisecond round-trip latency. The principle is consistent: human-facing audio technology should operate above the sensory floor, not below it.
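Word-clock jitter matters because sampling-time error turns into amplitude error in proportion to how fast the signal is changing. A common textbook approximation for a full-scale sine of frequency f sampled with random RMS jitter σ is SNR = -20·log10(2π·f·σ). This sketch assumes white (random) jitter and ignores the converter's other noise sources.

```python
import math

def jitter_limited_snr_db(signal_freq_hz: float, rms_jitter_s: float) -> float:
    """Best-case SNR in dB of a full-scale sine sampled with random clock jitter.

    Textbook approximation: SNR = -20 * log10(2 * pi * f * sigma_t).
    """
    return -20.0 * math.log10(2.0 * math.pi * signal_freq_hz * rms_jitter_s)

for jitter_s in (1e-9, 100e-12, 10e-12):
    snr = jitter_limited_snr_db(20_000, jitter_s)
    print(f"{jitter_s * 1e12:>6.0f} ps RMS jitter -> {snr:.0f} dB SNR at 20 kHz")
```

At 20 kHz this gives about 78 dB for 1 ns of jitter and about 98 dB for 100 ps, which is why clock specifications are pushed well below what a listener could consciously flag.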
NOMN sits in this lineage. Crystal-locked playback timing is acoustically unprecedented in the natural history of hearing. There has never been a sound source with this little temporal variation. The question isn't whether listeners can articulate the difference in a forced-choice test on a per-track basis. The question is whether AI-generated audio at scale, intended for billions of hours of human listening, should match the temporal resolution the sensory system actually uses. We think it should. The audio industry has historically agreed with that principle for every other dimension of the playback chain: sample rate, bit depth, jitter, latency, frequency response, distortion. Treating the temporal microstructure dimension as the lone exception, just because the relevant variation sits below conscious labelling threshold, is inconsistent.
If the audibility critique held, if anything below conscious JND were perceptually irrelevant, listeners couldn't localize sound sources, couldn't separate voices in a crowd, couldn't tell a real violin from a sampled violin played through the same speaker. All of those judgments depend on temporal resolution far finer than musical JND.
OK so this is all pretty interesting, but what's temporal fine structure, exactly, and where does NOMN sit relative to the established TFS literature?
The TFS framework has been extensively developed in the auditory science literature over the past two decades. Moore (2008, J. Assoc. Res. Otolaryngol. 9: 399-406) is the standard review of TFS's role in pitch perception, masking, and speech perception. Smith, Delgutte & Oxenham (2002, Nature 416: 87-90) used "chimaeric" sounds, constructed by combining the envelope of one signal with the TFS of another, to demonstrate that listeners rely on TFS for pitch and source localization while relying on ENV for speech recognition in quiet. Subsequent work (Lorenzi et al. 2006, PNAS 103: 18866-18869; Hopkins & Moore 2009, J. Acoust. Soc. Am. 125: 442-446) has shown that TFS sensitivity is critical for speech perception in noisy environments, and that hearing-impaired listeners' reduced sensitivity to TFS is a major factor in their difficulty understanding speech in noise.
This matters for NOMN in two ways.
First, TFS is the established technical vocabulary for what NOMN operates on. The temporal microstructure NOMN restores to digital playback is, in the technical language of the field, modulation in the temporal fine structure of the audio signal. We aren't making up a new perceptual category. We're operating in a well-mapped region of the auditory science literature.
Second, the existing TFS research focuses primarily on what's *lost*: how hearing-impaired listeners lose TFS sensitivity, how cochlear implants struggle to deliver TFS information, how aging degrades TFS processing. NOMN approaches the question from the other direction: what kind of TFS structure should well-engineered playback technology preserve and present to listeners whose TFS processing is intact? The auditory science community has spent two decades documenting how much TFS matters for normal hearing. The audio industry has not yet drawn the corresponding conclusion about playback technology design. NOMN is one application of that conclusion.
A note on scope. The "fine structure" in TFS refers to the rapid carrier oscillation within auditory filter bands, which is encoded at sub-millisecond resolution via phase-locking up to several kilohertz. NOMN's modulation operates across a range from microsecond to millisecond scales, modulating the temporal structure of the audio content itself. Both sit in the temporal regime where the auditory system does fine-grained timing work. We use the broader phrase "temporal microstructure" in marketing copy to avoid claiming we directly manipulate the specific signal-processing quantity that TFS researchers technically measure with the Hilbert decomposition, but the perceptual mechanism we're targeting is the same one that TFS research has been documenting since the early 2000s.
If sub-JND timing differences don't matter, why does the audio industry spend so much effort minimizing latency?
Every working musician who records with a DAW tunes their audio buffer size to keep round-trip latency as low as possible. Professional audio interfaces compete on sub-millisecond round-trip latency. The Bela platform was specifically built to achieve sub-millisecond action-to-sound latency for digital musical instruments (McPherson, Jack & Moro 2016, Proc. NIME) because most common platforms fail to meet the targets professional musicians need.
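The buffer-size arithmetic is simple: a buffer of N frames at sample rate fs takes N/fs seconds to fill. This sketch uses a deliberately simplified model (one input buffer plus one output buffer, with an optional converter delay); real drivers and interfaces often add safety buffers and filter latency, so measured round-trip figures run higher.

```python
def buffer_latency_ms(buffer_frames: int, sample_rate_hz: int) -> float:
    """Time to fill (or play out) one audio buffer, in milliseconds."""
    return 1000.0 * buffer_frames / sample_rate_hz

def round_trip_estimate_ms(buffer_frames: int, sample_rate_hz: int,
                           converter_delay_ms: float = 0.0) -> float:
    """Simplified round trip: one input buffer, one output buffer, plus converter delay.

    Real drivers often add extra safety buffers, so measured figures run higher.
    """
    return 2 * buffer_latency_ms(buffer_frames, sample_rate_hz) + converter_delay_ms

for frames in (16, 32, 64, 128, 256):
    print(f"{frames:>4} frames @ 48 kHz: {round_trip_estimate_ms(frames, 48_000):.2f} ms round trip")
```

Under this model, a sub-millisecond round trip at 48 kHz needs buffers smaller than 24 frames, before any converter delay is counted.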
The peer-reviewed evidence on what musicians actually feel is clear. Jack, Mehrabi, Stockman & McPherson (2018, Music Perception 36: 109-128) tested professional percussionists and amateur musicians on a digital percussion instrument with controlled latency conditions of 0ms, 10ms, 10ms ± 3ms jitter, and 20ms. Both groups rated zero-latency as significantly higher quality than the 10ms-with-jitter and 20ms conditions. Professional percussionists were more sensitive to latency than amateurs and showed measurable changes in timing performance under added latency. Schmid et al. (2024, Proc. Mensch und Computer, ACM) measured just-noticeable-difference for added audio latency across 37 listeners and found a mean JND of 27ms at 64ms base latency, with musically sophisticated participants reliably detecting smaller margins. Earlier ensemble work documented that asynchronies up to 50ms occur in real performances (Rasch 1979, Acustica 43: 121-131) and that professional percussionists exhibit timing jitter of 10-40ms even when synchronizing to a metronome (Dahl 2011, Music Perception 28: 491-503).
Acoustic drums have a natural latency of about 2-3ms from stick contact to sound reaching the drummer's ears, a value set by the speed of sound across the distance from the drum to the head. This is the baseline the drummer's nervous system has calibrated to over years of practice. When an electronic drum module introduces an extra 5-10ms on top of this, professional drummers describe the kit as "sluggish," "disconnected," "laggy."
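The 2-3ms figure is just travel time at the speed of sound, so delays convert directly to equivalent distances. This sketch assumes 343 m/s (dry air at about 20 °C).

```python
SPEED_OF_SOUND_M_S = 343.0  # dry air at about 20 °C (assumed)

def delay_ms_for_distance(distance_m: float) -> float:
    """Acoustic travel time over a distance, in milliseconds."""
    return 1000.0 * distance_m / SPEED_OF_SOUND_M_S

def distance_m_for_delay(delay_ms: float) -> float:
    """Distance sound travels in the given time, in metres."""
    return SPEED_OF_SOUND_M_S * delay_ms / 1000.0

for delay in (2.0, 3.0, 5.0, 10.0):
    print(f"{delay:>4.1f} ms  <->  {distance_m_for_delay(delay):.2f} m of air")
```

So 2-3ms corresponds to roughly 0.7-1.0 m of air, and an added 5-10ms is acoustically equivalent to moving the drum about 1.7-3.4 m farther from the player.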
Notice what's happening here. The audio industry has, for decades, accepted the principle that **playback technology should operate at the temporal resolution the sensory system actually uses, not at the resolution of conscious A/B detection**. Nobody argues that audio interfaces should target 50ms latency because that's the conscious JND. The industry targets sub-millisecond latency because delays of several milliseconds are already where the human:machine interaction starts to break down. Studios record at high sample rates so that the production chain isn't the bottleneck. Word clocks are spec'd at jitter levels below classical audibility for the same reason. You don't want the clock to be the lowest-resolution element in the system.
This is exactly the principle NOMN applies. Crystal-locked playback has temporal stability orders of magnitude tighter than any natural acoustic source. The sensory system that consumes the audio resolves timing at microsecond scales. The fact that listeners can't always consciously label what they're hearing in an A/B test doesn't mean the technology should operate below the sensory floor. It means the audio industry should treat temporal microstructure with the same engineering discipline it already applies to sample rate, bit depth, latency, and jitter.
But the speaker cone and the room introduce way more temporal modification than NOMN does. Doesn't that swamp the effect?
The relevant difference isn't magnitude. It's structure.
Room and speaker convolution is content-blind and stationary. The room's impulse response is fixed for a given listening position. The reverb tail of a snare hit and the reverb tail of a sustained vocal note get the same room treatment. This is convolution with a fixed kernel, large in magnitude, but content-blind and time-invariant.
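The "fixed kernel" point can be checked directly. This sketch models the room as plain convolution with a toy impulse response (a direct-path impulse plus an exponentially decaying noise tail, an assumption rather than a measured room) and verifies two properties: delaying a sound only delays its reverberant output (time invariance), and the response to a mix equals the mix of the responses (superposition).

```python
import numpy as np

fs = 48_000
rng = np.random.default_rng(1)

# Toy room impulse response (assumed, not measured):
# a direct-path impulse followed by an exponentially decaying noise tail.
n_ir = fs // 10
ir = 0.3 * rng.standard_normal(n_ir) * np.exp(-np.arange(n_ir) / (0.02 * fs))
ir[0] = 1.0

def room(x):
    """The room as a fixed kernel: plain linear convolution."""
    return np.convolve(x, ir)

# A short decaying noise burst standing in for a snare hit.
burst_len = fs // 20
snare = rng.standard_normal(burst_len) * np.exp(-np.arange(burst_len) / (0.005 * fs))

# Time invariance: delaying the input by k samples just delays the output.
k = 1_000
late_snare = np.concatenate([np.zeros(k), snare])
print(np.allclose(room(late_snare)[k:], room(snare)))  # True

# Superposition: the room's response to a mix is the mix of its responses.
vocal = np.sin(2 * np.pi * 220 * np.arange(burst_len) / fs)
print(np.allclose(room(snare + vocal), room(snare) + room(vocal)))  # True
```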
The auditory system has well-documented machinery for separating direct-path source signals from reverberant reflections. The foundational finding is the precedence effect, first systematically described by Wallach, Newman & Rosenzweig (1949, American Journal of Psychology 62: 315-336). When two identical sounds arrive at the ears within a few milliseconds of each other, the listener perceives a single fused sound localized at the position of the first-arriving wavefront, with the later-arriving reflections strongly suppressed in their contribution to perceived location. This is why you can localize a speaker in a reverberant room. The brain attributes the spatial cue to the direct sound and treats the reflections as environment. The mechanism extends into the broader framework of auditory scene analysis (Bregman, 1990, MIT Press), in which the auditory system uses primitive grouping cues to organize incoming sound into source representations distinct from environmental context. Subsequent reviews (Litovsky et al. 1999, J. Acoust. Soc. Am. 106: 1633-1654; Brown et al. 2015, J. Acoust. Soc. Am. 137: 776-790) document that this is a continuous, automatic process operating below conscious awareness.
What the auditory system *can't* factor out, and uses heavily for source identification and naturalness judgment, is the underlying source's intrinsic timing structure. The room can smear what's there. It can't add what isn't, and it can't subtract what is.
Put simply: a real violin and a sampled violin played through the same speaker in the same room are typically distinguished by listeners on extended listening. The acoustic chain is identical. The difference is in source-level temporal structure that survives the chain because it's encoded in the signal before it ever reaches the speaker.
Doesn't the DAC's reconstruction filter smooth out fast timing modulation anyway?
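A general principle worth stating clearly: NOMN's modulation is content, not metadata. Anything that processes the audio processes the modulation along with it. Anything that doesn't process the audio can't touch the modulation. There's no separate channel to attack. A DAC's reconstruction filter is a linear filter that removes the spectral images above the Nyquist frequency; it passes the audio band, and the timing structure encoded there, along with everything else in that band. The same logic applies to the speaker, the room, the listener's HRTF, and the ear canal: all are, to a good approximation, linear time-invariant operations applied to the audio content, and none of them selectively erases the modulation.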
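A constant offset is the simplest case to check numerically. This sketch delays one 1 kHz tone by 10μs relative to another, passes both through the same linear lowpass filter, and measures the offset again from the phase of the 1 kHz component. The 4th-order Butterworth lowpass at 20 kHz is an assumed stand-in for a reconstruction filter; real DACs typically combine a digital oversampling filter with an analog stage.

```python
import numpy as np
from scipy.signal import butter, lfilter

fs = 48_000
f = 1_000.0
delay = 10e-6           # 10 μs offset, about half a sample at 48 kHz
t = np.arange(fs) / fs  # one second

reference = np.sin(2 * np.pi * f * t)
offset = np.sin(2 * np.pi * f * (t - delay))

# Assumed stand-in for a reconstruction filter: a causal 4th-order
# Butterworth lowpass at 20 kHz, applied identically to both signals.
b, a = butter(4, 20_000, btype="low", fs=fs)
ref_out = lfilter(b, a, reference)
off_out = lfilter(b, a, offset)

def component_at(signal, tail):
    """Complex amplitude of the f-Hz component over the last `tail` samples (whole cycles)."""
    return np.sum(signal[-tail:] * np.exp(-2j * np.pi * f * t[-tail:]))

tail = fs // 2  # 500 whole cycles, well past the filter's start-up transient
rel = component_at(ref_out, tail) * np.conj(component_at(off_out, tail))
measured_delay = np.angle(rel) / (2 * np.pi * f)
print(f"{measured_delay * 1e6:.3f} μs")  # 10.000 μs
```

The filter shifts both tones by the same phase, so the relative offset comes out unchanged.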
Couldn't you accomplish the same thing with a low-depth chorus or some filtered noise driving varispeed?
The difference is in what the auditory system does with different kinds of variation. LFO-driven modulation is periodic, and the auditory system detects periodicity below conscious awareness. Subtle periodic modulation reads as "wobbly" or "effected" even when listeners can't say why. Filtered noise modulation is aperiodic but content-blind, which the auditory system also reads as foreign to natural sources, since natural sources don't produce statistically white timing variation. Natural timing variation has specific structure: long-range correlations and content correlation that have been measured directly in human performance. Hennig (2014, PNAS 111: 12974-12979) documented that timing deviations in professional drum performances exhibit long-range (1/f-type) correlations rather than white-noise statistics, a finding consistent with broader work on temporal structure in human motor performance (Gilden, Thornton & Mallon 1995, Science 267: 1837-1839). The closer your modulation matches this structure, the less the auditory system flags it as alien.
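Long-range correlation in a timing series is commonly quantified with detrended fluctuation analysis (DFA): integrate the series, remove a linear trend in windows of size s, and fit how the residual fluctuation F(s) grows with s. The exponent α is about 0.5 for white noise and about 1.0 for 1/f noise. This sketch compares the two; the scale range and linear (DFA1) detrending are assumptions, and the sequences are synthetic, not performance data.

```python
import numpy as np

def dfa_alpha(series, min_scale=16, n_scales=12):
    """Detrended fluctuation analysis (DFA1) scaling exponent.

    Roughly 0.5 for uncorrelated (white) noise and 1.0 for 1/f noise.
    """
    x = np.asarray(series, dtype=float)
    profile = np.cumsum(x - x.mean())  # integrated series
    n = len(profile)
    scales = np.unique(np.geomspace(min_scale, n // 8, n_scales).astype(int))
    fluctuation = []
    for s in scales:
        n_win = n // s
        windows = profile[: n_win * s].reshape(n_win, s).T  # one column per window
        idx = np.arange(s)
        slopes, intercepts = np.polyfit(idx, windows, 1)  # linear trend per window
        residuals = windows - (np.outer(idx, slopes) + intercepts)
        fluctuation.append(np.sqrt(np.mean(residuals ** 2)))
    alpha, _ = np.polyfit(np.log(scales), np.log(fluctuation), 1)
    return alpha

def one_over_f_noise(n, rng):
    """Gaussian noise with a 1/f power spectrum, made by spectral shaping."""
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n)
    gain = np.zeros_like(freqs)
    gain[1:] = 1.0 / np.sqrt(freqs[1:])  # amplitude ~ f^-1/2, so power ~ 1/f
    return np.fft.irfft(spectrum * gain, n)

rng = np.random.default_rng(4)
white = rng.standard_normal(8_192)
pink = one_over_f_noise(8_192, rng)
print(f"white noise: alpha = {dfa_alpha(white):.2f}")
print(f"1/f noise:   alpha = {dfa_alpha(pink):.2f}")
```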
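NOMN's modulation matches that structure. A low-depth chorus doesn't, and neither does filtered noise: even noise shaped to a 1/f spectrum has the long-range correlations but not the content correlation.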
Hasn't this been tried before? Isn't NOMN just like MQA or C Wave or one of those audiophile dead ends?
C Wave argues that PCM is "non-continuous" and that the brain detects this discontinuity. Their solution is essentially a reverb-like process to "fill in the gaps." We don't share that diagnosis. A reverb algorithm running on PCM is still PCM, and the Shannon-Nyquist sampling theorem guarantees that PCM of a properly bandlimited signal is mathematically equivalent to the continuous waveform up to the Nyquist frequency. There are no gaps to fill in the digital signal. We're not claiming to fix something inside PCM. We're claiming that natural acoustic sources have temporal microstructure that crystal-locked playback lacks. That is a different claim, grounded in the physical properties of natural sound sources rather than in disputed claims about sampling theory.
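The sampling-theorem point can be illustrated with Whittaker-Shannon interpolation, which rebuilds a bandlimited signal's values between samples from the samples alone. This sketch truncates the theoretically infinite sum to a finite block, so it is only accurate away from the block's edges; the 1 kHz test tone and evaluation points are arbitrary choices.

```python
import numpy as np

def sinc_reconstruct(samples, fs, times):
    """Whittaker-Shannon interpolation: x(t) = sum_n x[n] * sinc(fs * t - n).

    np.sinc is the normalized sinc, sin(pi*u) / (pi*u).
    """
    samples = np.asarray(samples, dtype=float)
    n = np.arange(len(samples))
    u = fs * np.asarray(times, dtype=float)[:, None] - n[None, :]
    return np.sinc(u) @ samples

fs = 48_000
f = 1_000.0  # well below the 24 kHz Nyquist frequency
samples = np.sin(2 * np.pi * f * np.arange(4_800) / fs)  # 0.1 s of a sine

# Points between samples, near the middle of the block (away from its edges).
between = (2_400 + np.array([0.25, 0.5, 0.77])) / fs
error = np.max(np.abs(sinc_reconstruct(samples, fs, between) - np.sin(2 * np.pi * f * between)))
print(f"max error between samples: {error:.1e}")
```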
The single biggest lesson from those efforts: don't pick fights with sampling theory, don't claim what you can't measure, and don't treat independent measurement as an enemy.
How is this different from a humanizer plugin?
Two differences. First, humanizers add stochastic variation. NOMN adds structured variation matched to natural source statistics. Random isn't the same as natural. The long-range correlation structure documented in human motor timing (Gilden et al. 1995; Hennig 2014) is categorically different from the white-noise distribution most humanizers produce, and the auditory system responds to that distinction.
Second, humanizers operate on MIDI event timing before audio rendering. NOMN operates on audio at the signal level. A humanizer on a quantized MIDI snare moves the hit. NOMN modulates the playback of the audio itself. Different operations, different signal-chain positions, different effects. A humanizer can't humanize a finished audio file. NOMN can.
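For contrast, a typical humanizer can be sketched as independent random offsets added to note-on times. Uniform offsets of up to ±10 ms are an assumption here (products differ; some use Gaussian offsets). The lag-1 autocorrelation of the resulting deviations is near zero: each deviation has no memory of the previous one, which is the white-noise statistics described above.

```python
import numpy as np

def humanize_onsets(onsets_s, max_shift_ms=10.0, seed=None):
    """A typical humanizer: nudge each note-on by an independent random offset.

    Works on event times (as read from MIDI), before any audio exists.
    """
    rng = np.random.default_rng(seed)
    onsets = np.asarray(onsets_s, dtype=float)
    shifts = rng.uniform(-max_shift_ms, max_shift_ms, size=onsets.shape) / 1000.0
    return onsets + shifts

grid = np.arange(10_000) * 0.125  # quantized 16th notes at 120 BPM
humanized = humanize_onsets(grid, seed=5)
deviations = humanized - grid
lag1 = np.corrcoef(deviations[:-1], deviations[1:])[0, 1]
print(f"lag-1 autocorrelation of deviations: {lag1:.3f}")  # close to 0: no memory
```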
Is the temporal modulation audible?
If you mean "can a listener identify NOMN as a recognizable effect," generally no, and that's the design intent. A flanger that wasn't audible would be failing at its purpose. NOMN that was audible as processing would be failing at its purpose. They're aiming at opposite outcomes.
If you mean "would a listener succeed at A/B-distinguishing NOMN-processed audio from unprocessed audio in a controlled trial," that's an empirical question we'd love to investigate with proper perceptual research, and when we can fund that study and publish the results, we will. It's also not the question that decides whether the technology matters or is worth pursuing or supporting.
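For reference, the arithmetic behind judging a forced-choice result is a one-sided binomial test: under pure guessing, each trial is a coin flip. This sketch assumes an ABX-style design with a 0.05 criterion; it is not a description of any planned study.

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """Probability of scoring at least `correct` of `trials` by guessing (p = 0.5)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

def min_correct_for(trials: int, alpha: float = 0.05) -> int:
    """Smallest score whose guessing probability is below alpha (trials + 1 if none)."""
    for k in range(trials + 1):
        if abx_p_value(k, trials) < alpha:
            return k
    return trials + 1

print(f"{abx_p_value(12, 16):.4f}")  # 0.0384
print(min_correct_for(16))           # 12
```

With 16 trials, 12 correct gives p ≈ 0.038 (2517/65536), while 11 correct gives p ≈ 0.105, so 12 is the minimum passing score.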
The relevant question is the one the audio industry has been answering for decades on every other dimension of the playback chain: does the technology operate at the temporal resolution the sensory system actually uses? For sample rate, bit depth, latency, jitter, and frequency response, the industry has consistently answered yes. The production chain should match the sensory floor, not the conscious A/B detection threshold. We're applying the same engineering discipline to temporal microstructure. Whether a listener can articulate the difference in a forced-choice test on a per-track basis is a different question from whether the technology serving billions of hours of human listening should match the sensory resolution.
Why is it called NOMN? Is the last N a silent N?
Where can I read more about the auditory science you're citing?
INTERAURAL TIME DIFFERENCE THRESHOLDS
— Klumpp, R.G. & Eady, H.R. (1956). "Some Measurements of Interaural Time Difference Thresholds." Journal of the Acoustical Society of America 28(5): 859-860. The original measurement: 9μs threshold for band-limited noise, 11μs for 1000-Hz tone, 28μs for clicks (75% correct discrimination, ten listeners).
— Mills, A.W. (1958). "On the Minimum Audible Angle." Journal of the Acoustical Society of America 30(4): 237-246. Foundational measurement of angular acuity in sound localization (~1° near midline).
— Brughera, A., Dunai, L. & Hartmann, W.M. (2013). "Human interaural time difference thresholds for sine tones: The high-frequency limit." Journal of the Acoustical Society of America 133(5): 2839-2855. Modern confirmation of ~10μs thresholds for pure tones at mid-frequencies, with high-frequency cutoff around 1.4 kHz.
NEURAL CODING OF TEMPORAL STRUCTURE
— Joris, P.X., Schreiner, C.E. & Rees, A. (2004). "Neural Processing of Amplitude-Modulated Sounds." Physiological Reviews 84(2): 541-577. The standard review on how the auditory system encodes temporal modulation for source localization, identification, and parsing.
— Moore, B.C.J. (2008). "The role of temporal fine structure processing in pitch perception, masking, and speech perception for normal-hearing and hearing-impaired people." Journal of the Association for Research in Otolaryngology 9(4): 399-406. The canonical review of temporal fine structure (TFS) and its perceptual role.
— Smith, Z.M., Delgutte, B. & Oxenham, A.J. (2002). "Chimaeric sounds reveal dichotomies in auditory perception." Nature 416: 87-90. The foundational experimental demonstration that listeners rely on TFS for pitch and localization while ENV dominates speech recognition in quiet.
— Lorenzi, C., Gilbert, G., Carn, H., Garnier, S. & Moore, B.C.J. (2006). "Speech perception problems of the hearing impaired reflect inability to use temporal fine structure." Proceedings of the National Academy of Sciences 103: 18866-18869. Direct evidence for TFS's role in speech-in-noise perception.
SOURCE/ENVIRONMENT SEPARATION
— Wallach, H., Newman, E.B. & Rosenzweig, M.R. (1949). "The Precedence Effect in Sound Localization." American Journal of Psychology 62(3): 315-336. The foundational paper showing that listeners localize sounds based on first-arriving wavefront, suppressing reverberant reflections.
— Bregman, A.S. (1990). Auditory Scene Analysis: The Perceptual Organization of Sound. MIT Press. The standard reference text on how the auditory system organizes complex sound mixtures into source representations.
— Litovsky, R.Y., Colburn, H.S., Yost, W.A. & Guzman, S.J. (1999). "The Precedence Effect." Journal of the Acoustical Society of America 106(4): 1633-1654. Comprehensive review of the precedence effect and echo suppression literature.
LATENCY PERCEPTION AND MUSICAL PERFORMANCE
— Jack, R.H., Mehrabi, A., Stockman, T. & McPherson, A. (2018). "Action-sound Latency and the Perceived Quality of Digital Musical Instruments." Music Perception 36(1): 109-128. Professional percussionists rated 10ms±3ms jitter and 20ms latency conditions as significantly lower quality than zero latency.
— McPherson, A., Jack, R. & Moro, G. (2016). "Action-Sound Latency: Are Our Tools Fast Enough?" Proc. NIME 2016. Survey demonstrating most digital musical instrument platforms fail to meet sub-millisecond latency targets; motivates the Bela platform.
— Schmid, A., et al. (2024). "Measuring the Just Noticeable Difference for Audio Latency." Proc. Mensch und Computer 2024 (ACM). Mean JND of 27ms at 64ms base latency, with musically sophisticated listeners detecting smaller margins.
— Dahl, S. (2011). "Striking Movements: A Survey of Motion Analysis of Percussionists." Music Perception 28(5): 491-503. Documentation of percussionist timing variability.
NATURAL TIMING STATISTICS
— Hennig, H. (2014). "Synchronization in human musical rhythms and mutually interacting complex systems." Proceedings of the National Academy of Sciences 111(36): 12974-12979. Direct measurement of 1/f long-range correlations in professional drum performance timing.
— Gilden, D.L., Thornton, T. & Mallon, M.W. (1995). "1/f noise in human cognition." Science 267: 1837-1839. Broader finding of 1/f temporal structure across human cognitive and motor performance.
We cite this work because we want NOMN's perceptual claims to rest on the same foundation as the rest of the auditory science community's. Independent measurement and verification are how this field moves forward, and we don't want to be exempt from that.
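Hearing is the fastest human sense, by a margin of ten times or more. Humans can detect time differences of 10 microseconds. If the monitor you're reading this on refreshes at 60 Hz, each frame lasts about 16.7 ms, roughly 1,700 times slower than the resolution of your ear.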
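Every digital audio source on Earth shares one characteristic: mathematically near-perfect timing. DAWs, digital synthesizers, drum machines, samplers, streaming audio: all of them are temporally rigid by design. Audiophiles use 10 MHz external clocks in pursuit of maximum stability. "Fidelity" has been defined as zero frequency instability and zero timing variation.
In parallel, the industry has spent 50 years optimizing spectral fidelity and building the digital infrastructure for making and listening to music: infrastructure that operates at a precision orders of magnitude below the temporal sensitivity of the listener, the very system it is meant to serve.
Natural sounds are never temporally perfect. Every acoustic instrument, every voice, the wind moving through an environment: all exhibit continuous, microsecond-scale timing variation arising from the physics of how they are produced. These variations are not imperfections; they are precisely what the auditory system recognizes as "alive." The sub-technology at the core of every audio technology is an underlying periodicity: the clock. Whether it is a modulated electrical frequency, a rotating wax cylinder, the lathe cutting a record groove, or a digital-to-analog converter, there has always been a way to maintain the logical structure of the newly created quanta across the whole system. When that clock degrades, the illusion collapses: like a flipbook flipped too slowly, the perceptual hack fails.
Record players and analog tape machines don't sound better; they feel better. They are micro-timing enhancers. The mechanical instability of a turntable or tape transport produces variation in the time domain and frequency instability. This is the quality people chase by spending enormous sums on vinyl, vacuum tubes, and analog signal paths, often without being able to name what they are hearing, because what they are hearing is not spectral but temporal.
NOMN restores temporal life to digital audio. It is a micro-timing enhancement system that operates at the resolution of the human perceptual system and introduces humanly structured, non-repeating timing variation into any audio stream.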
--
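## How It Works
NOMN is trained on the temporal microstructure of human speech in more than 80 languages. Not phonemes, not words, not meaning, not vocal timbre: only the microscopic timing patterns that make biological communication feel "alive." Patterns drawn from diverse linguistic traditions are distilled into a generative model of organic temporal behavior.
At runtime, the system generates a continuous stream of timing variation, updated more than 1,000 times per second, and applies it to the input audio. The original content is fully preserved: nothing is added to the signal and nothing is removed. Only the temporal microstructure is enriched, at a resolution below the threshold at which effects like swing or groove register, yet within the threshold of perceptual effect.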
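The general operation described here, a control-rate timing stream applied to audio, can be sketched as a time-varying fractional delay. NOMN's generative model is not described in this document, so the offsets below are a hypothetical placeholder (a random walk scaled to ±20μs), and `apply_timing_stream` is likewise a hypothetical name, not part of any NOMN API.

```python
import numpy as np

def apply_timing_stream(audio, fs, offsets_us, control_rate=1_000):
    """Hypothetical helper: apply control-rate timing offsets as a time-varying delay.

    offsets_us[i] is the delay in microseconds at time i / control_rate.
    The offsets are linearly interpolated to audio rate, and the output reads
    the input at fractional positions: y[n] = x(n - d[n] * fs).
    """
    audio = np.asarray(audio, dtype=float)
    n = np.arange(len(audio))
    control_times = np.arange(len(offsets_us)) / control_rate
    delay_s = np.interp(n / fs, control_times, np.asarray(offsets_us, dtype=float) * 1e-6)
    read_positions = n - delay_s * fs
    # Linear interpolation between neighbouring samples, for brevity; it slightly
    # attenuates high frequencies, so a production system would likely use a
    # higher-quality fractional-delay interpolator (e.g. windowed sinc).
    return np.interp(read_positions, n, audio)

fs = 48_000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440 * t)

# Placeholder offsets (not NOMN's model): a random walk, one value per millisecond.
rng = np.random.default_rng(3)
walk = np.cumsum(rng.normal(0.0, 1.0, size=1_000))
offsets_us = 20.0 * walk / np.max(np.abs(walk))  # scaled to at most ±20 μs
processed = apply_timing_stream(tone, fs, offsets_us)
```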
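The variation is not random, and it cannot be replicated with jitter. It is not periodic, and it does not loop. It is structured by context and never repeats, generated live for each moment of audio that passes through.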
--
## API
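As a first release, NOMN is available as a cloud processing service. Send audio, and receive temporally enhanced audio back.
The API accepts audio in standard formats and returns the processed output. Control parameters are optional; when provided, they let you navigate the space of the system's internal timing behavior. When they are omitted, the system automatically determines the optimal enhancement for the input material, adjusting in real time to maximize perceptual effect while maintaining full transparency.
Processing runs at high sample rates with sub-millisecond temporal resolution. Latency depends on configuration and suits mastering, post-production, and batch-processing workflows. Near-real-time configurations for streaming applications are also available.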
--
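## Use Cases
Mastering & Post-Production
A new dimension of audio enhancement, orthogonal to EQ, compression, spatial processing, and loudness. Applicable to any master, any genre, and recordings from any era.
Streaming & Playback
Deployable as a real-time processing layer in streaming infrastructure and playback devices. Enhances any audio that passes through (music, podcasts, film audio) without modifying the content.
Hardware Integration
The system's computational footprint is small enough for embedded deployment on audio DSP chips, at a size that fits in earbuds, car head units, and portable players. It is available for licensing for integration into consumer audio hardware, automotive audio systems, and professional equipment.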
--
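## What NOMN Is Not
NOMN is not an equalizer, a compressor, a spatial processor, or an effect. It does not change frequency content, dynamic range, stereo image, or loudness. It does not add harmonics, noise, or saturation.
It operates on a dimension of audio that existing tools don't address: the temporal microstructure that lets audio function as a perceptual hack in the first place.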
--
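## Technical Notes
NOMN's timing variation operates at the microsecond scale, the same order as the timing instability of analog playback systems, but structured rather than mechanical, and non-repeating rather than periodic.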
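How large a timing displacement a wow or flutter figure implies depends strongly on the rate of the speed variation. Assuming a single sinusoidal speed variation with peak fractional deviation w at rate f, the playback position swings by ±w/(2πf) around its mean. The example figures (0.1% at 0.5 Hz, 0.05% at 10 Hz) are illustrative, not measurements of a particular machine.

```python
import math

def peak_time_displacement_us(peak_speed_deviation_pct: float, rate_hz: float) -> float:
    """Peak timing displacement (μs) caused by a sinusoidal speed variation.

    If speed is 1 + w*sin(2*pi*f*t), playback position drifts by
    w*(1 - cos(2*pi*f*t)) / (2*pi*f): a swing of ±w / (2*pi*f) about its mean.
    """
    w = peak_speed_deviation_pct / 100.0
    return 1e6 * w / (2.0 * math.pi * rate_hz)

print(f"wow:     {peak_time_displacement_us(0.1, 0.5):.0f} μs")   # 0.1% at 0.5 Hz
print(f"flutter: {peak_time_displacement_us(0.05, 10.0):.1f} μs")  # 0.05% at 10 Hz
```

These give about 318μs for the slow wow example and about 8μs for the flutter example.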
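The system includes continuous quality verification that monitors the relationship between intended and rendered timing, ensuring that the enhancement is maintained through the complete signal chain from processing to output. Null-test analysis confirms that the enhancement is spectrally transparent: the only measurable difference between input and output lies in the time domain.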
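A null test subtracts the input from the output and inspects the residual. This sketch applies a constant half-sample delay (about 10.4μs at 48 kHz) as a pure spectral phase shift, a circular simplification, and shows the two sides of such a test: the magnitude spectra match to rounding error, while the time-domain null residual is large. A time-varying delay behaves similarly but not identically, since modulating the delay adds small sidebands to the spectrum.

```python
import numpy as np

def fractional_delay_fft(x, delay_samples):
    """Delay x by a (possibly fractional) number of samples, circularly.

    Multiplies each rFFT bin k by exp(-2j*pi*k*d/N): a pure phase change,
    so every bin keeps its magnitude.
    """
    n = len(x)
    spectrum = np.fft.rfft(x)
    k = np.arange(len(spectrum))
    return np.fft.irfft(spectrum * np.exp(-2j * np.pi * k * delay_samples / n), n)

rng = np.random.default_rng(2)
x = rng.standard_normal(4_801)    # odd length: no Nyquist bin to complicate things
y = fractional_delay_fft(x, 0.5)  # half a sample, about 10.4 μs at 48 kHz

magnitude_diff = np.max(np.abs(np.abs(np.fft.rfft(y)) - np.abs(np.fft.rfft(x))))
null_residual_rms = np.sqrt(np.mean((y - x) ** 2))
print(f"max magnitude-spectrum difference: {magnitude_diff:.1e}")
print(f"null residual RMS: {null_residual_rms:.2f} (input RMS {np.std(x):.2f})")
```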
--
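## Formats & Access
API: RESTful HTTP endpoint. Send audio, receive processed audio. Control parameters optional. Automatic mode available.
Licensing: Available for integration into hardware, software, and streaming infrastructure. Per-device, per-track, or enterprise licensing models.
Patent status: Patent pending (Japan, 2026). POLYTOPE KK.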
--
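## On Subtlety
The effect is subtle by design. It isn't a discrete change you hear, the way you hear EQ; it is a qualitative change in how audio feels as a temporal experience. Audio has always worked by exploiting the ear's temporal resolution: a clock faster than perceptual discrimination creates the illusion of continuity. NOMN operates at this same threshold. It does so not by degrading the clock, but by giving it the kind of structured instability that acoustic and mechanical systems always had and that digital systems eliminated.
Whether this matters for a particular listener, a particular recording, or a particular playback chain is an empirical question, not a rhetorical one. We make no claims about what you will feel, but we feel it. We hope you will too.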