TL;DR

  • source : https://arxiv.org/pdf/1804.10862
  • code : https://github.com/tebesu/CollaborativeMemoryNetwork/tree/master
  • In general, Collaborative Filtering (CF) falls into two broad families
    • Latent factor models : e.g. MF; good at learning global structure
    • Neighborhood-based models : good at learning local structure
    • (hybrids that combine the two also exist, such as SVD++)
  • CMN combines latent factor and neighborhood-based models with deep learning
    • It introduces a Memory Network structure to process information about similar users
  • Contributions
    • Proposes the Collaborative Memory Network (CMN) architecture, built on external memory and neural attention.
      • The attention mechanism adaptively learns nonlinear weights over the neighbor information
      • The output module nonlinearly combines the neighbor information with the user and item information
    • Shows how CMN relates to the two representative classes of CF models (latent factor, neighborhood-based)

Memory Augmented Neural Networks

  • Definition : an architecture that adds an external memory component to an ordinary neural network to increase the model's capacity
  • Main components
    • External memory : stores knowledge in matrix form
    • Controller : usually a neural network that performs the operations on the memory
  • Associative addressing (memory access scheme) : locates memory positions by computing the similarity between a given query and the content stored in memory
    • Computed with an inner product followed by a softmax
    • Similar to an attention mechanism: memory positions judged to be important receive higher weights
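
A minimal sketch of the associative addressing described above, in plain Python: each memory slot is scored against the query with an inner product, the scores are normalized with a softmax, and the read-out is the weighted sum of the slots. The function names (`softmax`, `address_memory`) and the toy memory are illustrative assumptions, not taken from the paper or its repository.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def address_memory(query, memory):
    """Return (attention weights, soft read-out) for a query over memory slots.

    query  : list of d floats
    memory : list of slots, each a list of d floats (hypothetical layout)
    """
    # Similarity between the query and every stored slot (inner product).
    scores = [sum(q * m for q, m in zip(query, slot)) for slot in memory]
    weights = softmax(scores)
    # Soft read: weighted sum of the slots, one output value per dimension.
    read = [
        sum(w * slot[k] for w, slot in zip(weights, memory))
        for k in range(len(query))
    ]
    return weights, read


# Toy memory with three 2-dimensional slots.
memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, read = address_memory([2.0, 0.0], memory)
print(weights, read)
```

Slots that align with the query (the first and third here) receive the larger weights, which is the "higher weight for important positions" behaviour noted above.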

Collaborative Memory Network (CMN)

Three memories (Input)
  • User specific memory : stores each user's individual preferences
    • With $P$ users in total, an embedding matrix $M \in \mathbb{R}^{P \times d}$ with one row per user
  • Item specific memory : stores each item's individual attributes
    • With $Q$ items in total, an embedding matrix $E \in \mathbb{R}^{Q \times d}$ with one row per item
  • Collective neighborhood state : stores the collective preferences of the users (neighbors) who gave feedback on a particular item
    • Row $c_v$ : a representation summarizing the items and context that user $v$ consumed in the past
    • During attention it is read as the value, carrying the neighborhood's collective information

Neighborhood Attention

So how does CMN actually use the neighbor information here?

  • For a given user-item pair $(u, i)$, first look up the users who have interacted with item $i$, i.e. the neighborhood $N(i)$ (including $u$ itself)
  • Build a user preference vector $q_{ui}$ of size $|N(i)|$
    • Meaning of each dimension $q_{uiv}$ : the relationship between user $u$ and neighbor $v$, i.e. that neighbor's relative importance
  • Apply a softmax over the dimensions of $q_{ui}$, then take the weighted sum of the $c_v$ to produce the final neighborhood representation $o_{ui}$
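
The steps above can be sketched in plain Python. This assumes the paper's scoring rule as I understand it, $q_{uiv} = m_u^\top m_v + e_i^\top m_v$, with a key memory `M` and a value memory `C` indexed by user id. The dictionary layout, the function name `neighborhood_attention`, and the toy numbers are hypothetical and are not the reference implementation.

```python
import math


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def neighborhood_attention(m_u, e_i, neighbors, M, C):
    """Attend over the users in `neighbors` (those who interacted with item i).

    M : key memory,   user id -> list of d floats (assumed layout)
    C : value memory, user id -> list of d floats (assumed layout)
    Returns (attention weights p, neighborhood representation o).
    """
    # Score each neighbor v against both the target user and the target item.
    q = [dot(m_u, M[v]) + dot(e_i, M[v]) for v in neighbors]
    # Softmax over the neighborhood (shifted by the max for stability).
    peak = max(q)
    exps = [math.exp(s - peak) for s in q]
    total = sum(exps)
    p = [e / total for e in exps]
    # Weighted sum of the neighbors' value vectors.
    o = [sum(p_v * C[v][k] for p_v, v in zip(p, neighbors)) for k in range(len(m_u))]
    return p, o


# Toy data: three users, two dimensions; user 0 is the target user.
M = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [-1.0, 0.0]}
C = {0: [1.0, 1.0], 1: [2.0, 0.0], 2: [0.0, 5.0]}
e_i = [0.5, 0.5]
p, o = neighborhood_attention(M[0], e_i, [0, 1, 2], M, C)
print(p, o)
```

In this toy setup user 2 points away from both the target user and the item, so it ends up with the smallest attention weight and contributes little to `o`.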

CMN captures the similarities between users and dynamically assigns each neighbor's contribution based on the target item.

  • $m_v$ : the key, used to find "which neighbors matter from the point of view of the current $(u, i)$"
  • $c_v$ : the value, holding "the traits each neighbor has accumulated over a long period"
  • : reflects long-term patterns distilled from past behavior

Output Module

So how is the final prediction made?

  • The left branch runs neighborhood attention and ends with $o_{ui}$
  • The right branch takes the element-wise product of the user and item embeddings, followed by an MLP
  • Finally, the two branches are concatenated and passed through an MLP with ReLU to produce the prediction
  • Advantages
    • When feedback for a particular user is sparse, the model still benefits because it also draws on the neighbors
    • The neural attention mechanism adjusts, on its own, how much each user contributes for a particular item
    • Learns the nonlinear interaction between the local neighborhood and the global latent factors

In short, the neighbor-based information from neighborhood attention and the latent-factor-based information from the user and item embeddings are combined nonlinearly.
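
A hedged sketch of this combination, assuming the output form $\hat{r}_{ui} = v^\top \phi(U(m_u \odot e_i) + W o_{ui} + b)$ with $\phi$ = ReLU, which is how I read the paper. Summing `U @ latent` and `W @ o_ui` is the same as applying one linear layer to the concatenation of the two vectors, which matches the "concat, then MLP" description above. All names and numbers below are illustrative assumptions.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def relu(x):
    """Element-wise ReLU."""
    return [max(0.0, t) for t in x]


def predict_score(m_u, e_i, o_ui, U, W, b, v):
    """Ranking score for (u, i) from the latent and neighborhood branches."""
    # Latent-factor branch: element-wise product of user and item embeddings.
    latent = [m * e for m, e in zip(m_u, e_i)]
    # Combine both branches, add the bias, apply the nonlinearity.
    pre_activation = [
        lu + wo + bias
        for lu, wo, bias in zip(matvec(U, latent), matvec(W, o_ui), b)
    ]
    hidden = relu(pre_activation)
    # Project the hidden vector to a scalar score.
    return sum(vk * hk for vk, hk in zip(v, hidden))


# Toy parameters (hypothetical values, d = 2).
U = [[1.0, 0.0], [0.0, 1.0]]
W = [[0.5, 0.0], [0.0, 0.5]]
score = predict_score(
    m_u=[1.0, 2.0], e_i=[0.5, -1.0], o_ui=[1.0, 1.0],
    U=U, W=W, b=[0.0, 0.0], v=[1.0, 1.0],
)
print(score)  # pre-activation is [1.0, -1.5], ReLU keeps [1.0, 0.0], so 1.0
```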

Loss : uses the BPR optimization criterion [1]
  • Learns the parameters Θ so that the pairwise ranking assumption "an item the user actually interacted with is preferred over an item they have not seen" is satisfied as fully as possible
  • The objective is $\mathcal{L} = -\sum_{(u, i^+, i^-)} \log \sigma(\hat{r}_{ui^+} - \hat{r}_{ui^-})$, where $\sigma$ is the sigmoid; L2 regularization is used by default
  • Characteristics
    • Closely tied to AUC (Area Under the ROC Curve) : the objective above is a log-likelihood that acts as a smooth surrogate for the expected AUC
    • Smooth & differentiable : because it is sigmoid-based, it is smoother than hinge-style ranking losses and differentiable everywhere
    • Implicit-feedback friendly : it can be trained without any explicit feedback
    • Needs pairwise sampling : for each training mini-batch, triplets $(u, i^+, i^-)$ are sampled at random for the SGD update
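
To make the pairwise objective and the triplet sampling concrete, here is a small sketch in plain Python. It uses the mean of $-\log \sigma(\hat{r}_{ui^+} - \hat{r}_{ui^-})$ over a batch (the mean rather than the sum is my choice) and a uniform negative sampler. The helper names `bpr_loss` and `sample_triplets` and the toy interaction data are assumptions for illustration, and regularization is left out.

```python
import math
import random


def bpr_loss(pos_scores, neg_scores):
    """Mean of -log(sigmoid(pos - neg)) over a batch of score pairs."""
    losses = []
    for pos, neg in zip(pos_scores, neg_scores):
        diff = pos - neg
        # -log(sigmoid(x)) == log(1 + exp(-x)), written in a stable form.
        losses.append(max(-diff, 0.0) + math.log1p(math.exp(-abs(diff))))
    return sum(losses) / len(losses)


def sample_triplets(interactions, num_items, batch_size, rng):
    """Sample (user, positive item, negative item) triplets uniformly.

    interactions : dict user id -> set of item ids the user interacted with
    """
    # Keep only users that have at least one positive and one possible negative.
    users = [u for u, items in interactions.items() if 0 < len(items) < num_items]
    batch = []
    for _ in range(batch_size):
        u = rng.choice(users)
        pos = rng.choice(sorted(interactions[u]))
        neg = rng.randrange(num_items)
        while neg in interactions[u]:  # resample until the item is unseen
            neg = rng.randrange(num_items)
        batch.append((u, pos, neg))
    return batch


interactions = {0: {0, 1}, 1: {2}}  # toy implicit feedback
triplets = sample_triplets(interactions, num_items=5, batch_size=8, rng=random.Random(0))
print(triplets)
print(bpr_loss([3.0], [0.0]), bpr_loss([0.0], [3.0]))
```

The loss is small when the interacted item already outscores the sampled negative and grows roughly linearly when the order is reversed.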

Multiple Hops

To further improve the memory network's performance, the attention step is repeated several times (hops).

  • The output of the first attention, $o_{ui}$, and the query $z_{ui}$ (initially $m_u + e_i$) are combined through an MLP to produce the input to the next hop
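
A rough sketch of the hop loop, assuming the update $z^{h} = \phi(W^{h} z^{h-1} + o^{h} + b^{h})$ with $z^{0} = m_u + e_i$ and ReLU as $\phi$, which is my reading of the paper. For simplicity the same key memory `M` and value memory `C` are reused at every hop; that sharing, the function name `multi_hop`, and the toy values are assumptions.

```python
import math


def multi_hop(m_u, e_i, neighbors, M, C, hop_weights, hop_biases):
    """Run one attention pass per (W, b) pair; return the last read-out and query."""
    # Initial query: sum of the user and item embeddings.
    z = [a + b for a, b in zip(m_u, e_i)]
    o = None
    for W, b in zip(hop_weights, hop_biases):
        # Score every neighbor against the current query.
        q = [sum(zk * mk for zk, mk in zip(z, M[v])) for v in neighbors]
        peak = max(q)
        exps = [math.exp(s - peak) for s in q]
        total = sum(exps)
        p = [e / total for e in exps]
        # Read the value memory with the attention weights.
        o = [sum(pv * C[v][k] for pv, v in zip(p, neighbors)) for k in range(len(z))]
        # Next query: ReLU(W z + o + b).
        Wz = [sum(w * zk for w, zk in zip(row, z)) for row in W]
        z = [max(0.0, wz + ok + bk) for wz, ok, bk in zip(Wz, o, b)]
    return o, z


M = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [-1.0, 0.0]}
C = {0: [1.0, 1.0], 1: [2.0, 0.0], 2: [0.0, 5.0]}
identity = [[1.0, 0.0], [0.0, 1.0]]
o, z = multi_hop(
    m_u=M[0], e_i=[0.5, 0.5], neighbors=[0, 1, 2], M=M, C=C,
    hop_weights=[identity, identity], hop_biases=[[0.0, 0.0], [0.0, 0.0]],
)
print(o, z)
```

With a single hop the first query reduces to $(m_u + e_i)^\top m_v$, which is the same score as the single-pass neighborhood attention.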

Relation to Other Models

  1. Latent Factor Model

    • Expresses the rating matrix as a product of low-dimensional matrices to uncover hidden relationships
    • If the neighbor-processing part of CMN is removed and the MLP and activation function are simplified, CMN is identical to GMF (generalized matrix factorization)
  2. Neighborhood-based Similarity Model

    • Goal : estimate the user-user similarity matrix
    • In CMN the memory module plays the role of the similarity matrix
  3. Hybrid Model

    • SVD++ works by combining 1 and 2
    • CMN is equivalent to it once the MLP and activation function are handled appropriately
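
As a small check of the latent factor case (item 1), the sketch below assumes the output form $\hat{r}_{ui} = v^\top \phi(U(m_u \odot e_i) + W o_{ui} + b)$ and sets $W = 0$, $U = I$, $b = 0$ with an identity activation. The score then collapses to $v^\top (m_u \odot e_i)$, and with $v$ all ones to the plain dot product used in matrix factorization. The function name `cmn_score` and the numbers are hypothetical; the neighborhood and SVD++ cases are not covered here.

```python
def cmn_score(m_u, e_i, o_ui, U, W, b, v, activation):
    """Score v^T activation(U (m_u * e_i) + W o_ui + b), element-wise product."""
    latent = [m * e for m, e in zip(m_u, e_i)]
    U_latent = [sum(a * x for a, x in zip(row, latent)) for row in U]
    W_o = [sum(a * x for a, x in zip(row, o_ui)) for row in W]
    hidden = [activation(s + t + c) for s, t, c in zip(U_latent, W_o, b)]
    return sum(vk * hk for vk, hk in zip(v, hidden))


d = 3
identity_matrix = [[1.0 if r == c else 0.0 for c in range(d)] for r in range(d)]
zero_matrix = [[0.0] * d for _ in range(d)]
m_u, e_i = [1.0, 2.0, 3.0], [0.5, -1.0, 2.0]
o_ui = [9.0, 9.0, 9.0]  # has no effect because W is all zeros


def identity(t):
    return t


# Weighted element-wise product: the GMF-style special case.
gmf = cmn_score(m_u, e_i, o_ui, identity_matrix, zero_matrix, [0.0] * d,
                [2.0, 0.0, 1.0], identity)
# With v all ones it is the ordinary matrix factorization dot product.
mf = cmn_score(m_u, e_i, o_ui, identity_matrix, zero_matrix, [0.0] * d,
               [1.0, 1.0, 1.0], identity)
print(gmf, mf)
```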

Footnotes

  1. https://velog.io/@zxxzx1515/논문-리뷰-BPR-Bayesian-Personalized-Ranking-from-Implicit-Feedback