TL;DR


๋ฐ์ดํ„ฐ ํ˜•ํƒœ

  • supervised learning์„ ์œ„ํ•ด์„œ ์˜ ํ˜•ํƒœ๋กœ ๋งŒ๋“ค์–ด์•ผ ํ•จ
  • โ€œUser 5๊ฐ€ Item 4๋ฅผ ๋ณด๊ณ  Rating 4๋ฅผ ์ค€ ๋ฐ์ดํ„ฐโ€์˜ ๊ฒฝ์šฐ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ํ‘œํ˜„ ๊ฐ€๋Šฅ (์ „์ฒด User 10๋ช…, ์ „์ฒด Item 5๊ฐœ ๊ฐ€์ •)
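A minimal sketch of this $(\mathbf{x}, y)$ encoding (the 1-based ID convention and index layout are assumptions):

```python
import numpy as np

# Hypothetical sketch: encode "User 5 rated Item 4 with Rating 4"
# as a (feature vector x, target y) pair, assuming 10 users and 5 items.
n_users, n_items = 10, 5

user_onehot = np.zeros(n_users)
user_onehot[5 - 1] = 1.0          # User 5 -> index 4 (1-based IDs assumed)
item_onehot = np.zeros(n_items)
item_onehot[4 - 1] = 1.0          # Item 4 -> index 3

x = np.concatenate([user_onehot, item_onehot])  # length-15 feature vector
y = 4.0                                          # target rating

print(x)  # [0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
```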

๊ธฐ์กด ๋ชจ๋ธ๋“ค์˜ ๋ฌธ์ œ

  • ๊ธฐ๋ณธ์ ์œผ๋กœ ์ถ”์ฒœ์‹œ์Šคํ…œ์—์„œ ํ™œ์šฉํ•˜๋Š” ๋ฐ์ดํ„ฐ๋Š” ๋งŽ์€ ๋ฒ”์ฃผํ˜• ๋ณ€์ˆ˜๋ฅผ ๋‹ค๋ฃจ๊ธฐ ๋•Œ๋ฌธ์— ์— 0 ๋งŽ์€ sparseํ•œ ์ƒํƒœ
  • ํฌ๊ฒŒ SVM ๋ชจ๋ธ๊ณผ Factorization ๋ชจ๋ธ๋“ค์ด ์žˆ๋Š”๋ฐ ๊ฐ๊ฐ ํ•œ๊ณ„๊ฐ€ ์žˆ์Œ
SVM models
  1. Linear Kernel
    • user ,item์˜ bias๋งŒ ๊ณ ๋ คํ•œ ์•„์ฃผ ๊ธฐ๋ณธ์ ์ธ CF ๋ชจํ˜•์œผ๋กœ ์ƒ๊ฐํ•  ์ˆ˜ ์žˆ์Œ
    • ๋งค์šฐ ๊ฐ„๋‹จํ•ด์„œ sparseํ•ด๋„ ์ถ”์ • ์ž˜๋˜๊ธดํ•˜๋Š”๋ฐ ๋‹น์—ฐํžˆ ์„ฑ๋Šฅ์€ ์•ˆ์ข‹์Œ
  2. Polynomial Kernel
    • : symmetric matrix
    • ๋ชจ๋“  ์ƒํ˜ธ์ž‘์šฉ ํŒŒ๋ผ๋ฏธํ„ฐ ๋ฅผ ๋…๋ฆฝ์œผ๋กœ ์ทจ๊ธ‰
      • ๊ฐ€ ์—…๋ฐ์ดํŠธ๋˜๋ ค๋ฉด User u๊ฐ€ Item i๋ฅผ ํ‰๊ฐ€ํ•œ ํ›ˆ๋ จ๋ฐ์ดํ„ฐ๊ฐ€ ์žˆ์–ด์•ผ ํ•จ
      • ๋”ฐ๋ผ์„œ Sparseํ•œ ์ƒํ™ฉ์—์„œ non-linear ์ƒํ˜ธ์ž‘์šฉ์„ ํ•™์Šตํ•˜๊ธฐ๊ฐ€ ์–ด๋ ค์›€
Factorization models
  • Sparseํ•œ User - Item Interaction ํ–‰๋ ฌ์ด input์ด๋ผ์„œ ์ผ๋ฐ˜์ ์ธ ์‹ค์ˆ˜ feature vector ์‚ฌ์šฉํ•  ์ˆ˜ ์—†์Œ
  • User, Item ์ด์™ธ์— ์ƒˆ๋กœ์šด feature๋ฅผ ๋„ฃ๊ณ  ์‹ถ์œผ๋ฉด ๋ฒˆ๊ฑฐ๋กœ์›€

Factorization Machine

  • A model that allows reliable parameter estimation even under high sparsity, while remaining a general predictor like SVMs.
Advantages
  1. Can estimate non-linear interaction parameters from sparse data
    • Like a polynomial-kernel SVM, it models all nested variable interactions
    • But instead of an independent scalar parameter $w_{ij}$, it uses a 'factorized parameterization' $\langle \mathbf{v}_i, \mathbf{v}_j \rangle$ — a dot product of factor vectors, as in MF
  2. Computable in linear time, with a linear number of parameters
  3. Works with arbitrary real-valued feature vectors
    • By designing the feature vector appropriately it can express a variety of models; in fact it generalizes several CF models
์˜ˆ์‹œ ๋ฐ์ดํ„ฐ ํ˜•ํƒœ
  • 7๊ฐœ์˜ ๋ฐ์ดํ„ฐ

    • #1 : User โ€˜Bโ€™๊ฐ€ Movie โ€˜SWโ€™๋ฅผ Time โ€˜5โ€™์— ์‹œ์ฒญ โ†’ ๋ณ„์  4
    • #2 : User โ€˜Bโ€™๊ฐ€ Movie โ€˜STโ€™๋ฅผ Time โ€˜8โ€™์— ์‹œ์ฒญ โ†’ ๋ณ„์  5
    • #3 : User โ€˜Cโ€™๊ฐ€ Movie โ€˜TIโ€™๋ฅผ Time โ€˜9โ€™์— ์‹œ์ฒญ โ†’ ๋ณ„์  1
    • #4 : User โ€˜Cโ€™๊ฐ€ Movie โ€˜SWโ€™๋ฅผ Time โ€˜12โ€™์— ์‹œ์ฒญ โ†’ ๋ณ„์  5
    • #5 : User โ€˜Aโ€™๊ฐ€ Movie โ€˜TIโ€™๋ฅผ Time โ€˜13โ€™์— ์‹œ์ฒญ โ†’ ๋ณ„์  5
    • #6 : User โ€˜Aโ€™๊ฐ€ Movie โ€˜NHโ€™๋ฅผ Time โ€˜14โ€™์— ์‹œ์ฒญ โ†’ ๋ณ„์  3
    • #7 : User โ€˜Aโ€™๊ฐ€ Movie โ€˜SWโ€™๋ฅผ Time โ€˜16โ€™์— ์‹œ์ฒญ โ†’ ๋ณ„์  1
  • ๊ธฐ๋ณธ์ ์œผ๋กœ User, Item์€ ์‚ฌ์šฉํ•˜๊ณ , ๊ทธ ์™ธ์˜ Auxiliary Features๋Š” ์ž์œ ๋กญ๊ฒŒ ์‚ฌ์šฉ โ†’ ์ด๊ฒƒ์ด FM์˜ ์žฅ์ 

    • User: one hot encoded ์œ ์ €
    • Movies: ํ•ด๋‹น ์œ ์ €๊ฐ€ ํ‰๊ฐ€ํ•œ one hot encoded ์˜ํ™”
    • Time: ๋ฐ์ดํ„ฐ ๋“ค์–ด์˜จ ์‹œ์ 
    • Other Movies rated: ๊ทธ ์œ ์ €๊ฐ€ ๋ณธ ๋ชจ๋“  ์˜ํ™” ํ‘œ์‹œ
    • Last Movie rated: ์ง์ „์— ํ‰๊ฐ€ํ•œ ์˜ํ™”
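One row of this design matrix can be sketched as follows. The normalization of "Other movies rated" to sum to 1 and the zero "Last movie rated" for a user's first transaction are assumptions about the encoding:

```python
import numpy as np

# Hypothetical sketch of one design-matrix row:
# transaction #1, "User 'B' watched Movie 'SW' at Time '5' -> rating 4".
users = ["A", "B", "C"]
movies = ["TI", "NH", "SW", "ST"]

def onehot(labels, value):
    v = np.zeros(len(labels))
    v[labels.index(value)] = 1.0
    return v

user = onehot(users, "B")                 # one-hot user
movie = onehot(movies, "SW")              # one-hot rated movie
# "Other movies rated": all movies user B rated (SW, ST), normalized to sum to 1
other = onehot(movies, "SW") + onehot(movies, "ST")
other /= other.sum()
time = np.array([5.0])                    # raw timestamp feature
last = np.zeros(len(movies))              # no previous rating before this row

x = np.concatenate([user, movie, other, time, last])
y = 4.0
```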
2-way Factorization Model
  • Notation
    • $n$: number of features per data sample — not the total number of interaction records, but the length of a row (the column count) in the figure above
    • $x_i$: the $i$th feature
    • The parameters to estimate are
      • $w_0$: global bias
      • $w_i$: strength of the $i$th variable
      • $\hat{w}_{ij} = \langle \mathbf{v}_i, \mathbf{v}_j \rangle$: interaction between the $i$th and $j$th variables → the key to estimating higher-order interaction parameters under sparsity
        • $\langle \mathbf{v}_i, \mathbf{v}_j \rangle = \sum_{f=1}^{k} v_{i,f}\,v_{j,f}$: dot product of $k$-dimensional factor vectors
  • The 2-way FM ($d = 2$) captures all single and pairwise interactions:
    $\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle x_i x_j$
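The 2-way FM equation can be sketched naively with a brute-force pairwise sum (parameters below are hypothetical random values):

```python
import numpy as np

# Minimal sketch of the 2-way FM prediction
#   y(x) = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j
rng = np.random.default_rng(0)
n, k = 15, 4                    # n features, k-dimensional factor vectors

w0 = rng.normal()               # global bias
w = rng.normal(size=n)          # per-feature strengths w_i
V = rng.normal(size=(n, k))     # factor vectors v_i

def fm_predict(x):
    pairwise = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            pairwise += V[i] @ V[j] * x[i] * x[j]   # <v_i, v_j> x_i x_j
    return w0 + w @ x + pairwise
```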
Training example: with 10 users and 5 items, User 5 gives Item 4 a Rating of 4
  • A simple example using only the User and Item features
  • Training procedure
    1. Reshape the interaction data into the format shown above
    2. Compute $\hat{y}(\mathbf{x})$ with randomly initialized parameters
    3. Update the parameters based on the difference from the actual Rating
  • Running FM with only the two features User and Item, as above, is identical to Matrix Factorization
    • In the polynomial term only $x_u x_i = 1$ and everything else is 0, so $\hat{y} = w_0 + w_u + w_i + \langle \mathbf{v}_u, \mathbf{v}_i \rangle$
Why is it more advantageous in the sparse case?
  • In FM, factorization shares the vectors $\mathbf{v}_i$ across interactions, removing the independence of the interaction parameters → this is why interactions can be estimated even under sparsity
Example with only User-Item features: 100 users, 20 items, embedding dimension $k = 4$
  • Parameters to learn (second-order term only)
    • FM: $(100 + 20) \times 4$ → 480 factor parameters
    • SVM: $100 \times 20$ → 2,000 independent $w_{u,i}$ parameters
  • Training step for an observed (user 5, item 4) pair
    • FM → both $\mathbf{v}_{\text{user 5}}$ and $\mathbf{v}_{\text{item 4}}$ are updated → the update carries over to predictions for (user 5, other items) and (other users, item 4)
    • SVM → only $w_{\text{user 5},\,\text{item 4}}$ is updated
Linear Complexity
  • The pairwise interaction term looks like $O(kn^2)$, but it can actually be computed in $O(kn)$:
    $\sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle x_i x_j = \frac{1}{2} \sum_{f=1}^{k} \left[ \left( \sum_{i=1}^{n} v_{i,f} x_i \right)^2 - \sum_{i=1}^{n} v_{i,f}^2 x_i^2 \right]$
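The reformulation can be verified numerically (all values below are hypothetical random data):

```python
import numpy as np

# Sketch verifying the O(kn) reformulation of the pairwise term:
#   sum_{i<j} <v_i,v_j> x_i x_j
#     = 0.5 * sum_f [ (sum_i v_{i,f} x_i)^2 - sum_i v_{i,f}^2 x_i^2 ]
rng = np.random.default_rng(2)
n, k = 8, 4
x = rng.normal(size=n)
V = rng.normal(size=(n, k))

naive = sum(V[i] @ V[j] * x[i] * x[j]
            for i in range(n) for j in range(i + 1, n))           # O(k n^2)

xv = x[:, None] * V                                               # v_{i,f} x_i
fast = 0.5 * np.sum(xv.sum(axis=0) ** 2 - (xv ** 2).sum(axis=0))  # O(k n)

assert np.isclose(naive, fast)
```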
Training : ์œ„์—์„œ ์œผ๋กœ ๋ณ€ํ˜•ํ•œ ์‹์„ ๋ฏธ๋ถ„ํ•ด์„œ SGD

Code


import numpy as np
import torch
from torchfm.layer import FactorizationMachine, FeaturesEmbedding, FeaturesLinear
 
class FactorizationMachineModel(torch.nn.Module):
    """
    A pytorch implementation of Factorization Machine.
 
    Reference:
        S Rendle, Factorization Machines, 2010.
    """
 
    def __init__(self, field_dims, embed_dim):
        super().__init__()
        self.embedding = FeaturesEmbedding(field_dims, embed_dim)
        self.linear = FeaturesLinear(field_dims)
        self.fm = FactorizationMachine(reduce_sum=True)
 
    def forward(self, x):
        """
        :param x: Long tensor of size ``(batch_size, num_fields)``
        """
        x = self.linear(x) + self.fm(self.embedding(x))
        return torch.sigmoid(x.squeeze(1))
 
class FeaturesEmbedding(torch.nn.Module):
 
    def __init__(self, field_dims, embed_dim):
        super().__init__()
        self.embedding = torch.nn.Embedding(sum(field_dims), embed_dim)
        self.offsets = np.array((0, *np.cumsum(field_dims)[:-1]), dtype=np.int64)
        torch.nn.init.xavier_uniform_(self.embedding.weight.data)
 
    def forward(self, x):
        """
        :param x: Long tensor of size ``(batch_size, num_fields)``
        """
        x = x + x.new_tensor(self.offsets).unsqueeze(0)
        return self.embedding(x)
        
class FeaturesLinear(torch.nn.Module):
 
    def __init__(self, field_dims, output_dim=1):
        super().__init__()
        self.fc = torch.nn.Embedding(sum(field_dims), output_dim)
        self.bias = torch.nn.Parameter(torch.zeros((output_dim,)))
        self.offsets = np.array((0, *np.cumsum(field_dims)[:-1]), dtype=np.int64)
 
    def forward(self, x):
        """
        :param x: Long tensor of size ``(batch_size, num_fields)``
        """
        x = x + x.new_tensor(self.offsets).unsqueeze(0)
        return torch.sum(self.fc(x), dim=1) + self.bias
 
 
class FactorizationMachine(torch.nn.Module):
 
    def __init__(self, reduce_sum=True):
        super().__init__()
        self.reduce_sum = reduce_sum
 
    def forward(self, x):
        """
        :param x: Float tensor of size ``(batch_size, num_fields, embed_dim)``
        """
        # O(kn^2) -> O(kn)
        square_of_sum = torch.sum(x, dim=1) ** 2
        sum_of_square = torch.sum(x ** 2, dim=1)
        ix = square_of_sum - sum_of_square
        if self.reduce_sum:
            ix = torch.sum(ix, dim=1, keepdim=True)
        return 0.5 * ix
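A possible forward pass matching the classes above, written inline with plain torch so it is self-contained; `field_dims`, `embed_dim`, and the sample indices are made up:

```python
import torch

# Hypothetical usage sketch: the same FM forward pass as the classes above
# (FeaturesLinear + FactorizationMachine over FeaturesEmbedding), inlined.
torch.manual_seed(0)
field_dims = [10, 5]                 # 10 users, 5 items
embed_dim = 4
total = sum(field_dims)
offsets = torch.tensor([0, field_dims[0]])

embedding = torch.nn.Embedding(total, embed_dim)   # factor vectors v_i
linear = torch.nn.Embedding(total, 1)              # linear weights w_i
bias = torch.zeros(1)                              # global bias w0

x = torch.tensor([[4, 3]])           # (batch, num_fields): user 5, item 4
idx = x + offsets                    # offset field ids into the shared table

lin = linear(idx).sum(dim=1) + bias  # w0 + sum_i w_i x_i
e = embedding(idx)                   # (batch, num_fields, embed_dim)
pair = 0.5 * (e.sum(1) ** 2 - (e ** 2).sum(1)).sum(1, keepdim=True)
score = torch.sigmoid((lin + pair).squeeze(1))
print(score.shape)  # torch.Size([1])
```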