Models
Popularity Recommender
The popularity-based recommender system recommends the same items to all users, ranked from greatest to least in terms of popularity (i.e., how many interactions each item has received).
- class models.popularity.PopularityRecommender(num_users=None, num_items=None, user_representation=None, item_representation=None, actual_user_representation=None, actual_item_representation=None, verbose=False, num_items_per_iter=10, **kwargs)[source]
A customizable popularity recommendation system.
With the popularity recommender system, users are presented items that are popular in the system. The popularity of an item is measured by the number of times users interacted with that item in the past. In this implementation, items do not expire and, therefore, the system does not base its choice on how recent the items are.
Item attributes are represented by a \(1\times|I|\) array, where \(|I|\) is the number of items in the system. This array stores the number of user interactions for each item.
User profiles are represented by a \(|U|\times 1\) matrix, where \(|U|\) is the number of users in the system. All elements of this matrix are equal to 1, as the predictions of the system are solely based on the item attributes.
- Parameters
num_users (int, default 100) – The number of users \(|U|\) in the system.
num_items (int, default 1250) – The number of items \(|I|\) in the system.
item_representation (numpy.ndarray, optional) – A \(|A|\times|I|\) matrix representing the similarity between each item and attribute. If this is not None, num_items is ignored.
user_representation (numpy.ndarray, optional) – A \(|U|\times|A|\) matrix representing the similarity between each user and attribute, as interpreted by the system. If this is not None, num_users is ignored.
actual_user_representation (numpy.ndarray or Users, optional) – Either a \(|U|\times|T|\) matrix representing the real user profiles, where \(T\) is the number of attributes in the real underlying user profile, or a Users object that contains the real user profiles or real user-item scores. This matrix is not used for recommendations; it is kept only for measurements, and the system is unaware of it.
actual_item_representation (numpy.ndarray, optional) – A \(|T|\times|I|\) matrix representing the real item profiles, where \(T\) is the number of attributes in the real underlying item profile. This matrix is not used for recommendations; it is kept only for measurements, and the system is unaware of it.
verbose (bool, default False) – If True, enables verbose mode.
num_items_per_iter (int, default 10) – Number of items presented to the user per iteration.
- Inherited from BaseRecommender
Examples
PopularityRecommender can be instantiated with no arguments – in which case, it will be initialized with the default parameters.
>>> pr = PopularityRecommender()
>>> pr.users_hat.shape
(100, 1)  # <-- 100 users (default)
>>> pr.items.shape
(1, 1250)  # <-- 1250 items (default)
This class can be customized by defining the number of users and/or items in the system.
>>> pr = PopularityRecommender(num_users=1200, num_items=5000)
>>> pr.users_hat.shape
(1200, 1)  # <-- 1200 users
>>> pr.items.shape
(1, 5000)
Or by generating representations for items (user representations can also be defined, but they should always be all ones). In the example below, each item's interaction count is drawn uniformly at random from 0 to 10, inclusive.
>>> import numpy as np
>>> item_representation = np.random.randint(11, size=(1, 200))
>>> pr = PopularityRecommender(item_representation=item_representation)
>>> pr.items.shape
(1, 200)
>>> pr.users_hat.shape
(100, 1)
Note that all arguments passed in at initialization must be consistent; otherwise, an error is thrown. For example, one cannot pass in num_users=200 but have user_representation.shape be (300, 1). Likewise, one cannot pass in num_items=1000 but have item_representation.shape be (1, 500).
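The popularity representation described above can be sketched without the library. This is a minimal illustration under stated assumptions, not the library's implementation: plain Python lists stand in for NumPy arrays, and the helper names `popularity_scores` and `top_k` are hypothetical. Because every user profile is the single value 1, each user's score row equals the interaction counts, so all users receive the same ranking.

```python
def popularity_scores(users_hat, items_hat):
    """Multiply a |U| x 1 user matrix by a 1 x |I| item matrix."""
    counts = items_hat[0]
    return [[user_row[0] * count for count in counts] for user_row in users_hat]


def top_k(score_row, k):
    """Return indices of the k highest scores (ties keep the lower index first)."""
    order = sorted(range(len(score_row)), key=lambda i: -score_row[i])
    return order[:k]


users_hat = [[1], [1], [1]]   # three users, all ones
items_hat = [[4, 0, 9, 2]]    # interaction counts for four items

scores = popularity_scores(users_hat, items_hat)
recommendations = [top_k(row, k=2) for row in scores]
print(recommendations)  # every user gets the same two most popular items
```

Here items 2 and 0 have the most interactions, so they head every user's list.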
Content Filtering Recommender
The content filtering recommender system attempts to match users to items based on the highest predicted inner product between the predicted user profile and the predicted item profile. The predictions of user and item profiles are generated iteratively as users interact with items.
- class models.content.ContentFiltering(num_users=None, num_items=None, num_attributes=None, user_representation=None, item_representation=None, actual_user_representation=None, actual_item_representation=None, probabilistic_recommendations=False, seed=None, num_items_per_iter=10, **kwargs)[source]
A customizable content-filtering recommendation system.
With content filtering, items and users are represented by a set of attributes A. This class assumes that the attributes used for items and users are the same. The recommendation system matches users to items with similar attributes.
Item attributes are represented by a \(|A|\times|I|\) matrix, where \(|I|\) is the number of items in the system. For each item, we define the similarity to each attribute.
User profiles are represented by a \(|U|\times|A|\) matrix, where \(|U|\) is the number of users in the system. For each user, we define the similarity to each attribute.
- Parameters
num_users (int, default 100) – The number of users \(|U|\) in the system.
num_items (int, default 1250) – The number of items \(|I|\) in the system.
num_attributes (int, default 1000) – The number of attributes \(|A|\) in the system.
user_representation (numpy.ndarray, optional) – A \(|U|\times|A|\) matrix representing the similarity between each user and attribute, as interpreted by the system.
item_representation (numpy.ndarray, optional) – A \(|A|\times|I|\) matrix representing the similarity between each item and attribute.
actual_user_representation (numpy.ndarray or Users, optional) – Either a \(|U|\times|T|\) matrix representing the real user profiles, where \(T\) is the number of attributes in the real underlying user profile, or a Users object that contains the real user profiles or real user-item scores. This matrix is not used for recommendations; it is kept only for measurements, and the system is unaware of it.
actual_item_representation (numpy.ndarray, optional) – A \(|T|\times|I|\) matrix representing the real item profiles, where \(T\) is the number of attributes in the real underlying item profile. This matrix is not used for recommendations; it is kept only for measurements, and the system is unaware of it.
num_items_per_iter (int, default 10) – Number of items presented to the user per iteration.
seed (int, optional) – Seed for random generator.
- Inherited from BaseRecommender
Examples
ContentFiltering can be instantiated with no arguments – in which case, it will be initialized with the default parameters and the item/user representations will be assigned randomly.
>>> cf = ContentFiltering()
>>> cf.users_hat.shape
(100, 1000)  # <-- 100 users (default), 1000 attributes (default)
>>> cf.items.shape
(1000, 1250)  # <-- 1000 attributes (default), 1250 items (default)
This class can be customized by defining the number of users, items, and/or attributes in the system.
>>> cf = ContentFiltering(num_users=1200, num_items=5000)
>>> cf.users_hat.shape
(1200, 1000)  # <-- 1200 users, 1000 attributes
>>> cf = ContentFiltering(num_users=1200, num_items=5000, num_attributes=2000)
>>> cf.users_hat.shape
(1200, 2000)  # <-- 1200 users, 2000 attributes
Or by generating representations for items and/or users. In the example below, item attributes are drawn uniformly at random from {0, 1}. We indirectly define 100 attributes by defining the following item_representation:
>>> items = np.random.randint(0, 2, size=(100, 200))  # upper bound is exclusive
>>> # Users are represented by a power law distribution.
>>> # This representation also uses 100 attributes.
>>> power_dist = Distribution(distr_type='powerlaw')
>>> users = power_dist.compute(a=1.16, size=(30, 100))
>>> cf = ContentFiltering(item_representation=items, user_representation=users)
>>> cf.items.shape
(100, 200)
>>> cf.users_hat.shape
(30, 100)
Note that all arguments passed in at initialization must be consistent; otherwise, an error is thrown. For example, one cannot pass in num_users=200 but have user_representation.shape be (300, 100). Likewise, one cannot pass in num_items=1000 but have item_representation.shape be (100, 500).
- process_new_items(new_items)[source]
We assume the content filtering system has perfect knowledge of the new items; therefore, when new items are created, we simply return the new item attributes.
- Parameters
new_items (numpy.ndarray) – An array representing the new items being added to the system. Should be of dimension \(|A|\times|I|\).
- process_new_users(new_users, **kwargs)[source]
By default, the content filtering system assumes the predicted user profiles are zero vectors. (Note that this effectively corresponds to providing random recommendations to each user).
- Parameters
new_users (numpy.ndarray) – An array representing the new users being added to the system. Should be of dimension \(|U|\times|A|\).
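To make the matching rule concrete, the sketch below scores each user against each item with an inner product and picks the best item per user. It is an illustration under stated assumptions rather than the library's code: plain lists stand in for NumPy arrays, and `inner_product_scores` is a hypothetical helper. The last user has a zero-vector profile, as new users do by default, so all of that user's scores tie at zero.

```python
def inner_product_scores(users_hat, items_hat):
    """Score matrix (|U| x |I|) from a |U| x |A| and an |A| x |I| matrix."""
    num_items = len(items_hat[0])
    return [
        [sum(u_a * items_hat[a][i] for a, u_a in enumerate(user)) for i in range(num_items)]
        for user in users_hat
    ]


# Two attributes, three items: rows are attributes, columns are items.
items_hat = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.5],
]
users_hat = [
    [0.9, 0.1],   # leans toward attribute 0
    [0.2, 0.8],   # leans toward attribute 1
    [0.0, 0.0],   # new user: zero-vector profile
]

scores = inner_product_scores(users_hat, items_hat)
best_item = [row.index(max(row)) for row in scores[:2]]
print(best_item)   # best-scoring item for the two established users
print(scores[2])   # all zeros: no item is preferred for the new user
```

With every score tied, whichever tie-breaking rule is used decides what the new user sees, which is why the zero-vector default behaves like random recommendation.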
Bass Diffusion Model
Bass Model for modeling the spread of infection. This can be applied to studying virality in online communications.
- class models.bass.BassModel(num_users=None, num_items=None, infection_state=None, infection_thresholds=None, item_representation=None, user_representation=None, actual_user_representation=None, actual_item_representation=None, measurements=None, num_items_per_iter=1, seed=None, **kwargs)[source]
Bass model that, for now, only supports one item at a time.
In this model, individuals are “infected” by an item, and then infect their susceptible (i.e., not yet “infected”) contacts independently with a given infection probability. Contacts between users are modeled with an adjacency graph that is \(|U|\times|U|\). The model stores which users are infected in a \(|U|\times|I|\) matrix, where \(|I|\) is the number of items (currently, this is always equal to 1).
- Parameters
num_users (int, default 100) – The number of users \(|U|\) in the system.
num_items (int, default 1250) – The number of items \(|I|\) in the system.
infection_state (numpy.ndarray, optional) – Component that tracks infection state, which is a binary (0/1) array with an element recording whether each user is infected. Should be of dimension \(|U|\times|I|\).
infection_thresholds (numpy.ndarray, optional) – Component that tracks infection thresholds for each user. Should be of dimension \(1\times|U|\).
user_representation (numpy.ndarray, optional) – A \(|U|\times|A|\) matrix representing the similarity between each user and attribute, as interpreted by the system.
item_representation (numpy.ndarray, optional) – A \(|A|\times|I|\) matrix representing the similarity between each item and attribute.
actual_user_representation (numpy.ndarray or Users, optional) – Either a \(|U|\times|T|\) matrix representing the real user profiles, where \(T\) is the number of attributes in the real underlying user profile, or a Users object that contains the real user profiles or real user-item scores. This matrix is not used for recommendations; it is kept only for measurements, and the system is unaware of it.
actual_item_representation (numpy.ndarray, optional) – A \(|T|\times|I|\) matrix representing the real item profiles, where \(T\) is the number of attributes in the real underlying item profile. This matrix is not used for recommendations; it is kept only for measurements, and the system is unaware of it.
num_items_per_iter (int, default 1) – Number of items presented to the user per iteration.
seed (int, optional) – Seed for random generator.
- Inherited from BaseRecommender
- infection_probabilities(user_profiles, item_attributes)[source]
Calculates the infection probabilities for each user at the current timestep.
- Parameters
user_profiles (numpy.ndarray or scipy.sparse.spmatrix) – First factor of the dot product, which should provide a representation of users.
item_attributes (numpy.ndarray or scipy.sparse.spmatrix) – Second factor of the dot product, which should provide a representation of items.
- run(timesteps='until_completion', startup=False, train_between_steps=True)[source]
Overrides the run method of the parent class BaseRecommender, so that repeated_items defaults to True in Bass models.
- Parameters
timesteps (int, optional) – Number of timesteps for the simulation. Defaults to 'until_completion'.
startup (bool, default False) – If True, it runs the simulation in startup mode (see recommend() and startup_and_train())
train_between_steps (bool, default True) – If True, the model is retrained after each step with the information gathered in the previous step.
repeated_items (bool, default True) – If True, repeated items are allowed in the system – that is, users can interact with the same item more than once. Examples of common instances in which this is useful: infection and network propagation models.
- class models.bass.InfectionState(infection_state=None, verbose=False)[source]
Component that tracks infection state, which is a binary array with an element recording whether each user is infected
- infect_users(user_indices, item_indices)[source]
Update infection state with users who have become newly infected.
- infected_users()[source]
Return indices of users who are currently infected and not recovered.
- Returns
indices – The first element of the tuple returned is a numpy array with the row indices (i.e., user indices) of those infected, and the second element is a numpy array of the column indices (i.e., item indices)
- Return type
tuple
- property num_infected
Return number of infected users.
- recovered_users()[source]
Return indices of users who have recovered (and are no longer susceptible to infection).
- Returns
indices – The first element of the tuple returned is a numpy array with the row indices (i.e., user indices) of those recovered, and the second element is a numpy array of the column indices (i.e., item indices)
- Return type
tuple
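The infection dynamics described for BassModel can be sketched as repeated timesteps over an adjacency matrix. This is an illustration under stated assumptions, not the library's implementation: it applies one fixed infection probability to every contact and ignores per-user thresholds, and `spread_one_step` is a hypothetical helper name.

```python
import random


def spread_one_step(adjacency, infected, infection_prob, rng):
    """Return the set of users infected after one timestep.

    adjacency: |U| x |U| list of 0/1 rows; adjacency[u][v] == 1 means u contacts v.
    infected: set of currently infected user indices.
    Each infected user tries to infect each susceptible contact independently.
    """
    newly_infected = set()
    for u in infected:
        for v, connected in enumerate(adjacency[u]):
            if connected and v not in infected and rng.random() < infection_prob:
                newly_infected.add(v)
    return infected | newly_infected


# Line graph 0 - 1 - 2 - 3 with user 0 initially infected.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
rng = random.Random(0)
state = {0}
for _ in range(3):
    state = spread_one_step(adjacency, state, infection_prob=1.0, rng=rng)
print(sorted(state))  # with probability 1.0 the infection advances one hop per step
```

Lowering `infection_prob` makes each hop uncertain, so the number of timesteps needed to reach everyone becomes random.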
BaseRecommender
BaseRecommender is the foundational class for all recommender systems implementable in our simulation library.
- class models.recommender.BaseRecommender(users_hat, items_hat, users, items, num_users, num_items, num_items_per_iter, creators=None, probabilistic_recommendations=False, measurements=None, record_base_state=False, system_state=None, score_fn=<function inner_product>, interleaving_fn=None, verbose=False, seed=None)[source]
Abstract class representing a recommender system.
The attributes and methods in this class can be generalized beyond recommender systems and are currently common to all pre-loaded models.
- Parameters
users_hat (numpy.ndarray) – An array representing users. The shape and meaning depend on the implementation of the concrete class.
items_hat (numpy.ndarray) – An array representing items. The shape and meaning depend on the implementation of the concrete class.
users (numpy.ndarray or Users) – An array representing real user preferences unknown to the system. Shape is \(|U| \times |A|\), where \(|A|\) is the number of attributes and \(|U|\) is the number of users. When a numpy.ndarray is passed in, we assume this represents the user scores, not the users’ actual attribute vectors.
items (numpy.ndarray or Items) – An array representing real item attributes unknown to the system. Shape is \(|A|\times|I|\), where \(|I|\) is the number of items and \(|A|\) is the number of attributes.
num_users (int) – The number of users in the system.
num_items (int) – The number of items in the system.
num_items_per_iter (int) – Number of items presented to the user at each iteration.
measurements (list) – List of metrics to monitor.
record_base_state (bool (optional, default: False)) – If True, the system will record at each time step its internal representation of user profiles and item profiles, as well as the true user profiles and item profiles. It will also record the predicted user-item scores at each time step.
system_state (list) – List of system state components to monitor.
score_fn (callable) – Function that is used to calculate each user’s predicted scores for each candidate item. The score function should take as input user_profiles and item_attributes.
verbose (bool (optional, default: False)) – If True, it enables verbose mode.
seed (int, optional) – Seed for the random generator.
- users_hat
An array representing users, matching user_representation. The shape and meaning depend on the implementation of the concrete class.
- items_hat
An array representing items, matching item_representation. The shape and meaning depend on the implementation of the concrete class.
- users
An array representing real user preferences. Shape should be \(|U| \times |A|\), and should match items.
- items
An array representing actual item attributes. Shape should be \(|A| \times |I|\), and should match users.
- predicted_scores
An array representing the user preferences as perceived by the system. The shape is always \(|U| \times |I|\), where \(|U|\) is the number of users in the system and \(|I|\) is the number of items in the system. The scores are calculated with the dot product of users_hat and items_hat.
- num_items_per_iter
Number of items presented to the user per iteration. If “all”, then the system will serve recommendations from the set of all items in the system.
- probabilistic_recommendations
When this flag is set to True, the recommendations (excluding any random interleaving) will be randomized, meaning that items will be recommended with a probability proportionate to their predicted score, rather than the top k items, as ranked by their predicted score, being recommended.
- Type
bool (optional, default: False)
- random_state
- indices
A \(|U| \times |I|\) array representing the past interactions of each user. This keeps track of which items each user has interacted with, so that they won’t be presented to the user again if repeated_items are not allowed.
- Type
numpy.ndarray
- items_shown
A \(|U| \times \text{num_items_per_iter}\) array representing the indices of the items that each user was shown (i.e., their recommendations) from the most recent timestep.
- Type
numpy.ndarray
- interactions
A \(|U| \times 1\) array representing the indices of the items that each user interacted with at the most recent time step.
- Type
numpy.ndarray
- score_fn
Function that is used to calculate each user’s predicted scores for each candidate item. The score function should take as input user_profiles and item_attributes.
- Type
callable
- interleaving_fn
Function that is used to determine the indices of items that will be interleaved into the recommender system’s recommendations. The interleaving function should take as input an integer k (representing the number of items to be interleaved in every recommendation set) and a matrix item_indices (representing which items are eligible to be interleaved). The function should return a \(|U|\times k\) matrix representing the interleaved items for each user.
- Type
callable
- property actual_item_attributes
Property that is an alias for the matrix representation of actual item attributes. Returns a matrix of dimension \(|A^*|\times|I|\), where \(|A^*|\) is the number of attributes the “true” item representation has.
- property actual_user_item_scores
Property that is an alias for the matrix representation of the true user-item score matrix. Returns a matrix of dimension \(|U|\times|I|\).
- property actual_user_profiles
Property that is an alias for the matrix representation of true user profiles. Returns a matrix of dimension \(|U|\times|A^*|\), where \(|A^*|\) is the number of attributes the “true” item/user representation has.
- add_new_item_indices(num_new_items)[source]
Expands the indices matrix to include entries for new items that were created.
- Parameters
num_new_items (int) – The number of new items created in this iteration.
- add_new_user_indices(num_new_users)[source]
Expands the indices matrix to include entries for new users that were created.
- Parameters
num_new_users (int) – The number of new users created in this iteration.
- add_users(new_users, **kwargs)[source]
Create pool of new users
- Parameters
new_users (numpy.ndarray) – An array representing users. Should be of dimension \(|U_n| \times |A|\), where \(|U_n|\) represents the number of new users, and \(|A|\) represents the number of attributes for each user profile.
**kwargs – Any additional information about users can be passed through kwargs (see social.py for an example).
- choose_interleaved_items(k, item_indices)[source]
Chooses k items out of the item set to “interleave” into the system’s recommendations. In this case, we define “interleaving” as a process by which items can be inserted into the set of items shown to the user, in addition to the recommended items that maximize the predicted score. For example, users may want to insert random interleaved items to increase the “exploration” of the recommender system, or may want to ensure that new items are always interleaved into the item set shown to users.
NOTE: Currently, there is no guarantee that items that are interleaved are distinct from the recommended items. We do guarantee that within the set of items interleaved for a particular user, there are no repeats.
- Parameters
k (int) – Number of items that should be interleaved in the recommendation set for each user.
item_indices (numpy.ndarray) – Array that contains the valid item indices for each user; that is, the indices of items that they have not yet interacted with.
- Returns
interleaved_items
- Return type
numpy.ndarray
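An interleaving function with the (k, item_indices) signature described above might look like the sketch below. It is a hypothetical example, not the library's default: `make_random_interleaver` is an invented factory name, and item_indices is assumed to be a list of per-user lists of eligible item indices rather than a NumPy array.

```python
import random


def make_random_interleaver(seed=None):
    """Build an interleaving function with the (k, item_indices) signature."""
    rng = random.Random(seed)

    def interleave(k, item_indices):
        # Sample k distinct eligible items per user (no repeats within a row).
        return [rng.sample(list(row), k) for row in item_indices]

    return interleave


# Each row lists the item indices one user has not interacted with yet.
item_indices = [
    [0, 2, 3, 5],
    [1, 2, 4, 5],
    [0, 1, 3, 4],
]
interleave = make_random_interleaver(seed=42)
interleaved_items = interleave(2, item_indices)
print(interleaved_items)  # one row of 2 item indices per user
```

Sampling without replacement gives the within-user "no repeats" guarantee. As the note above says, nothing here prevents overlap with the score-based recommendations.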
- generate_recommendations(k=1, item_indices=None)[source]
Generate recommendations for each user.
- Parameters
k (int, default 1) – Number of items to recommend.
item_indices (numpy.ndarray, optional) – A matrix containing the indices of the items each user has not yet interacted with. It is used to ensure that the user is presented with items they have not already interacted with. If None, then the user may be recommended items that they have already interacted with.
- Returns
Recommendations
- Return type
numpy.ndarray
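The difference between top-k and probabilistic recommendations can be illustrated for a single user. The sketch below is an assumption-based illustration, not the library's code: the function names are hypothetical, scores are assumed to be non-negative, and "probability proportionate to predicted score" is read here as weighted sampling without replacement.

```python
import random


def top_k_recommendations(score_row, eligible, k):
    """Pick the k eligible items with the highest predicted scores."""
    ranked = sorted(eligible, key=lambda i: score_row[i], reverse=True)
    return ranked[:k]


def sampled_recommendations(score_row, eligible, k, rng):
    """Draw k distinct eligible items, each draw weighted by predicted score."""
    pool = list(eligible)
    chosen = []
    for _ in range(min(k, len(pool))):
        weights = [score_row[i] for i in pool]  # assumes non-negative scores
        pick = rng.choices(pool, weights=weights, k=1)[0]
        chosen.append(pick)
        pool.remove(pick)
    return chosen


predicted_scores = [0.1, 0.7, 0.4, 0.9, 0.2]  # one user's scores for five items
eligible = [0, 1, 2, 4]                        # item 3 was already interacted with

top = top_k_recommendations(predicted_scores, eligible, k=2)
sampled = sampled_recommendations(predicted_scores, eligible, k=2, rng=random.Random(7))
print(top)      # highest-scoring eligible items
print(sampled)  # random, but biased toward high scores
```

Item 3 has the highest score but is excluded because it is not among the eligible indices, which mirrors the role of item_indices.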
- get_measurements()[source]
Returns all available measurements. For more details, please see the Measurement class.
- Returns
Monitored measurements
- get_system_state()[source]
Returns the history of the system state components, as recorded in the state_history of each component stored in SystemStateModule._system_state.
- Returns
System state
- initialize_user_scores()[source]
If the Users object does not already have known user-item scores, then we calculate these scores.
- property predicted_item_attributes
Property that is an alias for the matrix representation of predicted item attributes. Returns a matrix of dimension \(|\hat{A}|\times|I|\), where \(|\hat{A}|\) is the number of attributes that the algorithm uses to represent each item and user.
- property predicted_user_item_scores
Property that is an alias for the matrix representation of the RS algorithm’s predicted user-item score matrix. Returns a matrix of dimension \(|U|\times|I|\).
- property predicted_user_profiles
Property that is an alias for the matrix representation of predicted user profiles. Returns a matrix of dimension \(|U|\times|\hat{A}|\), where \(|\hat{A}|\) is the number of attributes that the algorithm uses to represent each item and user.
- process_new_items(new_items)[source]
Creates new item representations based on items that were just created.
Must be defined in the concrete class.
- process_new_users(new_users, **kwargs)[source]
Creates new user representations based on users that were just created.
Must be defined in the concrete class.
- recommend(startup=False, random_items_per_iter=0, vary_random_items_per_iter=False, repeated_items=True)[source]
Implements the recommendation process by combining recommendations and new (random) items.
- Parameters
startup (bool, default False) – If True, the system is in “startup” (exploration) mode and only presents the user with new randomly chosen items. This is done to maximize exploration.
random_items_per_iter (int, default 0) – Number of per-user item recommendations that should be randomly generated. Passing in self.num_items_per_iter will result in all recommendations being randomly generated, while passing in 0 will result in all recommendations coming from predicted scores.
vary_random_items_per_iter (bool, default False) – If True, then at each timestep, the number of items that are recommended randomly is itself randomly generated between 0 and random_items_per_iter, inclusive.
repeated_items (bool, default True) – If True, repeated items are allowed in the system – that is, users can interact with the same item more than once.
- Returns
Items – New and recommended items in random order.
- Return type
numpy.ndarray
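How random_items_per_iter and vary_random_items_per_iter divide the slots shown to each user can be sketched as follows. This is an illustration of the parameter descriptions above, not the library's implementation: `split_item_budget` is a hypothetical helper, and raising an error when the random count exceeds the total is an assumption.

```python
import random


def split_item_budget(num_items_per_iter, random_items_per_iter, vary, rng):
    """Return (num_score_based, num_random) slots for one timestep."""
    if random_items_per_iter > num_items_per_iter:
        # Assumed behavior; the source does not say how this case is handled.
        raise ValueError("random_items_per_iter cannot exceed num_items_per_iter")
    if vary:
        # randint is inclusive on both ends, matching "between 0 and N, inclusive".
        num_random = rng.randint(0, random_items_per_iter)
    else:
        num_random = random_items_per_iter
    return num_items_per_iter - num_random, num_random


rng = random.Random(3)
all_scored = split_item_budget(10, 0, vary=False, rng=rng)
all_random = split_item_budget(10, 10, vary=False, rng=rng)
varied = split_item_budget(10, 4, vary=True, rng=rng)
print(all_scored)  # every slot comes from predicted scores
print(all_random)  # every slot is random
print(varied)      # between 0 and 4 random slots, the rest score-based
```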
- run(timesteps=50, startup=False, train_between_steps=True, random_items_per_iter=0, vary_random_items_per_iter=False, repeated_items=True, no_new_items=False, disable_tqdm=False)[source]
Runs simulation for the given timesteps.
- Parameters
timesteps (int, default 50) – Number of timesteps for simulation.
startup (bool, default False) – If True, it runs the simulation in startup mode (see recommend() and startup_and_train()).
train_between_steps (bool, default True) – If True, the model is retrained after each timestep with the information gathered in the previous step.
random_items_per_iter (int, default 0) – Number of per-user item recommendations that should be randomly generated. Passing in self.num_items_per_iter will result in all recommendations being randomly generated, while passing in 0 will result in all recommendations coming from predicted scores.
vary_random_items_per_iter (bool, default False) – If True, then at each timestep, the number of items that are recommended randomly is itself randomly generated between 0 and random_items_per_iter, inclusive.
repeated_items (bool, default True) – If True, repeated items are allowed in the system – that is, the system can recommend items to users that they’ve already previously interacted with.
no_new_items (bool, default False) – If True, then no new items are created during these timesteps. This can be helpful, say, during a “training” period where no new items should be made.
- set_num_items_per_iter(num_items_per_iter)[source]
Change the number of items that will be shown to each user per iteration.
- startup_and_train(timesteps=50, no_new_items=False)[source]
Runs simulation in startup mode by calling run() with startup=True. For more information about startup mode, see run() and recommend().
- Parameters
timesteps (int, default 50) – Number of timesteps for simulation.
no_new_items (bool, default False) – If True, then no new items are created during these timesteps. This is only relevant when you have item Creators. This can be helpful, say, during a “training” period where no new items should be made.
- train()[source]
Updates scores predicted by the system based on the internal state of the recommender system. Under default initialization, it updates predicted_scores with a dot product of user and item attributes.
- Returns
predicted_scores
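Because score_fn is any callable that takes user_profiles and item_attributes, the scoring rule used during training can be swapped. The sketch below is a standalone illustration, not the library's code: the local `inner_product`, `cosine_similarity`, and `train` functions are hypothetical stand-ins written with plain lists, and cosine similarity is shown only as one possible alternative.

```python
import math


def inner_product(user_profiles, item_attributes):
    """|U| x |A| times |A| x |I| -> |U| x |I| score matrix."""
    columns = list(zip(*item_attributes))  # one tuple of attribute values per item
    return [[sum(u * a for u, a in zip(user, col)) for col in columns] for user in user_profiles]


def cosine_similarity(user_profiles, item_attributes):
    """Inner product rescaled by vector norms; zero vectors score 0."""
    columns = list(zip(*item_attributes))
    scores = []
    for user in user_profiles:
        user_norm = math.sqrt(sum(u * u for u in user))
        row = []
        for col in columns:
            item_norm = math.sqrt(sum(a * a for a in col))
            dot = sum(u * a for u, a in zip(user, col))
            denom = user_norm * item_norm
            row.append(dot / denom if denom else 0.0)
        scores.append(row)
    return scores


def train(users_hat, items_hat, score_fn=inner_product):
    """Recompute predicted scores from the current representations."""
    return score_fn(users_hat, items_hat)


users_hat = [[3.0, 4.0]]
items_hat = [[3.0, 0.0], [4.0, 1.0]]  # item 0 = (3, 4), item 1 = (0, 1)

raw_scores = train(users_hat, items_hat)
cosine_scores = train(users_hat, items_hat, score_fn=cosine_similarity)
print(raw_scores)     # raw inner products
print(cosine_scores)  # the same comparison, scaled to [-1, 1]
```

Both functions return a \(|U|\times|I|\) matrix, so either can fill the same role without changing the code that ranks items.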