Rumale (Ruby machine learning) is a machine learning library for Ruby.
Rumale provides machine learning algorithms with interfaces similar to Scikit-Learn in Python.
Rumale supports Support Vector Machine,
Logistic Regression, Ridge, Lasso,
Multi-layer Perceptron,
Naive Bayes, Decision Tree, Gradient Tree Boosting, Random Forest,
K-Means, Gaussian Mixture Model, DBSCAN, Spectral Clustering,
Multidimensional Scaling, t-SNE,
Fisher Discriminant Analysis, Neighbourhood Component Analysis,
Principal Component Analysis, Non-negative Matrix Factorization,
and many other algorithms.
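A Scikit-Learn style interface generally means that an estimator is configured through keyword arguments to new, trained with fit, and queried with predict. The toy class below is hypothetical and is not part of Rumale; it only sketches that calling convention with plain Ruby arrays, whereas Rumale itself works on Numo::NArray inputs.

```ruby
# Hypothetical toy estimator (NOT a Rumale class) that follows the
# new / fit / predict calling convention.
class MeanRegressor
  attr_reader :mean

  # shift is an illustrative hyperparameter, set at construction time.
  def initialize(shift: 0.0)
    @shift = shift
    @mean = nil
  end

  # Learns the mean of the targets; returns self so calls can be chained.
  def fit(_samples, targets)
    @mean = targets.sum.to_f / targets.size
    self
  end

  # Predicts the learned mean (plus the shift) for every sample.
  def predict(samples)
    samples.map { @mean + @shift }
  end
end

model = MeanRegressor.new.fit([[1.0], [2.0], [3.0]], [2.0, 4.0, 6.0])
predictions = model.predict([[10.0], [20.0]])
```

The estimators listed in the rest of this sheet are constructed the same way: hyperparameters go to new, data goes to fit.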
Supervised Learning Estimators
Linear Regression
Rumale::LinearModel::LinearRegression.new(
  fit_bias:    # (Boolean) The flag indicating whether to fit the bias term.
  bias_scale:  # (Float) The scale of the bias term.
  max_iter:    # (Integer) The maximum number of iterations.
  batch_size:  # (Integer) The size of the mini batches.
  optimizer:   # (Optimizer) The optimizer to calculate adaptive learning rate. If nil is given, Nadam is used.
  random_seed: # (Integer) The seed value used to initialize the random generator.
)
Ridge Regression
L2 regularization
Rumale::LinearModel::Ridge.new(
  reg_param:   # (Float) The regularization parameter.
  fit_bias:    # (Boolean) The flag indicating whether to fit the bias term.
  bias_scale:  # (Float) The scale of the bias term.
  max_iter:    # (Integer) The maximum number of iterations.
  batch_size:  # (Integer) The size of the mini batches.
  optimizer:   # (Optimizer) The optimizer to calculate adaptive learning rate. If nil is given, Nadam is used.
  random_seed: # (Integer) The seed value used to initialize the random generator.
)
Lasso Regression
L1 regularization
Rumale::LinearModel::Lasso.new(
  reg_param:   # (Float) The regularization parameter.
  fit_bias:    # (Boolean) The flag indicating whether to fit the bias term.
  bias_scale:  # (Float) The scale of the bias term.
  max_iter:    # (Integer) The maximum number of iterations.
  batch_size:  # (Integer) The size of the mini batches.
  optimizer:   # (Optimizer) The optimizer to calculate adaptive learning rate. If nil is given, Nadam is used.
  random_seed: # (Integer) The seed value used to initialize the random generator.
)
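L1 regularization tends to push small weights to exactly zero. One common way to see this is the soft-thresholding operator, sketched below in plain Ruby. This is an illustration of the idea only; the source does not say how Rumale applies the penalty internally, and the function name and the threshold value are assumptions.

```ruby
# Soft-thresholding: shrink w toward zero by `amount`, clamping at zero.
# (Illustrative helper, not a Rumale function.)
def soft_threshold(w, amount)
  if w > amount
    w - amount
  elsif w < -amount
    w + amount
  else
    0.0
  end
end

weights = [0.8, -0.05, 0.3, -1.2]
# Weights with magnitude at or below 0.1 become exactly 0.0.
shrunk = weights.map { |w| soft_threshold(w, 0.1) }
```

A larger amount (playing the role of a larger reg_param) zeroes out more weights, which is why Lasso is often used for feature selection.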
Logistic Regression
Rumale::LinearModel::LogisticRegression.new(
  reg_param:   # (Float) The regularization parameter.
  fit_bias:    # (Boolean) The flag indicating whether to fit the bias term.
  bias_scale:  # (Float) The scale of the bias term. If fit_bias is true, the feature vector v becomes [v; bias_scale].
  max_iter:    # (Integer) The maximum number of iterations.
  batch_size:  # (Integer) The size of the mini batches.
  optimizer:   # (Optimizer) The optimizer to calculate adaptive learning rate. If nil is given, Nadam is used.
  random_seed: # (Integer) The seed value used to initialize the random generator.
)
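The bias_scale note above says the feature vector v becomes [v; bias_scale] when fit_bias is true. A minimal plain-Ruby sketch of that augmentation, followed by the logistic (sigmoid) output, is shown below. The helper names and the example weights are assumptions for illustration and are not Rumale methods.

```ruby
# Appends the constant bias_scale to a feature vector: v -> [v; bias_scale].
def augment(features, bias_scale: 1.0)
  features + [bias_scale]
end

# Sigmoid of the dot product between the weights and the augmented features.
def predict_proba(weights, features, bias_scale: 1.0)
  x = augment(features, bias_scale: bias_scale)
  z = weights.zip(x).sum { |w, xi| w * xi }
  1.0 / (1.0 + Math.exp(-z))
end

weights = [0.5, -0.25, 0.1] # the last entry multiplies bias_scale
prob = predict_proba(weights, [2.0, 4.0], bias_scale: 1.0)
```

With this layout the bias is learned as one more weight, and bias_scale controls how strongly that weight contributes relative to the other features.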
Support Vector Machine
svc = Rumale::LinearModel::SVC.new(
  reg_param:   # (Float) The regularization parameter.
  fit_bias:    # (Boolean) The flag indicating whether to fit the bias term.
  bias_scale:  # (Float) The scale of the bias term.
  max_iter:    # (Integer) The maximum number of iterations.
  batch_size:  # (Integer) The size of the mini batches.
  probability: # (Boolean) The flag indicating whether to perform probability estimation.
  optimizer:   # (Optimizer) The optimizer to calculate adaptive learning rate. If nil is given, Nadam is used.
  random_seed: # (Integer) The seed value used to initialize the random generator.
)
Decision Tree
Rumale::Tree::DecisionTreeClassifier.new(
  criterion:        # (String) The function to evaluate splitting point. Supported criteria are 'gini' and 'entropy'.
  max_depth:        # (Integer) The maximum depth of the tree. If nil is given, decision tree grows without concern for depth.
  max_leaf_nodes:   # (Integer) The maximum number of leaves on decision tree. If nil is given, number of leaves is not limited.
  min_samples_leaf: # (Integer) The minimum number of samples at a leaf node.
  max_features:     # (Integer) The number of features to consider when searching optimal split point. If nil is given, split process considers all features.
  random_seed:      # (Integer) The seed value used to initialize the random generator. It is used to randomly determine the order of features when deciding splitting point.
)
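The 'gini' and 'entropy' criteria both measure how mixed the class labels in a node are. The sketch below computes the textbook definitions in plain Ruby (Ruby 2.7+ for tally); the helper names are illustrative and are not Rumale functions.

```ruby
# Class proportions for an array of labels.
def proportions(labels)
  labels.tally.values.map { |count| count.to_f / labels.size }
end

# Gini impurity: 1 - sum(p_k^2).
def gini(labels)
  1.0 - proportions(labels).sum { |p| p * p }
end

# Shannon entropy in bits: -sum(p_k * log2(p_k)).
def entropy(labels)
  proportions(labels).sum { |p| -p * Math.log2(p) }
end

mixed = %w[a a b b]
pure = %w[a a a a]
mixed_scores = [gini(mixed), entropy(mixed)]
pure_scores = [gini(pure), entropy(pure)]
```

Both measures are zero for a pure node and largest for an evenly mixed one, so a split is scored by how much it lowers the impurity of the child nodes.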
Rumale::Tree::DecisionTreeRegressor.new(
  criterion:        # (String) The function to evaluate splitting point. Supported criteria are 'mae' and 'mse'.
  max_depth:        # (Integer) The maximum depth of the tree. If nil is given, decision tree grows without concern for depth.
  max_leaf_nodes:   # (Integer) The maximum number of leaves on decision tree. If nil is given, number of leaves is not limited.
  min_samples_leaf: # (Integer) The minimum number of samples at a leaf node.
  max_features:     # (Integer) The number of features to consider when searching optimal split point. If nil is given, split process considers all features.
  random_seed:      # (Integer) The seed value used to initialize the random generator. It is used to randomly determine the order of features when deciding splitting point.
)
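For regression trees, 'mse' and 'mae' score a node by how far its target values spread. The plain-Ruby sketch below assumes deviations are measured from the node mean (some implementations use the median for 'mae'), and the helper names are illustrative rather than Rumale functions.

```ruby
def mean(values)
  values.sum.to_f / values.size
end

# Mean squared deviation from the node mean.
def mse(targets)
  m = mean(targets)
  targets.sum { |y| (y - m)**2 } / targets.size
end

# Mean absolute deviation from the node mean (assumption: mean, not median).
def mae(targets)
  m = mean(targets)
  targets.sum { |y| (y - m).abs } / targets.size
end

# Impurity decrease obtained by splitting parent into left and right.
def split_gain(parent, left, right)
  weighted = (left.size * mse(left) + right.size * mse(right)) / parent.size
  mse(parent) - weighted
end

targets = [1.0, 2.0, 3.0, 10.0]
gain = split_gain(targets, [1.0, 2.0, 3.0], [10.0])
```

A split that isolates the outlying target removes most of the node's variance, which is the kind of split the criterion rewards.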
ExtraTree
Random Forest
Rumale::Ensemble::RandomForestClassifier.new(
  n_estimators:     # (Integer) The number of decision trees for constructing random forest.
  criterion:        # (String) The function to evaluate splitting point. Supported criteria are 'gini' and 'entropy'.
  max_depth:        # (Integer) The maximum depth of the tree. If nil is given, decision tree grows without concern for depth.
  max_leaf_nodes:   # (Integer) The maximum number of leaves on decision tree. If nil is given, number of leaves is not limited.
  min_samples_leaf: # (Integer) The minimum number of samples at a leaf node.
  max_features:     # (Integer) The number of features to consider when searching optimal split point. If nil is given, split process considers all features.
  random_seed:      # (Integer) The seed value used to initialize the random generator. It is used to randomly determine the order of features when deciding splitting point.
)
Rumale::Ensemble::RandomForestRegressor.new(
  n_estimators:     # (Integer) The number of decision trees for constructing random forest.
  criterion:        # (String) The function to evaluate splitting point. Supported criteria are 'mae' and 'mse'.
  max_depth:        # (Integer) The maximum depth of the tree. If nil is given, decision tree grows without concern for depth.
  max_leaf_nodes:   # (Integer) The maximum number of leaves on decision tree. If nil is given, number of leaves is not limited.
  min_samples_leaf: # (Integer) The minimum number of samples at a leaf node.
  max_features:     # (Integer) The number of features to consider when searching optimal split point. If nil is given, split process considers all features.
  random_seed:      # (Integer) The seed value used to initialize the random generator. It is used to randomly determine the order of features when deciding splitting point.
)
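A random forest combines many trees, each trained on a resampled view of the data, and then aggregates their outputs. The plain-Ruby sketch below shows the two generic ingredients, bootstrap sampling and aggregation by majority vote or averaging. How Rumale resamples and aggregates internally is not stated in this sheet, so the helper names and details here are assumptions.

```ruby
# Draws a bootstrap sample of row indices (sampling with replacement).
def bootstrap_indices(n_samples, rng)
  Array.new(n_samples) { rng.rand(n_samples) }
end

# Classification: the label predicted by the most trees wins.
def majority_vote(votes)
  votes.tally.max_by { |_label, count| count }.first
end

# Regression: the forest output is the average of the tree outputs.
def average(values)
  values.sum.to_f / values.size
end

rng = Random.new(42) # plays the role of random_seed
indices = bootstrap_indices(5, rng)
label = majority_vote(%w[cat dog cat cat dog])
value = average([2.0, 3.0, 4.0])
```

Fixing the seed makes the resampling repeatable, which is the practical reason the constructors above expose random_seed.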
AdaBoost (Adaptive Boosting)
Rumale::Ensemble::AdaBoostClassifier.new(
  n_estimators:     # (Integer) The number of decision trees for constructing AdaBoost classifier.
  criterion:        # (String) The function to evaluate splitting point. Supported criteria are 'gini' and 'entropy'.
  max_depth:        # (Integer) The maximum depth of the tree. If nil is given, decision tree grows without concern for depth.
  max_leaf_nodes:   # (Integer) The maximum number of leaves on decision tree. If nil is given, number of leaves is not limited.
  min_samples_leaf: # (Integer) The minimum number of samples at a leaf node.
  max_features:     # (Integer) The number of features to consider when searching optimal split point. If nil is given, split process considers all features.
  random_seed:      # (Integer) The seed value used to initialize the random generator. It is used to randomly determine the order of features when deciding splitting point.
)
Rumale::Ensemble::AdaBoostRegressor.new(
  n_estimators:     # (Integer) The number of decision trees for constructing AdaBoost regressor.
  threshold:        # (Float) The threshold for delimiting correct and incorrect predictions. It is constrained to [0, 1].
  exponent:         # (Float) The exponent for the weight of each weak learner.
  criterion:        # (String) The function to evaluate splitting point. Supported criteria are 'mae' and 'mse'.
  max_depth:        # (Integer) The maximum depth of the tree. If nil is given, decision tree grows without concern for depth.
  max_leaf_nodes:   # (Integer) The maximum number of leaves on decision tree. If nil is given, number of leaves is not limited.
  min_samples_leaf: # (Integer) The minimum number of samples at a leaf node.
  max_features:     # (Integer) The number of features to consider when searching optimal split point. If nil is given, split process considers all features.
  random_seed:      # (Integer) The seed value used to initialize the random generator. It is used to randomly determine the order of features when deciding splitting point.
)
Unsupervised Learning Estimators
PCA (Principal component analysis)
Rumale::Decomposition::PCA.new(
  n_components: # (Integer) The number of principal components.
  max_iter:     # (Integer) The maximum number of iterations.
  tol:          # (Float) The tolerance of termination criterion.
  random_seed:  # (Integer) The seed value used to initialize the random generator.
)
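The max_iter and tol parameters suggest an iterative solver. As one illustration of how such parameters are typically used, the plain-Ruby sketch below finds the leading principal component by power iteration on the covariance matrix. This sheet does not say which algorithm Rumale uses, so treat the method, the helper name, and the stopping rule as assumptions.

```ruby
# Leading principal component via power iteration (illustrative only).
def first_component(samples, max_iter: 100, tol: 1.0e-8)
  n = samples.size.to_f
  dims = samples.first.size
  # Center the data column by column.
  means = Array.new(dims) { |j| samples.sum { |row| row[j] } / n }
  centered = samples.map { |row| row.each_with_index.map { |v, j| v - means[j] } }
  # Covariance matrix as nested arrays.
  cov = Array.new(dims) do |a|
    Array.new(dims) { |b| centered.sum { |row| row[a] * row[b] } / n }
  end
  w = Array.new(dims, 1.0 / Math.sqrt(dims))
  max_iter.times do
    next_w = cov.map { |cov_row| cov_row.zip(w).sum { |c, wi| c * wi } }
    norm = Math.sqrt(next_w.sum { |v| v * v })
    next_w = next_w.map { |v| v / norm }
    # Stop once the direction no longer changes (|cosine| close to 1).
    change = 1.0 - next_w.zip(w).sum { |x, y| x * y }.abs
    w = next_w
    break if change < tol
  end
  w
end

samples = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
component = first_component(samples)
```

For points lying on the line y = x, the recovered direction is the unit vector along (1, 1), and a tighter tol simply allows more iterations before stopping.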
NMF (Non-negative matrix factorization)
Rumale::Decomposition::NMF.new(
  n_components: # (Integer) The number of components.
  max_iter:     # (Integer) The maximum number of iterations.
  tol:          # (Float) The tolerance of termination criterion.
  eps:          # (Float) A small value close to zero to avoid zero division error.
  random_seed:  # (Integer) The seed value used to initialize the random generator.
)
t-SNE (t-distributed stochastic neighbor embedding)
Rumale::Manifold::TSNE.new(
  n_components: # (Integer) The number of dimensions on representation space.
  perplexity:   # (Float) The effective number of neighbors for each point. Perplexity is typically set from 5 to 50.
  metric:       # (String) The metric to calculate the distances in original space. If metric is 'euclidean', Euclidean distance is calculated for distance in original space. If metric is 'precomputed', the fit and fit_transform methods expect to be given a distance matrix.
  init:         # (String) The method to initialize the representation space. If init is 'random', the representation space is initialized with normal random variables. If init is 'pca', the result of principal component analysis is used as the initial value of the representation space.
  max_iter:     # (Integer) The maximum number of iterations.
  tol:          # (Float) The tolerance of KL divergence for terminating optimization. If tol is nil, KL divergence is not used as a criterion for terminating the optimization.
  verbose:      # (Boolean) The flag indicating whether to output KL divergence during iteration.
  random_seed:  # (Integer) The seed value used to initialize the random generator.
)
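Perplexity is described above as the effective number of neighbors. In the standard t-SNE formulation it is 2 raised to the Shannon entropy (in bits) of a point's neighbor distribution, which the plain-Ruby sketch below computes. The helper name and the example distributions are illustrative assumptions.

```ruby
# Perplexity of a discrete distribution: 2 ** (Shannon entropy in bits).
def perplexity(probs)
  entropy = probs.reject(&:zero?).sum { |p| -p * Math.log2(p) }
  2.0**entropy
end

uniform = Array.new(8, 1.0 / 8)     # eight equally likely neighbors
peaked = [0.97, 0.01, 0.01, 0.01]   # one dominant neighbor
uniform_perplexity = perplexity(uniform)
peaked_perplexity = perplexity(peaked)
```

A uniform distribution over eight neighbors has perplexity 8, while a distribution dominated by one neighbor has perplexity close to 1, which is why the parameter reads as a neighbor count.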
KMeans clustering
Rumale::Clustering::KMeans.new(
  n_clusters:  # (Integer) The number of clusters.
  init:        # (String) The initialization method for centroids ('random' or 'k-means++').
  max_iter:    # (Integer) The maximum number of iterations.
  tol:         # (Float) The tolerance of termination criterion.
  random_seed: # (Integer) The seed value used to initialize the random generator.
)
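To show how max_iter and tol usually interact in k-means, here is a minimal one-dimensional sketch of the assign-and-update loop in plain Ruby. The starting centroids are passed in explicitly instead of being drawn by 'random' or 'k-means++', and the stopping rule (total centroid movement at or below tol) is an assumption rather than Rumale's documented criterion.

```ruby
# One-dimensional k-means iteration with explicit starting centroids.
def kmeans_1d(points, centroids, max_iter: 50, tol: 1.0e-4)
  max_iter.times do
    # Assignment step: group each point under its nearest centroid index.
    groups = points.group_by do |x|
      centroids.each_index.min_by { |k| (x - centroids[k]).abs }
    end
    # Update step: move each centroid to the mean of its group,
    # keeping the old position if the group is empty.
    updated = centroids.each_with_index.map do |c, k|
      members = groups[k]
      members ? members.sum.to_f / members.size : c
    end
    shift = centroids.zip(updated).sum { |before, after| (before - after).abs }
    centroids = updated
    break if shift <= tol
  end
  centroids
end

points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
centers = kmeans_1d(points, [0.0, 5.0])
```

On this data the loop settles after two passes, with one centroid near 1.0 and the other near 9.0; max_iter only matters when the centroids keep moving.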
DBSCAN (Density-based spatial clustering of applications with noise)
Rumale::Clustering::DBSCAN.new(
  eps:         # (Float) The radius of neighborhood.
  min_samples: # (Integer) The number of neighbor samples to be used for the criterion whether a point is a core point.
)
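The two DBSCAN parameters together define a core point: a point with at least min_samples points inside its eps-radius neighborhood. The plain-Ruby sketch below checks that criterion, counting the point itself as a neighbor, which is a convention assumed here and may differ from Rumale's. The helper names are illustrative.

```ruby
def euclidean(a, b)
  Math.sqrt(a.zip(b).sum { |ai, bi| (ai - bi)**2 })
end

# Indices of points that have at least min_samples points
# (the point itself included) within distance eps.
def core_point_indices(points, eps:, min_samples:)
  points.each_index.select do |i|
    neighbors = points.count { |q| euclidean(points[i], q) <= eps }
    neighbors >= min_samples
  end
end

points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]]
cores = core_point_indices(points, eps: 0.5, min_samples: 3)
```

The three points near the origin are core points, while the isolated point at (5, 5) is not and would be treated as noise unless it falls within eps of a core point.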