How to Start Machine Learning with Swift for Apple Devices

The iPhone is Apple’s most famous product, bringing huge revenue to the company and its investors. It also brings a huge community to Apple, creating ideal conditions for the company to increase sales of its other products, such as the iMac, iMac Pro, MacBook Air, MacBook Pro, and Mac mini.

All of these devices share one thing: the nature of their operating systems. iPhones run iOS (with the closely related iPadOS and watchOS on the iPad and Apple Watch), while Apple’s laptops and desktop computers run macOS.

What these operating systems have in common is that native apps for them are typically written in Objective-C or Swift.

In this article, we are going to talk about Swift or, more precisely, how to do machine learning with Swift, which will help you build modern, cutting-edge apps for the iPhone, MacBook, and other Apple devices.

Wikipedia: “Swift is a general-purpose, multi-paradigm, compiled programming language developed by Apple Inc. and the open-source community, first released in 2014. Swift was developed as a replacement for Apple’s earlier programming language Objective-C, as Objective-C had been largely unchanged since the early 1980s and lacked modern language features.

Swift works with Apple’s Cocoa and Cocoa Touch frameworks, and a key aspect of Swift’s design was the ability to interoperate with the huge body of existing Objective-C code developed for Apple products over the previous decades.

It is built with the open-source LLVM compiler framework and has been included in Xcode since version 6, released in 2014. On Apple platforms, it uses the Objective-C runtime library which allows C, Objective-C, C++, and Swift code to run within one program.”

Create your first Machine Learning project with Swift and TensorFlow for Apple devices

First, we must import the modules: the TensorFlow module, plus Python interoperability so that we can call numpy and matplotlib from Swift.

We are going to use the iris classification problem. If you have any familiarity with machine learning, or artificial intelligence in general, you have probably come across it already.

If you are not familiar, here is what it is:

“Imagine you are a botanist seeking an automated way to categorize each iris flower you find. Machine learning provides many algorithms to classify flowers statistically. For instance, a sophisticated machine learning program could classify flowers based on photographs. Our ambitions are more modest: we’re going to classify iris flowers based on the length and width measurements of their sepals and petals.

The Iris genus entails about 300 species, but our program will only classify the following three:

Iris setosa
Iris virginica
Iris versicolor”

Now we are going to download the dataset that we are going to use in this problem. We are going to do that using the download() function that we’ve implemented.

    let trainDataFilename = "iris_training.csv"
    download(from: "http://download.tensorflow.org/data/iris_training.csv", to: trainDataFilename)

Next, we are going to see how our data looks.

    let f = Python.open(trainDataFilename)
    for _ in 0..<5 {
        print(Python.next(f).strip())
    }
    print(f.close())

Here is the output:

120,4,setosa,versicolor,virginica
6.4,2.8,5.6,2.2,2
5.0,2.3,3.3,1.0,1
4.9,2.5,4.5,1.7,2
4.9,3.1,1.5,0.1,0
None

From this view of the dataset, notice the following:

The first line is a header containing information about the dataset:
- There are 120 total examples. Each example has four features and one of three possible label names.
Subsequent rows are data records, one *example* per line, where:
- The first four fields are *features*: these are characteristics of an example. Here, the fields hold float numbers representing flower measurements.
- The last column is the *label*: this is the value we want to predict. For this dataset, it’s an integer value of 0, 1, or 2 that corresponds to a flower name.
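
To make the file layout concrete, here is a minimal sketch in plain Swift (Foundation only, no TensorFlow) that parses the header line and one data row copied from the output above. The type name IrisExample and the functions parseHeader and parseRow are hypothetical, introduced only for this illustration; the tutorial itself loads the file with numpy.

```swift
import Foundation

/// One parsed data record (hypothetical helper type for this sketch).
struct IrisExample {
    let features: [Float]  // sepal length, sepal width, petal length, petal width
    let label: Int         // 0, 1, or 2
}

/// Parses the header line: example count, feature count, then the label names.
func parseHeader(_ line: String) -> (exampleCount: Int, featureCount: Int, labelNames: [String])? {
    let fields = line.split(separator: ",").map(String.init)
    guard fields.count >= 3,
          let exampleCount = Int(fields[0]),
          let featureCount = Int(fields[1]) else { return nil }
    return (exampleCount, featureCount, Array(fields[2...]))
}

/// Parses a data row: every field but the last is a feature, the last is the label.
func parseRow(_ line: String) -> IrisExample? {
    let fields = line.split(separator: ",").map(String.init)
    guard fields.count >= 2, let label = Int(fields[fields.count - 1]) else { return nil }
    let features = fields.dropLast().compactMap { Float($0) }
    guard features.count == fields.count - 1 else { return nil }
    return IrisExample(features: features, label: label)
}

let header = parseHeader("120,4,setosa,versicolor,virginica")!
let example = parseRow("6.4,2.8,5.6,2.2,2")!
print(header.exampleCount, header.featureCount, header.labelNames)
print(example.features, example.label)
```

Returning nil for malformed lines keeps the sketch honest about bad input instead of silently producing partial records.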

    let featureNames = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
    let labelName = "species"
    let columnNames = featureNames + [labelName]

    print("Features: \(featureNames)")
    print("Label: \(labelName)")

Each label is associated with a string name (for example, “setosa”), but machine learning typically relies on numeric values. The label numbers are mapped to a named representation, such as:

0: Iris setosa
1: Iris versicolor
2: Iris virginica

Next, we define the class names that correspond to the label values.

    let classNames = ["Iris setosa", "Iris versicolor", "Iris virginica"]

Now, let’s put the data into a format that our model can consume.

    let batchSize = 32

    /// A batch of examples from the iris dataset.
    struct IrisBatch {
        /// [batchSize, featureCount] tensor of features.
        let features: Tensor<Float>

        /// [batchSize] tensor of labels.
        let labels: Tensor<Int32>
    }

    /// Conform `IrisBatch` to `Collatable` so that we can load it into a `TrainingEpoch`.
    extension IrisBatch: Collatable {
        public init<BatchSamples: Collection>(collating samples: BatchSamples)
            where BatchSamples.Element == Self {
            /// `IrisBatch`es are collated by stacking their feature and label tensors
            /// along the batch axis to produce a single feature and label tensor
            features = Tensor<Float>(stacking: samples.map{$0.features})
            labels = Tensor<Int32>(stacking: samples.map{$0.labels})
        }
    }

Next, we are going to create a function that loads the locally downloaded datasets.

    /// Initialize an `IrisBatch` dataset from a CSV file.
    func loadIrisDatasetFromCSV(
        contentsOf: String, hasHeader: Bool, featureColumns: [Int], labelColumns: [Int]) -> [IrisBatch] {
        let np = Python.import("numpy")

        let featuresNp = np.loadtxt(
            contentsOf,
            delimiter: ",",
            skiprows: hasHeader ? 1 : 0,
            usecols: featureColumns,
            dtype: Float.numpyScalarTypes.first!)
        guard let featuresTensor = Tensor<Float>(numpy: featuresNp) else {
            // This should never happen, because we construct featuresNp in such a
            // way that it should be convertible to tensor.
            fatalError("np.loadtxt result can't be converted to Tensor")
        }

        let labelsNp = np.loadtxt(
            contentsOf,
            delimiter: ",",
            skiprows: hasHeader ? 1 : 0,
            usecols: labelColumns,
            dtype: Int32.numpyScalarTypes.first!)
        guard let labelsTensor = Tensor<Int32>(numpy: labelsNp) else {
            // This should never happen, because we construct labelsNp in such a
            // way that it should be convertible to tensor.
            fatalError("np.loadtxt result can't be converted to Tensor")
        }

        return zip(featuresTensor.unstacked(), labelsTensor.unstacked()).map{IrisBatch(features: $0.0, labels: $0.1)}
    }

Now, we load the data using the function we’ve implemented.

    let trainingDataset: [IrisBatch] = loadIrisDatasetFromCSV(contentsOf: trainDataFilename,
                                                      hasHeader: true,
                                                      featureColumns: [0, 1, 2, 3],
                                                      labelColumns: [4])

    let trainingEpochs: TrainingEpochs = TrainingEpochs(samples: trainingDataset, batchSize: batchSize)

    let firstTrainEpoch = trainingEpochs.next()!
    let firstTrainBatch = firstTrainEpoch.first!.collated
    let firstTrainFeatures = firstTrainBatch.features
    let firstTrainLabels = firstTrainBatch.labels

    print("First batch of features: \(firstTrainFeatures)")
    print("firstTrainFeatures.shape: \(firstTrainFeatures.shape)")
    print("First batch of labels: \(firstTrainLabels)")
    print("firstTrainLabels.shape: \(firstTrainLabels.shape)")

This is the output:

First batch of features: [[6.4, 3.2, 4.5, 1.5],
 [5.5, 2.4, 3.7, 1.0],
 [4.4, 3.0, 1.3, 0.2],
 [6.9, 3.1, 5.1, 2.3],
 [6.5, 3.2, 5.1, 2.0],
 [5.5, 2.6, 4.4, 1.2],
 [6.9, 3.2, 5.7, 2.3],
 [6.8, 3.2, 5.9, 2.3],
 [6.3, 3.4, 5.6, 2.4],
 [6.1, 2.9, 4.7, 1.4],
 [6.4, 2.8, 5.6, 2.1],
 [4.4, 2.9, 1.4, 0.2],
 [5.5, 2.4, 3.8, 1.1],
 [4.9, 3.1, 1.5, 0.1],
 [5.5, 3.5, 1.3, 0.2],
 [6.9, 3.1, 4.9, 1.5],
 [5.0, 3.3, 1.4, 0.2],
 [6.6, 3.0, 4.4, 1.4],
 [6.8, 2.8, 4.8, 1.4],
 [6.2, 3.4, 5.4, 2.3],
 [5.0, 3.4, 1.6, 0.4],
 [7.0, 3.2, 4.7, 1.4],
 [5.2, 3.4, 1.4, 0.2],
 [6.2, 2.2, 4.5, 1.5],
 [5.8, 2.7, 5.1, 1.9],
 [5.1, 3.8, 1.5, 0.3],
 [5.8, 4.0, 1.2, 0.2],
 [7.2, 3.0, 5.8, 1.6],
 [5.0, 3.5, 1.6, 0.6],
 [4.6, 3.4, 1.4, 0.3],
 [5.7, 3.0, 4.2, 1.2],
 [5.0, 2.0, 3.5, 1.0]]
firstTrainFeatures.shape: [32, 4]
First batch of labels: [1, 1, 0, 2, 2, 1, 2, 2, 2, 1, 2, 0, 1, 0, 0, 1, 0, 1, 1, 2, 0, 1, 0, 1, 2, 0, 0, 2, 0, 0, 1, 1]
firstTrainLabels.shape: [32]

Notice that the features for the first batchSize examples are grouped together (or batched) into firstTrainFeatures and that the labels for the first batchSize examples are batched into firstTrainLabels.
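
The idea of batching can be illustrated without TensorFlow. The following sketch, built around a hypothetical helper named batches(of:size:), splits 120 example indices into consecutive groups of batchSize. It only shows the grouping: TrainingEpochs also appears to shuffle the samples (the first batch above is not in file order), and how it treats a final partial batch is not covered here.

```swift
/// Splits `items` into consecutive batches of at most `size` elements
/// (hypothetical helper for this sketch).
func batches<T>(of items: [T], size: Int) -> [[T]] {
    precondition(size > 0, "batch size must be positive")
    return stride(from: 0, to: items.count, by: size).map { start in
        Array(items[start..<min(start + size, items.count)])
    }
}

let batchSize = 32
let exampleIndices = Array(0..<120)  // stand-ins for the 120 training examples
let grouped = batches(of: exampleIndices, size: batchSize)

for (index, batch) in grouped.enumerated() {
    print("Batch \(index): \(batch.count) examples")
}
```

With 120 examples and a batch size of 32, plain chunking yields three full batches and one partial batch of 24.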

You can start to see some clusters by plotting a few features from the batch, using Python’s matplotlib:

    let firstTrainFeaturesTransposed = firstTrainFeatures.transposed()
    let petalLengths = firstTrainFeaturesTransposed[2].scalars
    let sepalLengths = firstTrainFeaturesTransposed[0].scalars

    plt.scatter(petalLengths, sepalLengths, c: firstTrainLabels.array.scalars)
    plt.xlabel("Petal length")
    plt.ylabel("Sepal length")
    plt.show()

Image 1: Clusters of features

Next, we are going to create a model, named IrisModel, that has three Dense layers: two hidden layers and an output layer.

    import TensorFlow

    let hiddenSize: Int = 10

    struct IrisModel: Layer {
        var layer1 = Dense<Float>(inputSize: 4, outputSize: hiddenSize, activation: relu)
        var layer2 = Dense<Float>(inputSize: hiddenSize, outputSize: hiddenSize, activation: relu)
        var layer3 = Dense<Float>(inputSize: hiddenSize, outputSize: 3)

        @differentiable
        func callAsFunction(_ input: Tensor<Float>) -> Tensor<Float> {
            return input.sequenced(through: layer1, layer2, layer3)
        }
    }

    var model = IrisModel()

The activation function determines the output of each node in the layer. These non-linearities are important: without them, the stacked layers would be equivalent to a single linear layer. There are many available activations, but ReLU is common for hidden layers.
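
A small numeric sketch may help show why the non-linearity matters. It uses scalar “layers” with made-up weights (all names and values here are illustrative assumptions, not taken from the model above): without an activation, two stacked linear layers compute exactly what one linear layer computes, while inserting ReLU breaks that equivalence.

```swift
/// ReLU: passes positive values through and clamps negatives to zero.
func relu(_ x: Double) -> Double { return max(0, x) }

// Two scalar "layers" with made-up weights and biases.
let w1 = 2.0, b1 = -1.0
let w2 = -3.0, b2 = 0.5

/// Two linear layers stacked with no activation in between.
func stackedLinear(_ x: Double) -> Double { return w2 * (w1 * x + b1) + b2 }

/// The single linear layer that the stack collapses to.
func collapsed(_ x: Double) -> Double { return (w2 * w1) * x + (w2 * b1 + b2) }

/// The same stack with ReLU applied after the first layer.
func stackedWithReLU(_ x: Double) -> Double { return w2 * relu(w1 * x + b1) + b2 }

for x in [-1.0, 0.0, 1.0, 2.0] {
    print(x, stackedLinear(x), collapsed(x), stackedWithReLU(x))
}
```

For inputs where the first layer’s output is negative, the ReLU version departs from the straight line, which is what lets deeper networks represent more than a single linear map.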

The ideal number of hidden layers and neurons depends on the problem and the dataset. Like many aspects of machine learning, picking the best shape of the neural network requires a mixture of knowledge and experimentation.

As a rule of thumb, increasing the number of hidden layers and neurons typically creates a more powerful model, which requires more data to train effectively.

Next, we are going to use the model on our data.

    // Apply the model to a batch of features.
    let firstTrainPredictions = model(firstTrainFeatures)
    print(firstTrainPredictions[0..<5])

The output is:

[[ 2.5031486, -0.22217305, 0.6405254],
 [ 2.057041, -0.14333647, 0.5058617],
 [ 1.5384593, -0.071346685, 0.36170575],
 [ 2.5841222, -0.29190755, 0.691428],
 [ 2.4965959, -0.32670006, 0.6901549]]

Here are the predictions, converted to probabilities:

    print(softmax(firstTrainPredictions[0..<5]))

The output is:

[[ 0.8191417, 0.053674366, 0.12718388],
 [ 0.75599545, 0.08373507, 0.1602694],
 [ 0.6630401, 0.13255922, 0.20440063],
 [ 0.82848436, 0.046691786, 0.12482388],
 [ 0.81722885, 0.048551414, 0.13421974]]
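
For readers curious about what softmax does to each row, here is a minimal plain-Swift sketch (Foundation only). The function is a stand-in written for illustration, not the Swift for TensorFlow implementation; it is applied to the first row of logits printed earlier.

```swift
import Foundation

/// Converts raw scores (logits) into probabilities that sum to 1.
/// Subtracting the maximum first keeps exp() from overflowing.
func softmax(_ logits: [Double]) -> [Double] {
    guard let maxLogit = logits.max() else { return [] }
    let exps = logits.map { exp($0 - maxLogit) }
    let total = exps.reduce(0, +)
    return exps.map { $0 / total }
}

// First row of logits from the untrained model's output.
let firstLogits = [2.5031486, -0.22217305, 0.6405254]
let probabilities = softmax(firstLogits)
print(probabilities)
```

The result should agree with the first row of probabilities shown above to within rounding.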

We can get the predicted class for each example with the following code:

    print("Prediction: \(firstTrainPredictions.argmax(squeezingAxis: 1))")
    print(" Labels: \(firstTrainLabels)")

The output is:

Prediction: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Labels: [1, 1, 0, 2, 2, 1, 2, 2, 2, 1, 2, 0, 1, 0, 0, 1, 0, 1, 1, 2, 0, 1, 0, 1, 2, 0, 0, 2, 0, 0, 1, 1]

These predictions are poor, which is expected because we haven’t trained our model yet: it predicts class 0 for every example.

Both the training and evaluation stages need to calculate the model’s loss. This measures how far a model’s predictions are from the desired label; in other words, how badly the model is performing. We want to minimize, or optimize, this value.

Our model will calculate its loss using the softmaxCrossEntropy(logits:labels:) function, which takes the model’s raw class scores (logits) and the desired labels and returns the average loss across the examples.
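
As a rough illustration of what such a loss computes, the sketch below implements averaged softmax cross-entropy in plain Swift (Foundation only). The function name meanSoftmaxCrossEntropy and the sample logits are assumptions made for this example; it is not the library’s implementation.

```swift
import Foundation

/// Average softmax cross-entropy over a batch.
/// `logits[i]` holds the raw class scores for example i; `labels[i]` is its true class index.
func meanSoftmaxCrossEntropy(logits: [[Double]], labels: [Int]) -> Double {
    precondition(logits.count == labels.count && !logits.isEmpty)
    var total = 0.0
    for (row, label) in zip(logits, labels) {
        // log-sum-exp with the maximum subtracted for numerical stability
        let maxLogit = row.max()!
        let logSumExp = maxLogit + log(row.map { exp($0 - maxLogit) }.reduce(0, +))
        // -log(softmax(row)[label]) equals logSumExp - row[label]
        total += logSumExp - row[label]
    }
    return total / Double(logits.count)
}

// Made-up logits: confident and right, undecided, confident and wrong.
let sampleLogits = [[4.0, 0.0, 0.0], [0.0, 0.0, 0.0], [4.0, 0.0, 0.0]]
let sampleLabels = [0, 1, 2]
print(meanSoftmaxCrossEntropy(logits: sampleLogits, labels: sampleLabels))
```

An undecided prediction over three classes costs ln 3 ≈ 1.0986 per example, while a confident wrong prediction costs considerably more. That is why a model that confidently predicts the same class for every example can score worse than uniform guessing.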

Let’s calculate the loss for the current untrained model:

The output is:

Loss test: 1.7008568

To minimize the loss, we need to create an optimizer. An optimizer applies the computed gradients to the model’s variables to minimize the loss function.

You can think of the loss function as a curved surface, and we want to find its lowest point by walking around. The gradients point in the direction of steepest ascent, so we’ll travel the opposite way and move down the hill.

By iteratively calculating the loss and gradient for each batch, we’ll adjust the model during training. Gradually, the model will find the best combination of weights and bias to minimize loss. And the lower the loss, the better the model’s predictions.

Swift for TensorFlow has many optimization algorithms available for training. This model uses the SGD optimizer that implements the stochastic gradient descent (SGD) algorithm.

The learningRate sets the step size to take for each iteration down the hill. This is a hyperparameter that you’ll commonly adjust to achieve better results.
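
The “walking downhill” picture can be sketched on a one-variable function. This example is a toy built on stated assumptions, not the SGD optimizer from Swift for TensorFlow: the loss (w - 3)², its hand-written gradient, the starting point, and the learning rate of 0.1 are all chosen purely for illustration.

```swift
/// Toy loss with its minimum at w = 3.
func loss(_ w: Double) -> Double { return (w - 3) * (w - 3) }

/// Derivative of the toy loss with respect to w.
func gradient(_ w: Double) -> Double { return 2 * (w - 3) }

let learningRate = 0.1
var w = 0.0  // arbitrary starting point

for step in 0..<50 {
    // Move against the gradient, scaled by the learning rate.
    w -= learningRate * gradient(w)
    if step % 10 == 0 {
        print("step \(step): w = \(w), loss = \(loss(w))")
    }
}
```

Each step shrinks the distance to the minimum by a constant factor here; a learning rate that is too large would overshoot instead, which is why it is treated as a hyperparameter.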

    let optimizer = SGD(for: model, learningRate: 0.01)

Let’s use optimizer to take a single gradient descent step. First, we compute the gradient of the loss with respect to the model:

    let (loss, grads) = valueWithGradient(at: model) { model -> Tensor<Float> in
        let logits = model(firstTrainFeatures)
        return softmaxCrossEntropy(logits: logits, labels: firstTrainLabels)
    }
    print("Current loss: \(loss)")

The output is:

Current loss: 1.7008568

Next, we pass the gradient that we just calculated to the optimizer, which updates the model’s differentiable variables accordingly:

    optimizer.update(&model, along: grads)

    let logitsAfterOneStep = model(firstTrainFeatures)
    let lossAfterOneStep = softmaxCrossEntropy(logits: logitsAfterOneStep, labels: firstTrainLabels)
    print("Next loss: \(lossAfterOneStep)")

The output is:

Next loss: 1.5416806

With all the pieces in place, the model is ready for training! A training loop feeds the dataset examples into the model to help it make better predictions. The following code block sets up these training steps:

Iterate over each epoch. An epoch is one pass through the dataset.
Within an epoch, iterate over each batch in the training epoch.
Collate the batch and grab its features (x) and label (y).
Using the collated batch’s features, make a prediction, and compare it with the label. Measure the inaccuracy of the prediction and use that to calculate the model’s loss and gradients.
Use gradient descent to update the model’s variables.
Keep track of some stats for visualization.
Repeat for each epoch.

The epochCount variable is the number of times to loop over the dataset collection. Counter-intuitively, training a model longer does not guarantee a better model. epochCount is a hyperparameter that you can tune. Choosing the right number usually requires both experience and experimentation.

    let epochCount = 500
    var trainAccuracyResults: [Float] = []
    var trainLossResults: [Float] = []

    func accuracy(predictions: Tensor<Int32>, truths: Tensor<Int32>) -> Float {
        return Tensor<Float>(predictions .== truths).mean().scalarized()
    }

    for (epochIndex, epoch) in trainingEpochs.prefix(epochCount).enumerated() {
        var epochLoss: Float = 0
        var epochAccuracy: Float = 0
        var batchCount: Int = 0
        for batchSamples in epoch {
            let batch = batchSamples.collated
            let (loss, grad) = valueWithGradient(at: model) { (model: IrisModel) -> Tensor<Float> in
                let logits = model(batch.features)
                return softmaxCrossEntropy(logits: logits, labels: batch.labels)
            }
            optimizer.update(&model, along: grad)

            let logits = model(batch.features)
            epochAccuracy += accuracy(predictions: logits.argmax(squeezingAxis: 1), truths: batch.labels)
            epochLoss += loss.scalarized()
            batchCount += 1
        }
        epochAccuracy /= Float(batchCount)
        epochLoss /= Float(batchCount)
        trainAccuracyResults.append(epochAccuracy)
        trainLossResults.append(epochLoss)
        if epochIndex % 50 == 0 {
            print("Epoch \(epochIndex): Loss: \(epochLoss), Accuracy: \(epochAccuracy)")
        }
    }

This is the output:

Epoch 0: Loss: 1.2862605, Accuracy: 0.40625
Epoch 50: Loss: 0.54648876, Accuracy: 0.7708333
Epoch 100: Loss: 0.3795154, Accuracy: 0.90625
Epoch 150: Loss: 0.27226007, Accuracy: 0.9479167
Epoch 200: Loss: 0.22806416, Accuracy: 0.9895833
Epoch 250: Loss: 0.16147493, Accuracy: 0.9895833
Epoch 300: Loss: 0.14311497, Accuracy: 0.9895833
Epoch 350: Loss: 0.13072757, Accuracy: 0.9791667
Epoch 400: Loss: 0.11862741, Accuracy: 0.96875
Epoch 450: Loss: 0.08667142, Accuracy: 0.9895833

While it’s helpful to print out the model’s training progress, it’s often more helpful to see this progress. We can create basic charts using Python’s matplotlib module.

Interpreting these charts takes some experience, but you really want to see the loss go down and the accuracy go up.

    plt.figure(figsize: [12, 8])

    let accuracyAxes = plt.subplot(2, 1, 1)
    accuracyAxes.set_ylabel("Accuracy")
    accuracyAxes.plot(trainAccuracyResults)

    let lossAxes = plt.subplot(2, 1, 2)
    lossAxes.set_ylabel("Loss")
    lossAxes.set_xlabel("Epoch")
    lossAxes.plot(trainLossResults)

    plt.show()

Image 2: Accuracy and Loss per Epoch

After training comes testing. Evaluating the model is similar to training the model. The biggest difference is that the examples come from a separate test set rather than the training set.

To fairly assess a model’s effectiveness, the examples used to evaluate a model must be different from the examples used to train the model.

The setup for the test dataset is similar to the setup for the training dataset.

    let testDataFilename = "iris_test.csv"
    download(from: "http://download.tensorflow.org/data/iris_test.csv", to: testDataFilename)

    let testDataset = loadIrisDatasetFromCSV(
        contentsOf: testDataFilename, hasHeader: true,
        featureColumns: [0, 1, 2, 3], labelColumns: [4]).inBatches(of: batchSize)

Unlike the training stage, the model only evaluates a single epoch of the test data. In the following code cell, we iterate over each example in the test set and compare the model’s prediction against the actual label.

This is used to measure the model’s accuracy across the entire test set.

    for batchSamples in testDataset {
        let batch = batchSamples.collated
        let logits = model(batch.features)
        let predictions = logits.argmax(squeezingAxis: 1)
        print("Test batch accuracy: \(accuracy(predictions: predictions, truths: batch.labels))")
    }

The output is:

Test batch accuracy: 0.96666664

We can inspect the first test batch to see which predictions match the labels.

    let firstTestBatch = testDataset.first!.collated
    let firstTestBatchLogits = model(firstTestBatch.features)
    let firstTestBatchPredictions = firstTestBatchLogits.argmax(squeezingAxis: 1)

    print(firstTestBatchPredictions)
    print(firstTestBatch.labels)

The output is:

[1, 2, 0, 1, 1, 1, 0, 2, 1, 2, 2, 0, 2, 1, 1, 0, 1, 0, 0, 2, 0, 1, 2, 2, 1, 1, 0, 1, 2, 1]

[1, 2, 0, 1, 1, 1, 0, 2, 1, 2, 2, 0, 2, 1, 1, 0, 1, 0, 0, 2, 0, 1, 2, 1, 1, 1, 0, 1, 2, 1]
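
The two rows above can be compared mechanically. The sketch below, in plain Swift with a hypothetical array-based accuracy helper (the tutorial’s own accuracy function works on tensors), counts the matching positions and lists the mismatches.

```swift
/// Fraction of positions where the prediction equals the true label.
func accuracy(predictions: [Int], truths: [Int]) -> Double {
    precondition(predictions.count == truths.count && !truths.isEmpty)
    let correct = zip(predictions, truths).filter { $0.0 == $0.1 }.count
    return Double(correct) / Double(truths.count)
}

// Predictions and labels copied from the output above.
let predictions = [1, 2, 0, 1, 1, 1, 0, 2, 1, 2, 2, 0, 2, 1, 1, 0, 1, 0, 0, 2, 0, 1, 2, 2, 1, 1, 0, 1, 2, 1]
let labels      = [1, 2, 0, 1, 1, 1, 0, 2, 1, 2, 2, 0, 2, 1, 1, 0, 1, 0, 0, 2, 0, 1, 2, 1, 1, 1, 0, 1, 2, 1]

// Zero-based positions where the prediction and the label disagree.
let mismatches = zip(predictions, labels).enumerated()
    .filter { $0.element.0 != $0.element.1 }
    .map { $0.offset }

print("Accuracy: \(accuracy(predictions: predictions, truths: labels))")
print("Mismatched positions: \(mismatches)")
```

One position out of 30 differs, giving 29/30 ≈ 0.967, in line with the test batch accuracy reported earlier.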

The final step is to see how our trained model works in real life. We’ve trained a model and demonstrated that it’s good, but not perfect, at classifying iris species. Now let’s use the trained model to make some predictions on unlabeled examples; that is, on examples that contain features but not a label.

In real life, the unlabeled examples could come from lots of different sources including apps, CSV files, and data feeds. For now, we’re going to manually provide three unlabeled examples to predict their labels. Recall that the label numbers are mapped to a named representation as:

0: Iris setosa
1: Iris versicolor
2: Iris virginica

    let unlabeledDataset: Tensor<Float> =
        [[5.1, 3.3, 1.7, 0.5],
         [5.9, 3.0, 4.2, 1.5],
         [6.9, 3.1, 5.4, 2.1]]

    let unlabeledDatasetPredictions = model(unlabeledDataset)

    for i in 0..<unlabeledDatasetPredictions.shape[0] {
        let logits = unlabeledDatasetPredictions[i]
        let classIdx = logits.argmax().scalar!
        print("Example \(i) prediction: \(classNames[Int(classIdx)]) (\(softmax(logits)))")
    }

The output is:

Example 0 prediction: Iris setosa ([ 0.9927578, 0.007242134, 1.7388968e-09])
Example 1 prediction: Iris versicolor ([0.0024022155, 0.9451288, 0.052468948])
Example 2 prediction: Iris virginica ([2.51921e-05, 0.09485072, 0.9051241])

Conclusion

This is just one example of how powerful Swift is when paired with TensorFlow. If you are familiar with Swift development, you can pair it with TensorFlow and make amazing apps for every Apple device.

This is both challenging and exciting. Some forecasts predict that the mobile app market will be worth more than $400 billion in 2025, so it is a good idea to start now, while you have time, and be ready to claim a big piece of the pie.

Our article is inspired by TensorFlow.

Like with every post we do, we encourage you to continue learning, trying and creating.