An optimization problem is one where you have to make the best decision (choose the best investments, minimize your company’s costs, find the class schedule with the fewest morning classes, or so on). In optimization models then, the words “minimize” and “maximize” come up a lot when articulating an objective.
In data science, many of the practices, whether that’s artificial intelligence, data mining, or forecasting, are actually just some data prep plus a model-fitting step that’s actually an optimization model. We’ll start with a little practice with optimization now. Just a taste.
In Excel, optimization problems are solved using an Add-In that ships with Excel called Solver.
On Windows, Solver may be added in by going to File (in Excel 2007 it’s the top left Windows button) > Options > Add-ins, and under the Manage drop-down choosing Excel Add-ins and pressing the Go button. Check the Solver Add-In box and press OK.
On Mac, Solver is added by going to Tools then Add-ins and selecting Solver.xlam from the menu.
A Solver button will appear in the Analysis section of the Data tab in every version.
All right! Now that Solver is installed, here’s an optimization problem: You are told you need 2,400 calories a day. What’s the fewest number of items you can buy from the snack stand to achieve that? Obviously, you could buy 10 ice cream sandwiches at 240 calories a piece, but is there a way to do it for fewer items than that?
Solver can tell you!
To start, download a copy of the Calories spreadsheet from the book’s companion website at www.wiley.com/go/datasmart — use the download link for chapter 1. Make a copy of the Calories sheet in the Concessions.xlsx Excel workbook, name the sheet Calories-Solver, and clear out everything but the calories table on the copy. If you don’t know how to make a copy of a sheet in Excel, you simply right-click the tab you’d like to copy and select the Move or Copy menu.
To get Solver to work, you need to provide it with a range of cells it can set with decisions. In this case, Solver needs to decide how many of each item to buy. So in Column C next to the calorie counts, label the column How many? (or whatever you feel like), and you can allow Solver to store its decisions in this column.
Excel considers blank cells to be 0s so you needn’t fill in these cells with anything to start. Solver will do that for you.
In cell C16, sum up the number of items to be bought above as:
And below that you can sum up the total calorie count of these items (which you’ll want eventually to equal 2,400) using the SUMPRODUCT formula:
Now you’re ready to build the model, so bring up the Solver window by pressing the Solver button on the Data tab.
Note: The Solver window, shown in Figure 1-23 in Excel 2011, looks pretty similar in Excel 2010, 2011, and 2013. In Excel 2007, the layout is slightly different, but the only substantive difference is that there is no algorithm selection box. Rather, there’s an “Assume Linear Model” checkbox under the Options menu. We’ll learn all about these elements later
The main elements you plug into Solver to solve a problem are an objective cell, an optimization direction (minimization or maximization), some decision variables that can be changed by Solver, and some constraints.
In your case, the objective is to minimize the total items in cell C16. The cells that can be altered are the item selections in C2:C15. And the constraints are that C17, the total calories, needs to be equal to 2,400. Also, we’ll need to add a constraint that our decisions be counting numbers, so we’ll need to check the non-negative box (under the options menu in Excel 2007) and add an integer constraint to the decisions. After all, you can’t buy 1.7 sodas.
To add in the total calorie constraint, press the Add button and set C17 equal to 2,400.
Similarly, add a constraint setting C2:C15 to be integers as shown in Figure 1-25 below.
In Excel 2010, 2011, and 2013, make sure the solving method is set to Simplex LP. Simplex LP is appropriate for this problem, because this problem is linear (the “L” in LP stands for linear). By linear, I mean that the problem involves nothing but linear combinations of the decisions in C2 through C15 (sums, products with constants such as calorie counts, etc.).