CS 559 - Spring 2017 Homework: Super-resolution in TensorFlow

The goal of the homework is to get familiar with TensorFlow, an expressive machine learning / deep learning library that you can use in your project and your research. In addition, the homework will give you hands-on experience with various deep learning techniques that we have covered in class.

Please read the sections below carefully, and understand the experiments and analyses that you need to perform.

Due date: April 1st, 2017, 23:50 (not a joke). Please submit your report (must be a PDF file), README file and source code files to Moodle as a single zip archive named "YourBilkentID.zip". If you work in a pair, only one file should be submitted for the pair. Each submission should include a README file that provides the name(s) of the authors and a brief summary of the contents of the other files in the zip archive. Late submissions will not be accepted.

Step 1: Preparations

  • Install TensorFlow following the instructions on this link. You do not need to install with GPU support.
  • The homework instructions assume a Python 2.7 based installation, but you may easily port them to Python 3, if you want.
  • (Optional) Install ipython and jupyter notebook, which provide an interactive Python shell and a web-based development environment. If you have followed the recommended virtualenv installation, the following commands should work (after enabling your tensorflow virtualenv): pip install --upgrade ipython, pip install --upgrade jupyter. If you want, you are allowed to submit your source code in the form of jupyter notebook files.

Step 2: Learn TensorFlow basics

  • Read and do the exercises on the page "Get started". Make sure you understand the computation graph, session, placeholder, variable and other fundamental TF concepts very well. (The section on tf.contrib.learn is optional.)
  • Read and do the exercises on the page "Deep MNIST for Experts".
  • Get familiar with TensorFlow documentation. To complete the assignment, you may need to learn additional details of TensorFlow yourself using TensorFlow API documentation and/or its source code.

Step 3: Build your super-resolution network

Your task is to design and train a deep super-resolution network that will accurately predict 28x28 MNIST digits from 7x7 input images.
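To make the input and output shapes concrete, here is a minimal NumPy sketch that reduces a 28x28 image to 7x7 by averaging non-overlapping 4x4 blocks. Average pooling is an assumption made purely for illustration; the homework template may produce its 7x7 inputs differently, and `downsample_4x` is a hypothetical helper name.

```python
import numpy as np

def downsample_4x(image):
    """Average non-overlapping 4x4 blocks of a 28x28 image, giving a 7x7 image."""
    image = np.asarray(image, dtype=np.float32)
    assert image.shape == (28, 28)
    # Split each axis into 7 blocks of 4 pixels, then average inside each block.
    return image.reshape(7, 4, 7, 4).mean(axis=(1, 3))

# Synthetic stand-in for an MNIST digit (assumption: pixel values in [0, 1]).
high_res = np.arange(28 * 28, dtype=np.float32).reshape(28, 28) / (28 * 28 - 1)
low_res = downsample_4x(high_res)
print(low_res.shape)
```

The network then has to map the 49 low-resolution values back to the 784 original pixel values.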

Your basic super-resolution network should consist of a series of fully-connected layers with ReLU activations. You are required to explore the following details and techniques to improve your neural network:

  • tune the number of layers and the number of neurons in the hidden layers (20 pts)
  • investigate the effect of Xavier vs Gaussian (random) initialization (10 pts)
  • try using batch normalization layers (15 pts)
  • try using l2-regularization and dropout-regularization, tune the l2-regularization weight and the dropout keep probability hyper-parameters (15 pts)
  • try adding convolutional layer(s) on the output of the fully connected layers (20 pts)
  • use Adam optimizer and tune its hyper-parameters (number of iterations, batch size, learning rate, etc). Note that early-stopping based on validation set error can sometimes be used as a regularization technique. (20 pts)
Investigate these techniques in the order that you prefer / need, such that you progressively build a better model by using combinations of these techniques. Note that you may need to re-visit some of these techniques after you update your architecture, as the behaviour of deep learning techniques and their hyper-parameters depends on overall network architecture.
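As an illustration of the initialization item in the list above, the following NumPy sketch compares a fixed-stddev Gaussian initialization with Xavier (Glorot) uniform initialization by pushing random inputs through a small ReLU stack. This is not TensorFlow code and not part of the template; the layer sizes, the Gaussian stddev of 1.0 and the helper names are all assumptions chosen for the example.

```python
import numpy as np

def gaussian_init(fan_in, fan_out, rng, stddev=1.0):
    """Zero-mean Gaussian weights with a fixed, hand-picked standard deviation."""
    return rng.normal(0.0, stddev, size=(fan_in, fan_out))

def xavier_uniform_init(fan_in, fan_out, rng):
    """Xavier (Glorot) uniform weights: U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out))."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def output_std(init_fn, layer_sizes, inputs, rng):
    """Pass inputs through freshly initialized ReLU layers (no biases); return the output std."""
    hidden = inputs
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        hidden = np.maximum(hidden @ init_fn(fan_in, fan_out, rng), 0.0)
    return float(hidden.std())

rng = np.random.default_rng(0)
layer_sizes = [49, 256, 256, 784]  # hypothetical: 7x7 input, two hidden layers, 28x28 output
inputs = rng.uniform(0.0, 1.0, size=(128, 49))  # random stand-in for a batch of 7x7 images

gaussian_std = output_std(gaussian_init, layer_sizes, inputs, rng)
xavier_std = output_std(xavier_uniform_init, layer_sizes, inputs, rng)
print("Gaussian (stddev=1.0) output std:", gaussian_std)
print("Xavier uniform output std:", xavier_std)
```

With a unit-variance Gaussian the activations grow with every layer, whereas the Xavier-scaled weights keep them far smaller; the exact numbers depend on the seed and the layer sizes.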

Train your models on the MNIST training set (without using the labels of the digits), and evaluate them on the validation set in terms of reconstruction error. In your report, explain all the techniques/architectures/hyper-parameters/etc that you try (together with your motivations), and provide and discuss the evaluation results that you obtain for them. The source file(s) that you submit should also provide the main code for all the experiments that you do (no need to submit separate source files just for hyper-parameter variations).
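For reference, one common way to measure reconstruction error is the mean squared error over all pixels. The sketch below assumes that definition; the template's own evaluation code is authoritative and may define the error differently, and `reconstruction_mse` is a hypothetical helper name.

```python
import numpy as np

def reconstruction_mse(predicted, target):
    """Mean squared error over all images and pixels (assumed error definition)."""
    predicted = np.asarray(predicted, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    assert predicted.shape == target.shape
    return float(np.mean((predicted - target) ** 2))

# Two flattened 28x28 "images": all-zero targets, predictions of 0.5 everywhere.
target = np.zeros((2, 784))
predicted = np.full((2, 784), 0.5)
error = reconstruction_mse(predicted, target)
print(error)  # 0.25, since every pixel is off by 0.5
```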

Use the following code as a starting point in your implementation: cs559_homework_template.py. Read the comments provided with the template carefully. The template already provides the necessary code for downloading and reading the MNIST dataset, producing the 7x7 input images from the original 28x28 images, building a simple random network as an example, and evaluating on the train/validation/test sets.

Step 4: Evaluate your network on the test set

Once you are done with the design and development of your network architecture, report your reconstruction error on the test set for the model with the smallest validation set error.
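The selection rule can be sketched in a few lines of plain Python. The model names and validation errors below are made-up placeholders, not real results, and `select_best_model` is a hypothetical helper; the point is only that the test set is used once, for the model chosen by validation error.

```python
# Hypothetical experiment log: names and validation errors are placeholders.
validation_errors = {
    "fc_2x256": 0.021,
    "fc_2x256_batchnorm": 0.017,
    "fc_2x256_batchnorm_conv": 0.014,
}

def select_best_model(errors):
    """Return the name of the model with the smallest validation error."""
    return min(errors, key=errors.get)

best_model = select_best_model(validation_errors)
# Only this model should be evaluated on the test set for the final report.
print(best_model)
```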

Additionally, provide a couple of informative success and failure examples (i.e., input-output image pairs) in your report. Discuss your results.

Grading

The goal, clearly, is not to randomly try a very large number of different architectures and hyper-parameters. Instead, you are expected to run a series of meaningful, purposeful experiments towards building a successful super-resolution model, analyze and understand the results in a detailed way, and thoroughly discuss what works well/poorly (and why). Provide plots (e.g., reconstruction error over training iterations), tables and qualitative examples to support your discussion, if needed.

The final accuracy on the test set will be taken into account in the following way: if your test set error is much higher than what most people achieve, you may lose up to 25 points. If you obtain exceptionally good results (without cheating) on the test set, you may get up to 15 extra points. The total grade of the homework may not exceed 100 points.

The quality of the report is important. You may lose up to 40 points due to poor presentation and/or insufficient discussion of the results in the report. Reports need not be very long; the number of pages is not a factor in grading.

Rules and FAQ

Homework can be done individually or in pairs. Larger groups are not allowed.

The source code that you submit should cover all results and experiments that you provide in your report in an accessible manner. That is, do not throw away the code for the models/approaches that you try and report but do not use in your final model (exploration of various deep learning architectures and techniques is part of the homework). Also, the README file should be informative enough so that I can locate the code sections corresponding to your results.

Make sure that you use vanilla TensorFlow in your implementation, not a wrapper over it (like tflearn or keras). You are allowed to use utility functions under tf.contrib (please ask if you are in doubt). You are allowed to use TensorFlow standard implementations of dropout and batch-normalization.

Can I re-use existing code? You may re-use code (with citation) from the tutorial pages whose links are explicitly provided in the homework specification. However, in most cases, you cannot reuse existing code from other sources, even if it is your own code, unless it is a small utility that is not directly related to the homework contents (e.g., zip file I/O). If you decide to use/adapt existing code, beware of the following: (i) you should clearly cite and explain anything that you are reusing/adapting in your source code and your report, (ii) you may get a partial/zero grade for those parts (even if you cite them properly) if the re-used code covers a part of the homework that needs to be implemented by you, (iii) you should not violate the license of the code that you are reusing. If you are in doubt, please check the honor code in the syllabus and/or contact me.

Do I need a computer with a GPU to complete the assignment? You do not need to use a GPU to complete the homework. The homework is designed such that the experiments can be completed in a bearable amount of time on a modern CPU. However, if you have access to a high-end GPU, you can definitely use it (please mention this in your report), which may significantly speed up your experiments. Also note that, as long as you use the standard operators / layers in TensorFlow, your code is ready to run on a GPU in most cases.

Will you teach TensorFlow in class? There will not be a separate in-class tutorial on TensorFlow due to lack of time. The homework itself aims to be the tutorial. If you encounter an issue, you may look for a solution through online resources like the TensorFlow documentation, TensorFlow discussion groups, and Stack Overflow. You may also discuss your problems in the CS 559 Moodle forum.