AoC: Day 07

The Problem

Given a set of ‘rules’, figure out which bags could contain a given bag, and in part 2, how many bags could be contained in the given bag.

My Solution

In the prior year, the difficulty level ratcheted up fairly quickly, and I was beginning to get worried this year! Not to worry, this one was pretty hairy!

The rules specified very clearly describe a graph, and we need to essentially do a backward traversal for part 1 and a forward traversal (accumulating values as we go) for part 2. I’m happy to say I resisted the lure of actually creating a graph, and just used dictionaries rather than explicit nodes and edges.

The format of the input was fairly intricate, so I just parsed each line directly, rather than putting together a regular expression to do so. Different dictionaries for each part and recursion to wrap it all up!

Implementation

  • There is actually a library called parse that makes it pretty easy to, well, parse input like this
  • Most solutions just split the input line into two, and then searched for contained bags. I like my approach better
  • Some solutions used either dictionaries or lists but iterated over them multiple times. Very inefficient!
  • A few actually used networkx to build a graph and then process it
  • Most solutions were essentially unreadable (as I’m sure mine is). This is a function of the problem that we’re trying to solve, I suppose. Or maybe I’m just tired…
  • There was one succinct, readable solution that used set comprehensions and a pretty neat recursion within it

Things are beginning to get exciting!


AoC: Day 06

The Problem

For each group of people, we have to count the questions (single-character responses) that anyone in the group answered in part 1, and the questions that everyone in the group answered in part 2.

My Solution

Thankfully, a bit of sanity. I recognized both these as sets so just used the union and intersection operators. Quite happy with my input parsing as well — split into groups based on \n\n

One problem that I faced was that the last group from part 2 was not being processed properly. The last line was empty, hence its intersection with the rest of the data was empty. Learnt the difference between split("\n") and split()

Implementation

  • A few solutions did what I would have blindly done — manually count unique solutions. Some also combined this with sets!
  • We can use a single function and pass it set.intersection or set.union. Cool! This function can apply the operation using reduce, or, even cooler, using the * operator
  • Once we have the unpacking operator, a one-liner with a combination of set and list comprehensions is possible. Interesting, but ugly code, and defeats the purpose….
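A minimal sketch of the single-function idea, assuming the input is blank-line-separated groups with one person per line; the sample text and the name count_answers are illustrative only.

```python
# Hypothetical sample: five groups separated by blank lines.
SAMPLE = "abc\n\na\nb\nc\n\nab\nac\n\na\na\na\na\n\nb\n"


def count_answers(text, combine):
    """Sum, over all groups, the size of the combined answer set.

    `combine` is set.union (part 1) or set.intersection (part 2).
    """
    total = 0
    for group in text.strip().split("\n\n"):
        # split() with no argument ignores empty trailing lines
        people = [set(person) for person in group.split()]
        total += len(combine(*people))  # unpack the list of sets
    return total


print(count_answers(SAMPLE, set.union))         # 11
print(count_answers(SAMPLE, set.intersection))  # 6
```

Using split() rather than split("\n") on each group is what avoids the empty-last-line problem described above.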

Overall interesting problem, but not too challenging. I learnt a few new things about python, but no new algorithmic ideas.


AoC: Day 05

The Problem

We need to convert a string that encodes the seat row and column to a seat number. In part 2, we have to find the missing seat number in the input data

My Solution

I came to AoC after a bit of a break and just wanted to get it done in time. The approach I took followed the illustration in the problem for the conversion (excuses, excuses, excuses…). So I actually walked through the string, and if the character is an ‘F’ I did this, and if it’s a ‘B’ I did that. Yes, I’m embarrassed (see below). To add to my embarrassment, I implemented two functions for recovering the row and column numbers, instead of just one. My Part 2 was the most idiotic, brute-force, inelegant solution you can imagine!

Implementation

  • This is not something I would do, but highlighting out of interest. A few solutions actually created a list of 128 (or 8, in case of columns) populated with the indices, and chopped it up.
  • Python strings have a replace method and a maketrans function (used together with translate), so it’s easy to replace, say, ‘F’ with a ‘0’ and ‘B’ with a ‘1’…. and then interpret this as an integer base 2! Why would you want to do this? See the note below.
  • A neat implementation uses recursion and what is essentially a binary search on the indices but keyed on the string

Algorithm

  • My ‘doh!’ moment was not recognizing that the string in its entirety was a binary representation of the desired output! Including the combination of rows and columns!!
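The binary observation fits in a couple of lines. This is a sketch assuming the four letters F, B, L and R map to the bits 0, 1, 0 and 1; the names TO_BITS and seat_id are illustrative.

```python
# Translation table: F and L are 0 bits, B and R are 1 bits.
TO_BITS = str.maketrans("FBLR", "0101")


def seat_id(code):
    """Read the whole boarding-pass string as one base-2 number."""
    return int(code.translate(TO_BITS), 2)


print(seat_id("FBFBBFFRLR"))  # 357, i.e. row 44 * 8 + column 5
```

Because the column occupies the low three bits, multiplying the row by 8 and adding the column is the same as reading all ten characters as a single number.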

Part 2 had multiple approaches:

  • Put all the values in a set and do a set difference with the entire range
  • Add all the numbers up and subtract from the arithmetic sum
  • Similar to the above, but use XOR on the full range
  • Simulate: Choose a seat and check if it’s occupied. If yes, choose the next seat. The expected number of tries was described as a harmonic sum; strictly speaking the harmonic series diverges, but its partial sums grow only logarithmically, so the count stays small.
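The first three approaches can be sketched as follows, assuming exactly one seat is missing strictly between the smallest and largest ids; the function names and the sample list are made up for illustration.

```python
def missing_by_set(ids):
    """Set difference between the full range and the seats we have."""
    full = set(range(min(ids), max(ids) + 1))
    return (full - set(ids)).pop()


def missing_by_sum(ids):
    """Arithmetic-series sum of the full range minus the actual sum."""
    lo, hi = min(ids), max(ids)
    return (lo + hi) * (hi - lo + 1) // 2 - sum(ids)


def missing_by_xor(ids):
    """XOR the full range with the ids; everything cancels except the gap."""
    acc = 0
    for n in range(min(ids), max(ids) + 1):
        acc ^= n
    for n in ids:
        acc ^= n
    return acc


seats = [5, 6, 7, 9, 10]  # hypothetical seat ids, 8 is missing
print(missing_by_set(seats), missing_by_sum(seats), missing_by_xor(seats))  # 8 8 8
```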

I really like the last approach, and would not have thought of it. I should have recognized the binary encoding and the summing, so am kicking myself.

Oh well, I live and learn!


AoC: Day 04

The Problem

We’re given a bunch of data in key:value pairs that we have to count (part 1) and validate (part 2). If it walks like a dict and talks like a dict…

My Solution

I was consciously looking for opportunities to do a bit of comprehension, so figured out how to do just that. The first part had a bit of ugly code for the checks, and I implemented the checks for the second part in one big function. Didn’t take me much time and my code ran the first time around, so I was quite happy.

Unfortunately, for the input processing as well as the checks, I directly did what the problem asked, while there were better and more elegant ways of solving the problem…

Implementation

  • Python has a function all that checks if all the entries of a list are true. Cool!
  • Processing the input — I broke the entries when I encountered an empty line. Much better to just split on \n\n, then join what remains and split on spaces, which gives us all the k:v pairs. Lesson: take a step back and think about the task at hand; the obvious solution isn’t the only one!
  • Comprehensions: the above can be put into a list comprehension, followed by a call to all: [[y.split(':') for y in re.split(' |\n', x)] for x in r.split('\n\n')]
  • The biggest learning from this problem: we can use functions as values in a dict (and therefore lambdas as well)! I had no idea this was possible, but in retrospect it is obvious, as everything in python is an object
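Putting the last few points together, here is a hedged sketch of a dict of validators combined with all. Only three fields are shown, and their rules (a year range, a fixed set of values, a nine-digit number) are assumptions for illustration, as are the names VALIDATORS, parse and is_valid.

```python
import re

# Field name -> function that validates the field's value (assumed rules).
VALIDATORS = {
    "byr": lambda v: v.isdigit() and 1920 <= int(v) <= 2002,
    "ecl": lambda v: v in {"amb", "blu", "brn", "gry", "grn", "hzl", "oth"},
    "pid": lambda v: re.fullmatch(r"\d{9}", v) is not None,
}


def parse(text):
    """Split on blank lines, then on any whitespace, into k:v dictionaries."""
    return [dict(pair.split(":") for pair in block.split())
            for block in text.strip().split("\n\n")]


def is_valid(passport):
    """Every required field must be present and pass its own check."""
    return all(key in passport and check(passport[key])
               for key, check in VALIDATORS.items())


# Hypothetical input: the first record passes, the second does not.
SAMPLE = "byr:1980 ecl:brn\npid:000000001\n\nbyr:2010 ecl:brn pid:123\n"
print([is_valid(p) for p in parse(SAMPLE)])  # [True, False]
```

Adding a new rule is then just one more entry in the dictionary, with no change to the checking loop.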

AoC: Day 03

The Problem

We’re still in the relatively easy phase. In this problem, we’re given a map with trees marked in, and on a straight-line path, we have to count how many trees we encounter. The only thing to account for is that the map repeats horizontally, so we have to replicate it as many times as needed till we reach the bottom. We do the same in part 2, for different lines.

My Solution

I thought this was pretty straightforward; it took me about ten minutes for each part. However, when I looked at other solutions…

Algorithm

Not really an algorithm, but I was kicking myself about this. I physically (is that the right term?) replicated the map as many times as needed (even did a calculation to figure this out upfront). Turns out all I needed to do is use the mod operator!

Implementation

  • The last input in part 2 required going down two rows at a time. I wrapped myself up in knots doing this, but all that was needed was a [::2]
  • Lots of opportunities for (list) comprehensions, I need to start recognizing these.
  • Check this out: trees = sum(forest[i * sr][(i * sc) % cols] == '#' for i in range((rows + sr - 1) // sr))
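The mod operator and the [::2] slice combine neatly. The following is a small sketch on a made-up sample grid; count_trees, sc (columns per step) and sr (rows per step) are names chosen for illustration.

```python
# Hypothetical sample map: '#' is a tree, '.' is open ground.
FOREST = """\
..##.......
#...#...#..
.#....#..#.
..#.#...#.#
.#...##..#.
..#.##.....
.#.#.#....#
.#........#
#.##...#...
#...##....#
.#..#...#.#
""".splitlines()


def count_trees(forest, sc, sr):
    """Count trees on the path that moves sc columns right and sr rows down."""
    cols = len(forest[0])
    # forest[::sr] keeps every sr-th row; the modulo wraps the column around,
    # so the map never has to be physically replicated.
    return sum(row[(i * sc) % cols] == "#"
               for i, row in enumerate(forest[::sr]))


print(count_trees(FOREST, 3, 1))  # 7
print(count_trees(FOREST, 1, 2))  # 2
```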

AoC: Day 02

The Problem

This was more of a string matching exercise followed by a bit of arithmetic on the extracted data. Part 2 involves a bit of logic.

My Solution

The first thing that came to mind was regular expressions with named fields. It’s been a while, so I had to go back to the (excellent) docs, but this was pretty straightforward.

Turns out this was overkill (see below)! Also, I found out not one but two improvements in my if statements!

Other Approaches

There isn’t much to explore in terms of algorithm, since the task isn’t very complicated. So I mostly looked at what I could learn from doing the implementation differently.

Implementation

  • My traditional C based thinking translated into python: if(num >= min_inst and num <= max_inst): A more pythonic way of doing the same: if(min_inst <= num <= max_inst):
  • I blindly translated the requirements of part 2 into two if statements, since we needed to satisfy either of two conditions, but not both …but that’s just what an xor does!
  • Another approach that I liked for part 2 was along the lines of if(condition_1 != condition_2):
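A short sketch of these points, assuming input lines of the form "1-3 a: abcde" and 1-based positions for part 2. The pattern and the names LINE, valid_count and valid_position are illustrative, not the original code.

```python
import re

# Named groups keep the extracted fields readable (assumed line format).
LINE = re.compile(r"(?P<lo>\d+)-(?P<hi>\d+) (?P<char>\w): (?P<password>\w+)")


def valid_count(line):
    """Part 1: the character count must lie within [lo, hi]."""
    m = LINE.fullmatch(line)
    lo, hi = int(m["lo"]), int(m["hi"])
    return lo <= m["password"].count(m["char"]) <= hi  # chained comparison


def valid_position(line):
    """Part 2: exactly one of the two (1-based) positions holds the character."""
    m = LINE.fullmatch(line)
    first = m["password"][int(m["lo"]) - 1] == m["char"]
    second = m["password"][int(m["hi"]) - 1] == m["char"]
    return first != second  # != on two booleans is an xor


lines = ["1-3 a: abcde", "1-3 b: cdefg", "2-9 c: ccccccccc"]
print([valid_count(x) for x in lines])     # [True, False, True]
print([valid_position(x) for x in lines])  # [True, False, False]
```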

AoC: Day 01

The Problem

An extremely simple problem to kick things off, and one that shows up on many beginners’ programming tests: find two numbers from a list that add up to a specified number. Part 2 asks for three numbers that add up to the specified number. The list is not sorted.

My Solution

My solution was pretty obvious: sort the list, and do a binary search. However, turns out that NumPy has a pretty nice function called searchsorted. We give it the sorted list and a list of keys to search for, and it returns a list of indices, each corresponding to where the key would be inserted while maintaining the order. So pretty cool, I learnt about a new function. And my overall runtime is $O(n\lg{n})$

For Part 2, the natural approach is to repeat part 1 for every element of the list, while suitably modifying the desired sum.

Other Approaches

Now the fun part! Let’s order these into three groups

Algorithm

  • The straightforward – a doubly nested loop,  $O(n^2)$
  • Many binary searches; see the next section for variations in the implementation
  • The one I liked best, and one that I’m kicking myself for not thinking about. Set a lo and hi index on the sorted list and add the corresponding values. If the sum is greater than the desired value, decrement hi; if less, increment lo; if equal, we’re done. The proof of why this is correct is left to the reader. Note that this only speeds up the search from $O(n\lg{n})$ to linear; the sorting time is not changed. The asymptotic time for both approaches is the same, but this is more elegant!
  • Generating functions (see below)
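The lo/hi approach from the list above can be sketched as follows; two_sum_sorted is an illustrative name and the sample values are made up.

```python
def two_sum_sorted(values, target):
    """Return a pair from `values` that adds up to `target`, or None."""
    nums = sorted(values)          # O(n lg n), dominates the running time
    lo, hi = 0, len(nums) - 1
    while lo < hi:                 # single linear pass over the sorted list
        total = nums[lo] + nums[hi]
        if total == target:
            return nums[lo], nums[hi]
        if total < target:
            lo += 1                # need a bigger sum
        else:
            hi -= 1                # need a smaller sum
    return None


print(two_sum_sorted([1721, 979, 366, 299, 675, 1456], 2020))  # (299, 1721)
```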

Implementation

  • The straightforward approach can be expressed as a list comprehension
  • Use recursion to make it generic: if we want  $n$ numbers that add up to the desired value, express this in terms of $n-1$. And of course, don’t forget to terminate the recursion
  • Read the list of numbers into a set. Offload the search for the second number to the set. This can be done using a list as well, but the search time will be linear instead of constant on average. Another approach used two dicts, for each input number and its complement. I think this was overkill, though I liked the approach
  • itertools.combinations I need to get a better handle on the itertools library. Turns out we can simply create all 2-combinations and a loop over these is all that is required. I haven’t looked up the time complexity, but this is likely  $O(n^2)$, so there is that…
  • itertools.product also can be used for generating the cartesian products of the input
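Two of the variations above, sketched with made-up sample values; find_by_set and find_by_combinations are illustrative names.

```python
from itertools import combinations
from math import prod


def find_by_set(values, target):
    """One pass: for each number, ask the set whether its complement was seen."""
    seen = set()
    for v in values:
        if target - v in seen:
            return target - v, v
        seen.add(v)
    return None


def find_by_combinations(values, target, k=2):
    """Brute force over all k-combinations; O(n^k) but hard to get wrong."""
    for combo in combinations(values, k):
        if sum(combo) == target:
            return combo
    return None


nums = [1721, 979, 366, 299, 675, 1456]
print(prod(find_by_set(nums, 2020)))                  # 514579
print(prod(find_by_combinations(nums, 2020, k=3)))    # 241861950
```

The combinations version generalises to part 2 simply by changing k, at the cost of the extra factor of n.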

Yowza!!

Express the input list as a polynomial: each value in the list is used both as the coefficient and as the power of one term. Then, in the square of this polynomial, the coefficient of the term whose power is the target gives the product that we’re looking for (counted once for each ordering of the pair), since powers are added and coefficients are multiplied. Knuth would approve!! And of course, we can do the polynomial multiplication using FFTs.
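Written out, with $a_i$ for the list values and $T$ for the target (symbols introduced here purely for illustration), the idea is:

$$P(x) = \sum_i a_i x^{a_i}, \qquad P(x)^2 = \sum_{i,j} a_i a_j \, x^{a_i + a_j}$$

so the coefficient of $x^T$ is $\sum_{a_i + a_j = T} a_i a_j$. If exactly one pair $a \ne b$ adds up to $T$, and $T/2$ is not itself in the list, that coefficient is $2ab$, i.e. twice the product being asked for.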


Advent of Code

I recommend Advent of Code to everyone I meet. It is a wonderful way to develop your programming skills, since you’re completely on your own (other sites babysit you quite a bit, methinks). However, while doing the implementation is a great exercise, it is much more useful (especially if learning a new language) to look at what others have done (after solving it ourselves!).

I did about half of AoC in 2019, and figured I’d work on the 2020 edition when I had the time. Well, here we go!


Fractals

It is surprisingly easy to generate a fractal image. A schematic of the Mandelbrot Set is as follows. Start with the equation $z_{n+1} = z_n^2 + c$, where $c$ is a complex number. Your “image” is a 2d array whose rows and columns correspond to the real and imaginary parts of $c$. Set $z_0 = 0$ and repeatedly apply the equation. If the value stays bounded (in practice, $|z_n| \le 2$) for a set number of iterations, set the corresponding entry in your array to 1, else 0.
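A minimal sequential sketch of this schematic. It assumes an escape radius of 2, a viewing window of $[-2, 1] \times [-1.5, 1.5]$, and a mapping of columns to the real part and rows to the imaginary part; in_set and render are illustrative names.

```python
def in_set(c, max_iter=50):
    """Iterate z -> z*z + c from z = 0; report whether z stayed bounded."""
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:       # once |z| exceeds 2 the orbit escapes
            return False
    return True


def render(width, height, max_iter=50):
    """Return a height x width array of 1s (bounded) and 0s (escaped)."""
    image = []
    for row in range(height):
        im = -1.5 + 3.0 * row / (height - 1)
        image.append([
            int(in_set(complex(-2.0 + 3.0 * col / (width - 1), im), max_iter))
            for col in range(width)
        ])
    return image


# Crude text rendering of a small image.
for line in render(60, 24):
    print("".join("#" if v else " " for v in line))
```

Since in_set depends only on its own value of c, each pixel can be computed independently of every other.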

Input

Array/figure size in pixels

Output

Bitmapped image, running time

Variants

Color the pixel by the number of iterations it took to escape. Try variants of the equation (see Julia sets). And note that the computation of each pixel is independent of every other: we can parallelize this! We visit this in another post. There are many ways to optimize the sequential implementation; try these out!

Report

Show us what you learnt!


Searching

Input

Array size, data type, search algorithm, key. The key can be specified as an index or as a non-existent value

Output

index, if key is in the array, -1 otherwise. Running time

Program

  1. Generate the array of the given data type and size. Populate with random numbers in sorted order (duh!)
  2. Implement the usual suspects:
    1. Linear search
    2. Binary search
    3. Exponential search
  3. Implement different optimizations for each of these
  4. Test that the search is correct
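As a starting point for the steps above, here is an unoptimized sketch of the three searches. It assumes a sorted list of distinct integers; the function names are illustrative, and each function returns the index of the key or -1.

```python
import random


def linear_search(arr, key):
    for i, value in enumerate(arr):
        if value == key:
            return i
    return -1


def binary_search(arr, key, lo=0, hi=None):
    """Search arr[lo..hi] (inclusive bounds) by repeated halving."""
    if hi is None:
        hi = len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == key:
            return mid
        if arr[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


def exponential_search(arr, key):
    """Double an upper bound until it passes the key, then binary search."""
    if not arr:
        return -1
    bound = 1
    while bound < len(arr) and arr[bound] < key:
        bound *= 2
    return binary_search(arr, key, bound // 2, min(bound, len(arr) - 1))


# Sorted array of distinct random values, then a correctness check.
data = sorted(random.sample(range(10_000), 500))
for search in (linear_search, binary_search, exponential_search):
    assert all(search(data, v) == i for i, v in enumerate(data))
    assert search(data, -1) == -1
```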

Run 

your implementation. There can be many variations in here, based on where the key is. Compare the performance of different searches for keys at the beginning, middle and end of the array. What do you think is the best-, average- and worst-case situation for each search? Do the performance results match your intuition? There are different ways of analyzing the performance, such as time v. array size for different key locations, or time v. key location for different array sizes, or …. What do you choose?

Report

your findings! Can you correlate these to the theoretical analysis and the architecture of your platform?


Sorting

This is so fundamental that you’re probably wondering why I even bother to put it here. Read on…

Input

Array size, data type, algorithm, sort order

Output

Running time of different parts of the code

Program

  1. Create an array of the given data type and size, and populate it with random values
  2. Implement multiple sorting routines:
    1. InsertionSort
    2. MergeSort
    3. QuickSort
    4. ShellSort
    5. CountingSort
  3. Implement different optimizations for these
  4. Test that the final array is sorted
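A bare-bones sketch of the program skeleton, covering two of the sorts, the sortedness test, and a timer. The helper names (timed, is_sorted) and the choice of time.perf_counter are assumptions for illustration; the optimizations are left as the exercise.

```python
import random
import time


def insertion_sort(values):
    a = list(values)
    for i in range(1, len(a)):
        x, j = a[i], i - 1
        while j >= 0 and a[j] > x:   # shift larger elements one slot right
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x
    return a


def merge_sort(values):
    if len(values) <= 1:
        return list(values)
    mid = len(values) // 2
    left, right = merge_sort(values[:mid]), merge_sort(values[mid:])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    return out + left[i:] + right[j:]


def is_sorted(a):
    return all(a[i] <= a[i + 1] for i in range(len(a) - 1))


def timed(fn, *args):
    """Return (result, elapsed seconds) for one call of fn."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


data, t_gen = timed(lambda n: [random.random() for _ in range(n)], 2000)
for sort in (insertion_sort, merge_sort):
    result, t_sort = timed(sort, data)
    ok, t_test = timed(is_sorted, result)
    print(f"{sort.__name__}: generate {t_gen:.4f}s sort {t_sort:.4f}s "
          f"test {t_test:.4f}s sorted={ok}")
```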

Run

your implementation with different array sizes. Your program should report the time to generate the array, sort it and test that it has been sorted correctly.

Write a wrapper script that captures the above running times and generates a plot that shows the comparative performance of different sorting routines: the x-axis is the array size and the y-axis is the time. You should have multiple versions of the above sorts with different implementations.

What do you expect the graph to look like? Can you explain the shape of what you actually get? 

Should you combine different sorting approaches? What would you expect, and how would you do this?

What factors impact the performance that you observe?

Report

Compile your experiments and describe your findings. What conclusions do you draw from this? What do you hypothesize, and how did you test your hypotheses?

 


Command-Line and Options

Your programs should be parameterized and should take inputs at the command line (not prompt the user for inputs!)

The command line can be used to control the behavior of the program:

  • Specify the sort order
  • Specify the algorithm
  • Provide debug/progress information
  • Specify output formats and destination

To specify inputs:

  • From a file v. generate internally
  • Provide the search key if searching

This helps when you want to benchmark or generate performance data – easy to put into a script and push the button.

Figure out once how to process command-line options, and you can just use this for all your programs. Most languages have existing libraries or packages that do pretty sophisticated parsing and fancy stuff like how you specify flags and options. Experiment with these!
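In Python, for instance, the standard argparse module covers most of this. The sketch below is one possible layout; every flag name in it (--algorithm, --order, --size, --input, --key, --verbose) is a hypothetical choice, not a prescribed interface.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Sort/search benchmark driver")
    # Behavior of the program
    parser.add_argument("--algorithm", default="merge",
                        choices=["insertion", "merge", "quick", "shell", "counting"])
    parser.add_argument("--order", choices=["asc", "desc"], default="asc")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print debug/progress information")
    # Inputs
    parser.add_argument("--size", type=int, default=1000,
                        help="array size when generating data internally")
    parser.add_argument("--input", default=None,
                        help="read data from this file instead of generating it")
    parser.add_argument("--key", type=int, default=None,
                        help="search key, if searching")
    return parser


# Passing an explicit list makes the parser easy to try out and to test;
# a real program would call parse_args() with no arguments to read sys.argv.
args = build_parser().parse_args(["--algorithm", "quick", "--size", "5000", "-v"])
print(args.algorithm, args.order, args.size, args.verbose)  # quick asc 5000 True
```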


Focus on Performance

Colleges don’t talk about program performance; I find this strange since they focus so much on student performance!

Develop the habit of instrumenting your code by default — figure out the correct timer to use for your platform, and measure, measure, measure. Understand the interplay between the code you write, what the compiler does to it, and how it runs on the CPU. 

Investigate the tools available on your platform, and experiment with these. You will learn more from this than you can imagine!


Debug & Test

Learn how to debug your code. This is not just about the tools (research and get familiar with the appropriate tools for your language and platform) but also about writing code that is easy to debug and your mental model for how you approach debugging. 

Think about how you can test the code that you’ve written — both at the micro-level and for the entire application. Get into the habit of writing your own test cases, or rather, generating your test cases. Can you be comprehensive? (why not?) What are the corner cases? Each time you make a change / add a feature / fix a bug, ensure that your code passes all the tests you have defined. 

Learn how to talk about your implementation: the choices you made for the data structures and algorithms and why you chose this instead of that. And remember:

Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.

Brian Kernighan


Version Control

git is the flavor of the day, so learn how to git. Force yourself to use the command line; this will help you understand the philosophy of version control in general, and git in particular.

Create a github account and start using it. Use it to showcase your improving maturity as a programmer and developer, and use it to showcase your projects. 

Pro Git, by Chacon and Straub, is a fantastic tutorial and reference, and is available online.


Editors

Learn how to use a good editor.

I’m not going to get into a religious war here, so will not make any recommendations. However, once you’re touch-typing, you should be making changes to your code in terms of your thoughts, not in terms of your editor’s commands. Choose an editor that makes sense, and choose one that is built for editing code. You will see your efficiency sky-rocket. 


Touch Typing

The one, simple thing you can do to increase your productivity more than 10-fold is to learn how to touch type. 

This isn’t just about improving your performance at the keyboard. It has to do with removing distractions and obstacles when coding. It’s about becoming one with the machine, translating the thoughts in your head into code, without having to hunt around for the right key and pressing it one at a time. Your brain works much faster than you can express your thoughts, and anything you can do to bridge that gap improves your performance. 

And this has a multiplier effect as you learn how to use a good editor.

 


The Developers Bookshelf

Algorithms & Data Structures

I’ve found that multiple perspectives deepen my understanding of topics. The following books cover roughly the same ground, but have very different approaches, and I’ve gotten different insights from each of them.

  • Introduction to Algorithms, Cormen, Leiserson, Rivest and Stein. The basics, written clearly and yet with rigor, this book should be the one to start with on most topics. Read it for self-study or as a reference.
  • Algorithm Design, Kleinberg and Tardos. A much more detailed look than CLRS, and the systematic exposition is a joy to read. There are a crazy number of exercises as well!
  • Algorithms, Dasgupta, Papadimitriou and Vazirani. Much less detailed than the first two, but a very different approach to many topics. 
  • Algorithms, 4th Ed., Sedgewick and Wayne. Many fewer topics covered, but in a lot more depth. Especially see how to analyze and test implementations. Great exercises and see the associated booksite as well.

And of course, no list of Algorithms books can be complete without

  • The Art of Computer Programming, vols. 1-n, Knuth. This is advanced, and not for the faint of heart. Also, the best categorization of sorting algorithms that I’ve seen. And demonstrates the limits of analysis (by which I mean it demonstrates how much analysis you can do!). 

Programming Praxis

More than syntax, programming is about the mindset. Here are a few books that help in this area:

  • Programming Pearls, Bentley. Yes, it’s old. Yes, it’s worth every bit.
  • Beautiful Code, Oram & Wilson. Bringing art into engineering
  • The Pragmatic Programmer: From Journeyman to Master, Hunt & Thomas

Others

  • Hackers and Painters: Big Ideas from the Computer Age, Paul Graham
  • Turing’s Cathedral: The Origins of the Digital Universe, George Dyson. A bit of history. 
  • Anything and everything by Simon Singh
  • “Surely You’re Joking, Mr. Feynman”, Richard Feynman

Language-Specific Books

C++

  • The C++ Programming Language, Stroustrup
  • Accelerated C++, Koenig & Moo
  • Modern C++ Design, Alexandrescu

Python

There are too many (very) good options, but I’m going to stick with just one:

  • Python Data Science Handbook, Jake VanderPlas

Design

  •  Design Patterns: Elements of Reusable Object-Oriented Software, Gamma, Helm, Johnson, Vlissides
  • Head First Design Patterns, Freeman & Robson

Fiction

  • The Hitchhiker’s Guide to the Galaxy, Douglas Adams. Everyone should know the answer to the ultimate question
  • The Discworld novels, Terry Pratchett

Programming for the Programmer

Motivation

Students aspiring to program come in all varieties. Some are good, some are not so good and this categorization is strictly within the same pool.

The frustrating part (for me) is that the good programmers have a long way to go to become competent developers. And the not-so-good programmers are not so good through no fault of their own. I find that the assignments that they do, either in college or elsewhere, are unambitious and are too focused on specific concepts.

Many students seem to think sites such as leetcode are all that is required. I beg to differ – these are fine for preparing for interviews, but not for  becoming good developers.

Here’s a series of assignments that will take you beyond your current capabilities, teach you more about algorithms and data structures than you currently know, help you learn how to use the right tools, techniques, and skills, and get you started on building your portfolio of work.

The first group doesn’t even ask you to write any code:

The next group is focused on the basics:

  • Sort
  • Search
  • Fractals
  • Command line / options

And a few complicated ones:

  • Shell
  • Bellman-Ford
  • Matrix multiplication
  • Random walks

And the skills needed for tomorrow today:

  • Shared-memory multiprocessing
  • Distributed-memory multiprocessing
  • Map-reduce

Quantization

What you will learn:

  • Software: practice with loops and operations on arrays
  • Domain: approximations and error measurements, sampling

Input: Construct a (continuous) signal of your choice, say a simple sine wave with a particular frequency and amplitude. Plot this. Note that since you are representing the continuous signal on a computer, you are already sampling it.

Discretization in space:

  • For the amplitude, choose an $n$ and divide the range into $2^n$ steps. For each index of your signal, (i) round up (ii) round down and (iii) round to the nearest step value.
  • Calculate the mean squared error due to this process.
  • Repeat for different values of $n$
  • Plot the error for different values of $n$. Does this match what you think it should look like?
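One possible way to set up the amplitude steps above, without the plotting. It assumes a sine wave with amplitude 1, so the range is $[-1, 1]$; the names sine_signal, quantize, mse and ROUNDERS are illustrative.

```python
import math


def sine_signal(num_samples=1000, cycles=5, amplitude=1.0):
    """A sine wave sampled at num_samples points (already a discretization)."""
    return [amplitude * math.sin(2 * math.pi * cycles * t / num_samples)
            for t in range(num_samples)]


# The three rounding rules, selected by name.
ROUNDERS = {"up": math.ceil, "down": math.floor, "nearest": round}


def quantize(signal, n, mode="nearest", lo=-1.0, hi=1.0):
    """Snap each sample to one of the step values dividing [lo, hi] into 2**n steps."""
    step = (hi - lo) / 2 ** n
    pick = ROUNDERS[mode]
    return [lo + pick((x - lo) / step) * step for x in signal]


def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


signal = sine_signal()
for n in (2, 4, 8):
    errors = {mode: mse(signal, quantize(signal, n, mode)) for mode in ROUNDERS}
    print(n, errors)
```

Rounding to the nearest step keeps every sample within half a step of the original, so its mean squared error is at most a quarter of the squared step size; that gives a quick sanity check for the plot.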

Discretization in time:

  • What happens if you take fewer / more samples in time for representing your signal? How would you now determine the quality of your representation?
  • What is the lowest you can go? How does this relate to the frequency of the signal? (*cough* nyquist *cough*)

Discretization in both dimensions

  • Try it out!