CPSC 115L: Introduction to Computing Fall 2010

Homework 6

CPSC 115-01: due Monday, November 1
CPSC 115-02: due Tuesday, November 2

1. Substring matching

A DNA (or genetic) sequence is a string defined over the alphabet {A, C, G, T} (where A, C, G and T stand for the four nucleotide bases of a DNA, namely, adenosine, cytidine, guanosine and thymidine, respectively). In molecular biology, it is understood that such a sequence encodes instructions for building protein molecules that form your cells and, by extension, your body (for more on DNA sequencing and its importance, see, e.g., NHGRI's Educational Resources). The following two exercises concern substring matching: the problem of determining whether or not a short string is contained within a long string. This is a very important problem in DNA sequencing (in particular, in reconstructing unknown DNA strings from known pieces and in searching patterns in known DNA strings).

1.1. Python provides the string method find to determine the lowest start index of a given substring within a range of another string. For this exercise, without using the string method find, implement a function named my_find that returns the lowest start index of a given substring within a range of another string. Your function must start with the following header:
def my_find(seq, subseq, i, j):
where seq is a string, subseq is a potential substring, and i and j specify the indices of a range in seq. Your function is to return the lowest start index of subseq in seq[i:j] if subseq is a substring of seq[i:j] and -1 otherwise. Your function must behave in the following way:
>>> print my_find('ACAATGGGAGAGTCCATTCT', 'AT', 0, 20)
3
>>> print my_find('ACAATGGGAGAGTCCATTCT', 'AT', 10, 20)
15
>>> print my_find('ACAATGGGAGAGTCCATTCT', 'AT', 5, 15)
-1
Implement your function in a Python script named my_find.py. Run your script with five test cases (including the above three cases) and save the snapshots of your test runs in a text file named my_find.out.

1.2. Biologists often wish to find all occurrences of a substring, not simply the first one. By modifying your function my_find, implement a function named multi_find that returns a list of all start indices of a given substring that occurs within a range of another string. Your function must start with the following header:
def multi_find(seq, subseq, i, j):
where seq is a string, subseq is a potential substring, and i and j specify the indices of a range in seq. Your function is to return a list of all start indices of subseq in seq[i:j] if subseq is a substring of seq[i:j] and [] otherwise. Your function must behave in the following way:
>>> print multi_find('ACAATGGGAGAGTCCATTCTACAATGGGAGAGTCCATTCT', 'AT', 0, 40)
[3, 15, 23, 35]
>>> print multi_find('ACAATGGGAGAGTCCATTCTACAATGGGAGAGTCCATTCT', 'AT', 10, 20)
[15]
>>> print multi_find('ACAATGGGAGAGTCCATTCTACAATGGGAGAGTCCATTCT', 'AT', 5, 15)
[]
Implement your function in a Python script named multi_find.py. Run your script with five test cases (including the above three cases) and save the snapshots of your test runs in a text file named multi_find.out.

2. Caesar ciphers

The Caesar cipher is one of the oldest and most widely-known cryptographic techniques. It is named after Julius Caesar, who used it to communicate with his generals. The Caesar cipher encrypts a plaintext message by simply shifting every letter a fixed number of places forward in the alphabet. The most well-known form of the Caesar cipher shifts every letter three places; that is, ‘a’ is replaced by ‘D’, ‘b’ is replaced by ‘E’, and so on. Here is the complete mapping:
Plain alphabet: abcdefghijklmnopqrstuvwxyz
Cipher alphabet: DEFGHIJKLMNOPQRSTUVWXYZABC
For example, his famous message
veni vidi vici
(meaning, I came, I saw, I conquered) is encrypted as
YHQL YLGL YLFL
In cryptography, it is customary to use lowercase letters in plaintext and uppercase letters in ciphertext. Complete the following two exercises.

2.1. Design and implement a Python script to encrypt a plaintext message using the Caesar cipher described above. You may assume that no punctuation marks are used. Your script should behave as follows:
Enter a plaintext message: the quick brown fox jumps over the lazy dog
Your message is encrypted as: WKH TXLFN EURZQ IRA MXPSV RYHU WKH ODCB GRJ
Recall that each character is encoded by a unique ASCII code: The lowercase letters ‘a’, ‘b’, ..., ‘z’ are encoded by 97, 98, ..., 122, respectively. The uppercase letters ‘A’, ‘B’, ..., ‘Z’ are encoded by 65, 66, ..., 90, respectively. In Python, the function ord converts a letter to its ASCII code, and the function chr converts an ASCII code to its letter. For example,
>>> code = ord('a')
>>> print chr(code + 3 - 32)
D
As usual, first write down your I/O specification and algorithm. Then implement your algorithm in a Python script named encrypt.py. Run your script with five test cases and save the snapshots of your test runs in a text file named encrypt.out.

2.2. Now, design and implement a Python script to decrypt a ciphertext message encrypted by the Caesar cipher described above. You may assume that no punctuation marks are used. Your script should behave as follows:
Enter a ciphertext message: YHQL YLGL YLFL
Your message is decrypted as: veni vidi vici
As before, first write down your I/O specification and algorithm. Then implement your algorithm in a Python script named decrypt.py. Run your script with five test cases and save the snapshots of your test runs in a text file named decrypt.out.

What to hand in

Submit the following in paper. Be sure to put a file header at the top of each file.

Plagiarism and academic dishonesty

Please remember our course policy on plagiarism and academic dishonesty: You are encouraged to consult with one another when you work on homework assignments, but in the end everyone must do one's own work to hand in. In particular, discussion of homework assignments should be limited to brainstorming and verbally going through strategies, but it must not involve one student sharing written solutions with another student. In the end everyone must write up solutions independently. If you have discussed with classmates or used any outside source, you must clearly indicate so on your solutions and provide all references. Turning in another person's work under your name is plagiarism and qualifies as academic dishonesty. Academic dishonesty is a serious intellectual violation, and the consequences can be severe. For more details, read the Student Handbook 2010–2011, pp. 21–29.


* CPSC 115L home page
Valid HTML 4.01!