|
CPSC 115L: Introduction to Computing
|
Fall 2010
|
Homework 6
CPSC 115-01: due Monday, November 1
CPSC 115-02: due Tuesday, November 2
1. Substring matching
A DNA (or genetic) sequence is a string defined over the
alphabet {A, C, G, T} (where A, C,
G and T stand for the four nucleotide bases of a DNA, namely,
adenosine, cytidine, guanosine and thymidine,
respectively). In molecular biology, it is understood that such a sequence
encodes instructions for building protein molecules that form your cells and,
by extension, your body (for more on DNA sequencing and its importance, see,
e.g., NHGRI's Educational
Resources). The following two exercises concern
substring matching: the problem of determining whether or not a short
string is contained within a long string. This is a very important problem in
DNA sequencing (in particular, in reconstructing unknown DNA strings from
known pieces and in searching patterns in known DNA strings).
1.1.
Python provides the string method find to determine the lowest start
index of a given substring within a range of another string. For this
exercise, without using the string method find, implement a
function named my_find that returns the lowest start index of a given
substring within a range of another string. Your function must start with
the following header:
def my_find(seq, subseq, i, j):
where seq is a string, subseq is a potential substring, and
i and j specify the indices of a range in seq.
Your function is to return the lowest start index of subseq in
seq[i:j] if subseq is a substring of seq[i:j] and
-1 otherwise. Your function must behave in the following way:
>>> print my_find('ACAATGGGAGAGTCCATTCT', 'AT', 0, 20)
3
>>> print my_find('ACAATGGGAGAGTCCATTCT', 'AT', 10, 20)
15
>>> print my_find('ACAATGGGAGAGTCCATTCT', 'AT', 5, 15)
-1
Implement your function in a Python script named my_find.py. Run
your script with five test cases (including the above three cases) and save
the snapshots of your test runs in a text file named my_find.out.
1.2.
Biologists often wish to find all occurrences of a substring, not
simply the first one. By modifying your function my_find, implement
a function named multi_find that returns a list of all start
indices of a given substring that occurs within a range of another string.
Your function must start with the following header:
def multi_find(seq, subseq, i, j):
where seq is a string, subseq is a potential substring, and
i and j specify the indices of a range in seq.
Your function is to return a list of all start indices of subseq in
seq[i:j] if subseq is a substring of seq[i:j] and
[] otherwise. Your function must behave in the following way:
>>> print multi_find('ACAATGGGAGAGTCCATTCTACAATGGGAGAGTCCATTCT', 'AT', 0, 40)
[3, 15, 23, 35]
>>> print multi_find('ACAATGGGAGAGTCCATTCTACAATGGGAGAGTCCATTCT', 'AT', 10, 20)
[15]
>>> print multi_find('ACAATGGGAGAGTCCATTCTACAATGGGAGAGTCCATTCT', 'AT', 5, 15)
[]
Implement your function in a Python script named multi_find.py. Run
your script with five test cases (including the above three cases) and save
the snapshots of your test runs in a text file named multi_find.out.
2. Caesar ciphers
The Caesar cipher is one of the oldest and most widely-known cryptographic
techniques. It is named after Julius Caesar, who used it to communicate with
his generals. The Caesar cipher encrypts a plaintext message by simply
shifting every letter a fixed number of places forward in the alphabet. The
most well-known form of the Caesar cipher shifts every letter three places;
that is, ‘a’ is replaced by ‘D’, ‘b’ is
replaced by ‘E’, and so on. Here is the complete mapping:
| Plain alphabet: |
abcdefghijklmnopqrstuvwxyz |
| Cipher alphabet: |
DEFGHIJKLMNOPQRSTUVWXYZABC |
For example, his famous message
veni vidi vici
(meaning, I came, I saw, I conquered) is encrypted as
YHQL YLGL YLFL
In cryptography, it is customary to use lowercase letters in plaintext and
uppercase letters in ciphertext. Complete the following two exercises.
2.1.
Design and implement a Python script to encrypt a plaintext message using the
Caesar cipher described above. You may assume that no punctuation marks are
used. Your script should behave as follows:
Enter a plaintext message: the quick brown fox jumps over the lazy dog
Your message is encrypted as: WKH TXLFN EURZQ IRA MXPSV RYHU WKH ODCB GRJ
Recall that each character is encoded by a unique ASCII code: The lowercase
letters ‘a’, ‘b’, ..., ‘z’ are encoded by
97, 98, ..., 122, respectively. The uppercase letters
‘A’, ‘B’, ..., ‘Z’ are encoded by
65, 66, ..., 90, respectively. In Python, the function ord converts
a letter to its ASCII code, and the function chr converts an ASCII
code to its letter. For example,
>>> code = ord('a')
>>> print chr(code + 3 - 32)
D
As usual, first write down your I/O specification and algorithm. Then
implement your algorithm in a Python script named encrypt.py. Run
your script with five test cases and save the snapshots of your test runs
in a text file named encrypt.out.
2.2.
Now, design and implement a Python script to decrypt a ciphertext message
encrypted by the Caesar cipher described above. You may assume that no
punctuation marks are used. Your script should behave as follows:
Enter a ciphertext message: YHQL YLGL YLFL
Your message is decrypted as: veni vidi vici
As before, first write down your I/O specification and algorithm. Then
implement your algorithm in a Python script named decrypt.py. Run
your script with five test cases and save the snapshots of your test runs in
a text file named decrypt.out.
What to hand in
Submit the following in paper. Be sure to put a file header at the top of
each file.
- For Part 1, printouts of the Python scripts my_find.py,
multi_find.py and the output files my_find.out,
multi_find.out.
- For Part 2, your I/O specifications and algorithms for 2.1 and 2.2, and
printouts of the Python scripts encrypt.py, decrypt.py
and the output files encrypt.out, decrypt.out.
Plagiarism and academic dishonesty
Please remember our course policy on plagiarism and academic dishonesty:
You are encouraged to consult with one another when you work on homework
assignments, but in the end everyone must do one's own work to hand in. In
particular, discussion of homework assignments should be limited to
brainstorming and verbally going through strategies, but it must not involve
one student sharing written solutions with another student. In the end
everyone must write up solutions independently. If you have discussed
with classmates or used any outside source, you must clearly indicate so on
your solutions and provide all references. Turning in another person's work
under your name is plagiarism and qualifies as academic dishonesty. Academic
dishonesty is a serious intellectual violation, and the consequences can be
severe. For more details, read the
Student Handbook 2010–2011, pp. 21–29.