| Teammate 1 | Teammate 2 | Teammate 1 | Teammate 2 |
|---|---|---|---|
| Kristen Anderson | Corazon Irizarry | Jake Elder | Greg Vaughan |
| Jin Feng Liu | Ryan Ersland | Catherine Doyle | Jesse Vazquez |
| Nick Dragu | John Wilsterman | Chelsea Bainbridge-Donner | Jeff Young |

| Amino Acid Name | Symbol | Amino Acid Name | Symbol |
|---|---|---|---|
| Alanine | A | Leucine | L |
| Arginine | R | Lysine | K |
| Asparagine | N | Methionine | M |
| Aspartic acid | D | Phenylalanine | F |
| Cysteine | C | Proline | P |
| Glutamic acid | E | Serine | S |
| Glutamine | Q | Threonine | T |
| Glycine | G | Tryptophan | W |
| Histidine | H | Tyrosine | Y |
| Isoleucine | I | Valine | V |

    ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMI
    NEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAEL
    RHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTAK

Scientists are interested in how proteins are similar because similar proteins may have similar functions. One simple way to measure the similarity of two proteins is to count the positions at which the two sequences have the same amino acid. Consider the snippets of two protein sequences shown below:

    Protein 1: AEKEAFSQVNEEF
    Protein 2: QVKNAYMGEGEEPFM

Proteins 1 and 2 have four amino acids in common at the same positions (K, A, E, E). These common amino acids form three subsequences (K, A, EE). The longer of the two snippets has fifteen amino acids, so the two proteins share four out of a maximum of fifteen possible amino acids. Therefore the similarity is 4/15 = 26.6666666666667%.
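
The calculation above can be sketched in a few lines of Python. This is only an illustration of the position-by-position comparison, not a required design. The function name `similarity` is an assumption made for this example, and the percentage is taken relative to the longer of the two sequences, as in the 4/15 example.

```python
def similarity(p1: str, p2: str) -> tuple[int, float]:
    """Return (number of matching positions, percentage match).

    Assumption: the percentage is relative to the longer sequence,
    matching the 4/15 example in the text.
    """
    # zip stops at the shorter sequence, so only shared positions are compared.
    matches = sum(1 for a, b in zip(p1, p2) if a == b)
    longest = max(len(p1), len(p2))
    percent = 100.0 * matches / longest if longest else 0.0
    return matches, percent


matches, percent = similarity("AEKEAFSQVNEEF", "QVKNAYMGEGEEPFM")
print(matches, round(percent, 2))  # 4 26.67
```
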
You must write a program to determine the similarity between protein sequences. The protein sequences are to be input from a file. The input file must contain a series of contiguous strings (i.e., strings with no blanks in them). The first sequence in the file is to be read in as the base protein. All other proteins in the file are to be compared to the base protein. Your program must:

An example input file looks like:

    EIREAFREEFVGTITTEIR
    GDLLFSGNPTIKKEFSQLTIFSLQIAE
    SLREAFREEFVGPNNMI
    EMREAFLEEFQGTITLEIF
    EIREAFREEFVGTITTEIR

The corresponding example output file looks like:

    Base Protein: EIREAFREEFVGTITTEIR
    ******************** Next Protein *****************
    GDLLFSGNPTIKKEFSQLTIFSLQIAE
    Shortest identical sequence:
    Length of shortest identical sequence: 0
    Longest identical sequence:
    Length of longest identical sequence: 0
    Number of matching sequences: 0
    Number of matching amino acids: 0
    Percentage match between the two strings: %0.0
    ******************** Next Protein *****************
    SLREAFREEFVGPNNMI
    Shortest identical sequence: REAFREEFVG
    Length of shortest identical sequence: 10
    Longest identical sequence: REAFREEFVG
    Length of longest identical sequence: 10
    Number of matching sequences: 1
    Number of matching amino acids: 10
    Percentage match between the two strings: %52.63157894736842
    ******************** Next Protein *****************
    EMREAFLEEFQGTITLEIF
    Shortest identical sequence: E
    Length of shortest identical sequence: 1
    Longest identical sequence: REAF
    Length of longest identical sequence: 4
    Number of matching sequences: 5
    Number of matching amino acids: 14
    Percentage match between the two strings: %73.68421052631578
    ******************** Next Protein *****************
    EIREAFREEFVGTITTEIR
    Shortest identical sequence:
    Length of shortest identical sequence: 0
    Longest identical sequence: EIREAFREEFVGTITTEIR
    Length of longest identical sequence: 19
    Number of matching sequences: 1
    Number of matching amino acids: 19
    Percentage match between the two strings: %100.0

Another example input file is provided for you to test your program.
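
One possible way to obtain the statistics in the example output is to collect the runs of consecutive matching positions first and derive everything else from them. The Python sketch below is an illustration under that assumption. The names `matching_runs` and `summarize` are hypothetical, ties for shortest and longest are resolved in favor of the first run found, and the output formatting is left out.

```python
def matching_runs(base: str, other: str) -> list[str]:
    """Collect maximal runs of positions where both sequences agree."""
    runs, current = [], []
    for a, b in zip(base, other):
        if a == b:
            current.append(a)
        elif current:
            # A mismatch ends the run that was in progress.
            runs.append("".join(current))
            current = []
    if current:  # a run that extends to the end of the shorter sequence
        runs.append("".join(current))
    return runs


def summarize(base: str, other: str) -> dict:
    """Derive the reported statistics from the matching runs."""
    runs = matching_runs(base, other)
    matched = sum(len(r) for r in runs)
    longest_len = max(len(base), len(other))
    return {
        "shortest": min(runs, key=len, default=""),  # first shortest run
        "longest": max(runs, key=len, default=""),   # first longest run
        "runs": len(runs),
        "matched": matched,
        "percent": 100.0 * matched / longest_len if longest_len else 0.0,
    }


example_input = """EIREAFREEFVGTITTEIR
GDLLFSGNPTIKKEFSQLTIFSLQIAE
SLREAFREEFVGPNNMI
EMREAFLEEFQGTITLEIF
EIREAFREEFVGTITTEIR"""

# The first string is the base protein; the rest are compared against it.
base, *others = example_input.split()
print(matching_runs(base, others[2]))  # ['E', 'REAF', 'EEF', 'GTIT', 'EI']
```

Note that for the last protein, which is identical to the base protein, the example output reports an empty shortest identical sequence of length 0. This sketch would report the full sequence as both shortest and longest in that case, so that case may need separate handling to match the example output exactly.
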
You must abide by the following: