I'm really struggling with this exercise about linked lists and node removal.
Basically, DNA sequences are said to be complementary when each base in one sequence has the corresponding complement in the other sequence at the same position. Base A bonds with T, and base C bonds with G. For example, the sequences ACGT and TGCA are complementary.
I have to implement a function which takes two linked lists: the first is the original DNA and the second is the edit sequence. The function must apply the edit method described earlier and return the new list.
For instance, we've got "A C G T A G A C G T T C T A" as the original DNA sequence and "G C A" as the bonding sequence. G matches with C, C matches with G and A matches with T (and vice-versa). Therefore, "C G T" is "G C A"'s reflex. Then we have to remove "C", "G" and T", following this exact order, from the original DNA sequence, which turns out to be the expected result stated below.
Also, two things: 1. I must ignore any subsequent matches generated as a result of any node removals. 2. I cannot use any headers besides stdlib.h.
Input:
A C G T A G A C G T T C T A
G C A
Expected Result:
A A G A T C T A
Actual Result:
A A G A C G T T C T A
Code Tentative:
typedef struct LinkedNode LinkedNode;
struct LinkedNode {
char data;
LinkedNode *next;
};
int bases_dna(LinkedNode *atual, LinkedNode *seq_edicao){
int i = 0, k = 0, seq_count = 0;
LinkedNode *seq_new = seq_edicao;
while(seq_new!=NULL){
seq_new = seq_new->next;
seq_count ;
}
seq_new = seq_edicao;
for(; i<seq_count; i ){
if ((atual->data == 'C' && seq_new->data == 'G') ||
(atual->data == 'A' && seq_new->data == 'T') ||
(atual->data == 'G' && seq_new->data == 'C') ||
(atual->data == 'T' && seq_new->data == 'A')){
k ;
atual = atual->next;
seq_new = seq_new->next;
}
}
if(k==seq_count) return k;
else return 0;
}
LinkedNode *editar_dna(LinkedNode *dna_original, LinkedNode *seq_edicao){
if (dna_original == NULL) return NULL;
int result = bases_dna(dna_original, seq_edicao);
if (result != 0){
LinkedNode *temp = dna_original;
for(int i = 0; i < result; i ){
temp = temp->next;
}
return temp;
}
dna_original->next = editar_dna(dna_original->next, seq_edicao);
return dna_original;
}
Code Explanation:
The function named editar_dna does all the recursive job and calls for an auxiliary function named bases_dna, which is responsible to verify for the given edit sequence within the original DNA hence returning the number of elements this sequence has in order to let the first function know how many elements will have to be "skipped".
The issue that I'm facing is that my recursion execution stops as soon as it reaches the first return within editar_dna function. I was thinking of implementing its conditional structure right after the recursion, so it wouldn't stop too soon and would probably reach until the end of the list. However, that would be a good way to solve this exercise IF somehow I could easily verify chars from the end of a linked list until its beginning, but the opposite is much simpler (and that is exactly what my auxiliary function assumes that will happen).
Any ideas would be hugely appreciated.
Thanks!
CodePudding user response:
I should have written a code based on your current code for better communication but I thought it will be simpler not to use the recursion. Would you please try:
#include <stdio.h>
#include <stdlib.h>
typedef struct LinkedNode {
char data;
struct LinkedNode *next;
} LinkedNode;
/*
* append the node with data at the end of the list
*/
void append(LinkedNode *head, char data)
{
LinkedNode *newNode = malloc(sizeof(LinkedNode));
newNode->data = data;
newNode->next = NULL;
LinkedNode *temp = head;
while (temp->next != NULL) {
temp = temp->next;
}
temp->next = newNode;
}
/*
* dump the data of the list
*/
void dump(LinkedNode *head)
{
for (head = head->next; head != NULL; head = head->next) {
printf("%c ", head->data);
}
printf("\n");
}
/*
* generate another list of complimentary sequence
*/
void gencomp(LinkedNode *dest, LinkedNode *src)
{
char s_data, d_data;
for (src = src->next; src != NULL; src = src->next) {
s_data = src->data;
switch(s_data) {
case 'A':
d_data = 'T';
break;
case 'T':
d_data = 'A';
break;
case 'C':
d_data = 'G';
break;
case 'G':
d_data = 'C';
break;
default:
fprintf(stderr, "unknown base: %c\n", s_data);
exit(1);
}
append(dest, d_data);
}
}
/*
* return the pointer to the last element if the subsequences match
*/
LinkedNode *match(LinkedNode *head, LinkedNode *comp)
{
for (; head != NULL; head = head->next, comp = comp->next) {
if (head->data != comp->data) {
return NULL;
} else if (comp->next == NULL) {
return head;
}
}
return NULL;
}
/*
* search "head" for "comp". If matched, the matched portion is skipped
* (not freed)
*/
void edit(LinkedNode *head, LinkedNode *comp)
{
LinkedNode *matched;
for (; head != NULL; head = head->next) {
if ((matched = match(head->next, comp->next))) {
head->next = matched->next; // skip matched sequence
}
}
}
int main()
{
char dna_orig[] = "ACGTAGACGTTCTA";
char dna_bond[] = "GCA";
int i;
// generate list of original sequence
LinkedNode *head = malloc(sizeof(LinkedNode));
head->next = NULL;
for (i = 0; dna_orig[i] != 0; i ) {
append(head, dna_orig[i]);
}
// generate list of bonding sequence
LinkedNode *bond = malloc(sizeof(LinkedNode));
bond->next = NULL;
for (i = 0; dna_bond[i] != 0; i ) {
append(bond, dna_bond[i]);
}
// generate list of complementary sequence of bond
LinkedNode *comp = malloc(sizeof(LinkedNode));
comp->next = NULL;
gencomp(comp, bond);
// edit the original sequence and see the result
edit(head, comp);
dump(head);
return 0;
}
Output:
A A G A T C T A
- I have created a list of complementary sequence of the bonding sequence to make the searching easy.
- The code is inefficient so far because it uses repeated linear search in the list.
- I had to include <stdio.h> for the printing purpose.
CodePudding user response:
From the title and discussion I'm assuming that applying the edit should mutate the input list by removing matched subsequences. If the first node of the list is removed, you'll need to return a pointer to some later node. Otherwise the first node is returned.
The naive way to handle the two cases is with specific checks. But, there's a well-known pattern for handling both cases as one. It's pre-pending a "dummy" head node temporarily, which is never removed. Rather than try to explain further, I'll append code. See apply_edit
.
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
typedef struct node_s {
struct node_s *next;
char base;
} Node;
void print(Node *seq) {
for (Node *p = seq; p; p = p->next) printf("%c", p->base);
printf("\n");
}
bool base_match(char x, char y) {
switch (x) {
case 'A': return y == 'T';
case 'C': return y == 'G';
case 'G': return y == 'C';
case 'T': return y == 'A';
}
exit(42); // input error
}
// If the prefix of seq is as given, return the following seq node, else seq.
Node *remove_prefix(Node *prefix, Node *seq) {
Node *p, *s;
for (p = prefix, s = seq; p && s; p = p->next, s = s->next)
if (!base_match(p->base, s->base))
return seq;
return p ? seq : s;
}
Node *apply_edit(Node *edit, Node *seq) {
Node head[1] = {{ seq }};
for (Node *p = head; p->next; p = p->next)
p->next = remove_prefix(edit, p->next);
return head->next;
}
Node *build_seq(char *gene) {
Node *seq = NULL;
for (int i = strlen(gene) - 1; i >= 0; --i) {
Node *node = malloc(sizeof *node);
node->base = gene[i];
node->next = seq;
seq = node;
}
return seq;
}
int main(void) {
Node *seq = build_seq("ACGTAGACGTTCTA");
Node *edit = build_seq("GCA");
print(apply_edit(edit, seq));
return 0;
}