WireShark Filter Example
17 Nov 2011 No Comments
in Technology
    ip.addr == 172.29.96.30 and http and http.request.method == "GET"
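The same display filter can also be applied from the command line with tshark, the terminal companion to Wireshark. This is only a sketch: the capture file name is a made-up placeholder, and the method is written as a quoted string.

```shell
#!/bin/bash
# Hypothetical capture file name; replace it with a real capture.
capture='capture.pcap'

# The display filter from this post, with the method quoted as a string.
filter='ip.addr == 172.29.96.30 and http and http.request.method == "GET"'

# -r reads packets from a file, -Y applies a display filter.
# Guarded so the script does nothing when tshark or the file is missing.
if command -v tshark >/dev/null 2>&1 && [ -f "$capture" ]; then
    tshark -r "$capture" -Y "$filter"
fi
```

Keeping the filter in a single-quoted shell variable avoids having to escape the inner double quotes.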
14 Oct 2011 No Comments
in Technology
To invalidate the hosts cache kept by nscd (in other words, to clear the local DNS cache), run the following, typically as root:
    nscd -i hosts
29 Sep 2011 No Comments
in Technology
    // Requires #include <libxml/HTMLparser.h>
    htmlParserCtxtPtr parser = htmlCreatePushParserCtxt(NULL, NULL, NULL, 0, NULL, 0);
Then, you can set many options on that parser context.
    htmlCtxtUseOptions(parser, HTML_PARSE_NOBLANKS | HTML_PARSE_NOERROR | HTML_PARSE_NOWARNING | HTML_PARSE_NONET);
We are now ready to parse an (X)HTML document.
    // char *data : buffer containing part of the web page
    // int len    : number of bytes in data
    // Last argument is 0 if the web page isn't complete, and 1 for the final call.
    htmlParseChunk(parser, data, len, 0);
Once you've pushed all your data, call that function again with a NULL buffer, a length of 0, and 1 as the last argument: htmlParseChunk(parser, NULL, 0, 1). This ensures that the parser has processed everything.
Finally, how do you get at the data you parsed? That's easier than it seems. You simply have to walk the XML tree the parser created.
    void walkTree(xmlNode *a_node)
    {
        xmlNode *cur_node = NULL;
        xmlAttr *cur_attr = NULL;
        for (cur_node = a_node; cur_node; cur_node = cur_node->next) {
            // do something with that node information, like printing the tag's name and attributes
            printf("Got tag : %s\n", cur_node->name);
            for (cur_attr = cur_node->properties; cur_attr; cur_attr = cur_attr->next) {
                printf("  -> with attribute : %s\n", cur_attr->name);
            }
            // recurse into the children of this node
            walkTree(cur_node->children);
        }
    }

    walkTree(xmlDocGetRootElement(parser->myDoc));
And that's it! Isn't that simple enough? From there, you can do any kind of processing, like finding all referenced images (by looking for "img" tags) and fetching them, or anything else you can think of.
Also, you should know that you can walk the XML tree anytime, even if you haven’t parsed the whole (X)HTML document yet.
If you have to parse (X)HTML in C, you should use libxml2's HTMLParser. It will save you a lot of time.
11 Sep 2011 No Comments
in Technology
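    #!/bin/bash
    # Fetch pages 1 to 20 of the thread and print the unique e-mail addresses found on each page.
    # Inside [...] a backslash is a literal character, so ., _ and - are listed unescaped.
    for ((i=1;i<=20;i++))
    do
        wget -q -O - "http://www.mitbbs.com/article_t1/Immigration/31933935_0_$i.html" |
            grep -o '[[:alnum:]+._-][[:alnum:]+._-]*@[[:alnum:]+._-]*[[:alnum:]+]' |
            sort | uniq
    done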
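To try the pattern without touching the network, it can be run against a literal string. The sketch below uses invented text and placeholder addresses. The bracket expressions are written without backslashes, because a backslash inside [...] is a literal character in POSIX regular expressions.

```shell
#!/bin/bash
# Made-up input; the addresses are placeholders, not real data.
text='Contact alice.smith@example.com or bob_99@mail.example.org; alice.smith@example.com again.'

# grep -o prints each match on its own line; sort -u then drops duplicates.
pattern='[[:alnum:]+._-][[:alnum:]+._-]*@[[:alnum:]+._-]*[[:alnum:]+]'
emails=$(printf '%s\n' "$text" | grep -o "$pattern" | sort -u)

printf '%s\n' "$emails"
```

The repeated address appears only once in the output, which is the same deduplication that sort | uniq performs in the loop.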
05 Apr 2011 No Comments
in Technology
    sort -r +2 -3 infile

    +m    (m is a number) Start at the first character of the m+1th field.
    -n    (n is a number) End at the last character of the nth field (if -n is omitted, assume the end of the line).
    -f    Make all lines uppercase before sorting (so "Bill" and "bill" are treated the same).
    -r    Sort in reverse order (so "Z" starts the list instead of "A").
    -n    (the letter n) Sort a column in numerical order.
    -tx   Use x as the field delimiter (replace x with a comma or other character).
    -u    Suppress all but one line in each set of lines with equal sort fields (so if you sort on a field containing last names, only one "Smith" will appear even if there are several).

The +m -n position syntax is obsolete. Current versions of sort express the same key with -k, so the command above becomes: sort -r -k 3,3 infile
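A minimal sketch of sorting on the third field, using the -k form of the key and invented sample data (the names and numbers are made up for illustration). It contrasts the plain reverse sort with the numeric one.

```shell
#!/bin/bash
# Use the C locale so the byte-wise comparison is predictable.
export LC_ALL=C

# Hypothetical sample data: first name, last name, score.
sample=$(printf '%s\n' 'bill smith 30' 'anna jones 100' 'carl young 9')

# Reverse sort on the third field only, compared as text.
# Equivalent to the obsolete form: sort -r +2 -3
by_text=$(printf '%s\n' "$sample" | sort -r -k 3,3)

# Same key, compared as numbers.
by_number=$(printf '%s\n' "$sample" | sort -n -r -k 3,3)

printf '%s\n' "$by_text"
printf '%s\n' "$by_number"
```

Compared as text, "9" sorts after "30" and "100", so the line ending in 9 comes first in the reversed output. With -n the line ending in 100 comes first.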