How To Get The Function Declaration Or Definitions Using Regex
Solution 1:
This is a convenient script I wrote for such tasks, but it won't give you the function return types. It only extracts function names and argument lists.
# Extract routine signatures from a C++ module
import re

def loadtxt(filename):
    "Load text file into a string. File exceptions are allowed to propagate."
    with open(filename) as f:
        return f.read()

# whole match is group 1, name is group 2, arguments are group 3
rproc = r"((?<=[\s:~])(\w+)\s*\(([\w\s,<>\[\].=&':/*]*?)\)\s*(const)?\s*(?={))"
code = loadtxt('your file name here')
cppwords = ['if', 'while', 'do', 'for', 'switch']
procs = [(i.group(2), i.group(3)) for i in re.finditer(rproc, code)
         if i.group(2) not in cppwords]
for i in procs:
    print(i[0] + '(' + i[1] + ')')
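To see what the pattern captures without pointing it at a real file, here is a small sketch that runs the same regex over an inline snippet. The `Stack` class and the `extract_signatures` helper are made up for illustration; only the regex and the keyword list come from the script above.

```python
import re

# Same pattern as in the script above: group 2 = name, group 3 = arguments.
RPROC = r"((?<=[\s:~])(\w+)\s*\(([\w\s,<>\[\].=&':/*]*?)\)\s*(const)?\s*(?={))"
CPPWORDS = ['if', 'while', 'do', 'for', 'switch']

def extract_signatures(code):
    """Return (name, arguments) pairs for every routine body found in code."""
    return [(m.group(2), m.group(3)) for m in re.finditer(RPROC, code)
            if m.group(2) not in CPPWORDS]

# Hypothetical C++ snippet, used only to exercise the pattern.
SAMPLE = """
class Stack {
public:
    Stack(int capacity) { init(capacity); }
    ~Stack() {}
    bool empty() const { return size_ == 0; }
    void push(const std::string &item)
    {
        if (size_ == max_) { grow(); }
        items_.push_back(item);
    }
};
"""

for name, args in extract_signatures(SAMPLE):
    print(name + '(' + args + ')')
```

The destructor shows up under the plain class name because the `~` is consumed by the lookbehind, and the `if (...) {` line is matched by the regex but dropped by the keyword filter. Calls such as `init(capacity);` are skipped because they are not followed by `{`.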
Solution 2:
See if your C compiler has an option to output a file of just the prototypes of what it is compiling. For gcc, it's -aux-info FILENAME
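If you want to drive this from a script, something along these lines might do. It is only a sketch: the helper names are made up, and the extra `-c` and `-o` options are my assumption for a compile-only run that throws the object file away. Only `-aux-info FILENAME` comes from the answer above.

```python
import os
import subprocess

def aux_info_command(source, proto_file, compiler="gcc"):
    """Build the argument list for a compile-only run that writes prototypes."""
    # -aux-info FILE is the option named above; -c and -o are ordinary gcc
    # options used here (an assumption) to skip linking and discard the object.
    return [compiler, "-aux-info", proto_file, "-c", source, "-o", os.devnull]

def write_prototypes(source, proto_file, compiler="gcc"):
    """Run the compiler and return the text it wrote to proto_file."""
    subprocess.run(aux_info_command(source, proto_file, compiler), check=True)
    with open(proto_file) as f:
        return f.read()
```

Calling `write_prototypes("md5.c", "md5.protos")` would need gcc on the PATH; both file names are placeholders.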
Solution 3:
I don't think regex is the best solution in your case. There are many traps, such as comments and text inside string literals, but if your function prototypes share a common style:
type fun_name(args);
then \w+ \w+\(.*\);
should work in most cases:
mn> egrep "\w+ \w+\(.*\);" *.h
md5.h:extern bool md5_hash(const void *buff, size_t len, char *hexsum);
md5file.h:int check_md5files(const char *filewithsums, const char *filemd5sum);
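The same check can be done line by line in Python. This is a rough equivalent of the egrep call, not a parser. The sample text reuses the two prototypes from the output above and adds a made-up comment line mentioning a hypothetical `md5_init` to show one of the traps.

```python
import re

# Same shape as the egrep pattern: "type name(args);"
PROTO_LINE = re.compile(r"\w+ \w+\(.*\);")

def grep_prototypes(text):
    """Return the lines of text that contain a `type name(args);` shape."""
    return [line for line in text.splitlines() if PROTO_LINE.search(line)]

# The comment line and md5_init are hypothetical; they illustrate a false positive.
sample_header = """\
/* call md5_init(ctx); before hashing */
extern bool md5_hash(const void *buff, size_t len, char *hexsum);
#define MD5_LEN 32
int check_md5files(const char *filewithsums, const char *filemd5sum);
"""

for line in grep_prototypes(sample_header):
    print(line)
```

Both prototypes are found, but so is the comment line, because `call md5_init(ctx);` has the same shape as a prototype.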
Solution 4:
I think this one should do the work:
r"^\s*[\w_][\w\d_]*\s*.*\s*[\w_][\w\d_]*\s*\(.*\)\s*$"
which will be expanded into:
string begin:
    ^
any number of whitespaces (including none):
    \s*
return type:
  - start with letter or _:
    [\w_]
  - continue with any letter, digit or _:
    [\w\d_]*
any number of whitespaces:
    \s*
any number of any characters
(to allow pointers, arrays and so on,
could be replaced with more detailed checking):
    .*
any number of whitespaces:
    \s*
function name:
  - start with letter or _:
    [\w_]
  - continue with any letter, digit or _:
    [\w\d_]*
any number of whitespaces:
    \s*
open arguments list:
    \(
arguments (allow none):
    .*
close arguments list:
    \)
any number of whitespaces:
    \s*
string end:
    $
It's not totally correct for matching all possible combinations, but it should work in most cases. If you want it to be more accurate, just let me know.
EDIT: Disclaimer - I'm quite new to both Python and Regex, so please be indulgent ;)
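For what it's worth, here is a quick sketch of how the pattern from this answer could be tried out, applying it to one line at a time so that `^` and `$` mean the start and end of a line. The sample C code and the `find_candidates` helper are invented for the demonstration.

```python
import re

# Pattern copied from this answer, applied to one line at a time.
DECL = re.compile(r"^\s*[\w_][\w\d_]*\s*.*\s*[\w_][\w\d_]*\s*\(.*\)\s*$")

def find_candidates(code):
    """Return the stripped lines of code that the pattern accepts."""
    return [line.strip() for line in code.splitlines() if DECL.match(line)]

# Hypothetical C snippet with the opening brace on its own line.
sample = """\
int main(int argc, char **argv)
{
    while (argc > 1)
    {
        argc = shrink(argc);
    }
    return 0;
}

static char *dup_name(const char *name)
{
    return copy(name);
}
"""

for line in find_candidates(sample):
    print(line)
```

On this sample both definitions are found, but `while (argc > 1)` is accepted too, so control keywords would still need to be filtered out. Lines ending in `;` are rejected, which means calls are skipped but so are prototypes.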
Solution 5:
There are LOTS of pitfalls in trying to "parse" C code (or even just extract some information from it) with regular expressions alone. I would definitely borrow a C grammar for your favourite parser generator (say Bison, or whatever alternative there is for Python; C grammars are available as examples everywhere) and add the actions to the corresponding rules.
Also, do not forget to run the C preprocessor on the file before parsing.