How To Get The Function Declaration Or Definitions Using Regex

I want to extract only the function prototypes, such as int my_func(char, int, float), void my_func1(void) and my_func2(), from C files using regex and Python. Here is my regex format: '.*\(.*|[\r\

Solution 1:

This is a convenient script I wrote for such tasks, but it won't give you the return types. It only extracts the function names and the argument lists.

# Extract routine signatures from a C++ module
import re

def loadtxt(filename):
    "Load a text file into a string. File exceptions are allowed to propagate."
    with open(filename) as f:
        return f.read()

# group 1 is the whole match, group 2 the name, group 3 the arguments
rproc = r"((?<=[\s:~])(\w+)\s*\(([\w\s,<>\[\].=&':/*]*?)\)\s*(const)?\s*(?={))"
code = loadtxt('your file name here')
# control-flow keywords that also look like "name(...) {"
cppwords = ['if', 'while', 'do', 'for', 'switch']
procs = [(i.group(2), i.group(3)) for i in re.finditer(rproc, code)
         if i.group(2) not in cppwords]

for name, args in procs:
    print(name + '(' + args + ')')
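
To see what the pattern picks up without needing a file on disk, here is a small sketch that runs the same regex over an inline string. The `signatures` helper and the sample code are made up for illustration. Note that the lookbehind needs a whitespace, ':' or '~' character before the name, so a function starting at the very first byte of the input would be missed.

```python
import re

# Same pattern and keyword list as in the script above.
rproc = r"((?<=[\s:~])(\w+)\s*\(([\w\s,<>\[\].=&':/*]*?)\)\s*(const)?\s*(?={))"
cppwords = ['if', 'while', 'do', 'for', 'switch']

def signatures(code):
    """Return (name, arguments) pairs for every 'name(args) {' in code."""
    return [(m.group(2), m.group(3))
            for m in re.finditer(rproc, code)
            if m.group(2) not in cppwords]

# Hypothetical input, only here to exercise the regex.
sample = """
static int add(int a, int b)
{
    if (a > b) { return a; }
    return a + b;
}

void Widget::draw() const {
}
"""

for name, args in signatures(sample):
    print(name + '(' + args + ')')
```

The `if (a > b) {` line also matches the regex, which is why the keyword filter is needed.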

Solution 2:

See if your C compiler has an option to output a file of just the prototypes of what it is compiling. For gcc, it's -aux-info FILENAME
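
If you go this route, the resulting file still needs a little filtering, because it also lists everything declared in the included headers. The sketch below assumes the comment layout described in the GCC manual (file, line number, then a two-letter code such as NF or NC at the start of each line); the sample text is hand-written for illustration and real output may differ between GCC versions. The file would be produced with something like `gcc -c demo.c -o /dev/null -aux-info demo.aux`, where `demo.c` and `demo.aux` are placeholder names.

```python
import re

# Assumed line layout: /* file:line:XY */ declaration; /* optional K&R info */
AUX_LINE = re.compile(
    r"^/\* (?P<file>.+?):(?P<line>\d+):(?P<kind>[INO][CF]) \*/ (?P<decl>.*?;)"
)

def prototypes_from_aux_info(aux_text, source_name=None):
    """Collect declarations, optionally keeping only those from source_name."""
    protos = []
    for line in aux_text.splitlines():
        m = AUX_LINE.match(line)
        if m is None:
            continue  # header comment or a line in an unexpected format
        if source_name is not None and m.group("file") != source_name:
            continue  # declaration came from another file, e.g. a header
        protos.append(m.group("decl"))
    return protos

# Hand-written sample in the assumed format, not captured compiler output.
aux_sample = """\
/* compiled from: . */
/* /usr/include/stdio.h:356:NC */ extern int printf (const char *, ...);
/* demo.c:3:NF */ extern int my_func (char, int, float); /* (a, b, c) char a; int b; float c; */
/* demo.c:8:NF */ extern void my_func1 (void); /* () */
"""

for decl in prototypes_from_aux_info(aux_sample, source_name="demo.c"):
    print(decl)
```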

Solution 3:

I think regex isn't the best solution in your case. There are many traps, such as comments and text inside string literals, but if your function prototypes share a common style:

type fun_name(args);

then \w+ \w+\(.*\); should work in most cases:

mn> egrep "\w+ \w+\(.*\);" *.h
md5.h:extern bool md5_hash(const void *buff, size_t len, char *hexsum);
md5file.h:int check_md5files(const char *filewithsums, const char *filemd5sum);
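
Since the question asks for Python, the same line-by-line search can be sketched with the `re` module. The `grep_prototypes` helper and the header text are invented for this example; only the pattern comes from the egrep call above.

```python
import re

# Same pattern as the egrep call: "word word(anything);"
PROTO_LINE = re.compile(r"\w+ \w+\(.*\);")

def grep_prototypes(text):
    """Return the lines of text that contain something prototype-shaped."""
    return [line for line in text.splitlines() if PROTO_LINE.search(line)]

# Illustrative header contents, not a real file.
header = """\
/* md5.h (illustrative contents) */
extern bool md5_hash(const void *buff, size_t len, char *hexsum);
#define MD5_LEN 16
int check_md5files(const char *filewithsums, const char *filemd5sum);
"""

for line in grep_prototypes(header):
    print(line)
```

Keep in mind that a statement such as `return foo(1);` has the same shape and is matched too, so this is better suited to header files than to files containing function bodies.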

Solution 4:

I think this one should do the work:

r"^\s*[\w_][\w\d_]*\s*.*\s*[\w_][\w\d_]*\s*\(.*\)\s*$"

which expands into:

string begin:
        ^
any number of whitespaces (including none):
        \s*
return type:
  - start with letter or _:
        [\w_]
  - continue with any letter, digit or _:
        [\w\d_]*
any number of whitespaces:
        \s*
any number of any characters
  (to allow pointers, arrays and so on,
  could be replaced with more detailed checking):
        .*
any number of whitespaces:
        \s*
function name:
  - start with letter or _:
        [\w_]
  - continue with any letter, digit or _:
        [\w\d_]*
any number of whitespaces:
        \s*
open arguments list:
        \(
arguments (allow none):
        .*
close arguments list:
        \)
any number of whitespaces:
        \s*
string end:
        $

It's not totally correct for matching all possible combinations, but it should work in most cases. If you want it to be more accurate, just let me know.
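
A quick way to check the pattern is to apply it one line at a time, as in the sketch below. The `matching_lines` helper and the sample text are assumptions made for this example. Two things are worth knowing: `\w` already covers digits and the underscore, so `[\w_]` also accepts a leading digit, and the pattern expects the line to end right after the closing parenthesis, so a prototype ending in `;` or a definition with `{` on the same line is not matched.

```python
import re

# The pattern from above, applied to one line at a time.
PROTO = re.compile(r"^\s*[\w_][\w\d_]*\s*.*\s*[\w_][\w\d_]*\s*\(.*\)\s*$")

def matching_lines(text):
    """Return the stripped lines of text that the pattern accepts."""
    return [line.strip() for line in text.splitlines() if PROTO.match(line)]

# Hypothetical C source, using the examples from the question.
sample = """\
int my_func(char, int, float)
{
    return 0;
}
void my_func1(void)
my_func2()
int counter;
"""

print(matching_lines(sample))
```

Control statements such as `if (x)` on their own line have the same shape and are accepted as well, so a keyword filter is still needed on real code.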

EDIT: Disclaimer - I'm quite new to both Python and Regex, so please be indulgent ;)

Solution 5:

There are LOTS of pitfalls in trying to "parse" C code (or at least extract some information from it) with just regular expressions. I would definitely borrow a C grammar for your favourite parser generator (say Bison, or whatever alternative there is for Python; there are C grammars available as examples everywhere) and add the actions to the corresponding rules.

Also, do not forget to run the C preprocessor on the file before parsing.
