
Get Xpath From Search Result Of A Specific Regex Pattern In A Bunch Of Xml Files

I have many XML files, and I have to search them for a string (more precisely, a not-too-complicated regex). With the results I want to get the xpath of the node in wh

Solution 1:

Search:

 //*[text()[contains(., 'home') or contains(., 'house')]]

In PHP:

Use DOMDocument & DOMXPath, and then just call DOMNode::getNodePath() on the resulting matches.
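As a minimal sketch of that approach (the sample XML and the variable names are made up for illustration and are not from the question), the following loads a document, runs a contains() query and collects the path of each matching element:

```php
<?php
// Hypothetical sample document, used only for illustration.
$xml = <<<XML
<root>
  <item>my home town</item>
  <item>nothing here</item>
  <group><note>a house on a hill</note></group>
</root>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Select elements that have a text node child containing either word.
$query = "//*[text()[contains(., 'home') or contains(., 'house')]]";

$paths = [];
foreach ($xpath->query($query) as $node) {
    // getNodePath() returns an XPath expression locating this node.
    $paths[] = $node->getNodePath();
}

echo implode(PHP_EOL, $paths), PHP_EOL;
```

For real files you would presumably swap loadXML() for load($file) inside a loop over glob('*.xml').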

If you actually need a regex instead of the contains() matches above, note that PHP's DOMXPath only supports XPath 1.0 functions, which do not include regex matching. You can add that functionality by registering a user-defined PHP function with DOMXPath::registerPhpFunctions().

Whipping up something quick without too much error handling:

function xpathregexmatch($nodelist, $regex) {
        foreach ($nodelist as $node) {
                if ($node instanceof DOMText && preg_match($regex, $node->nodeValue)) return true;
        }
        return false;
}

foreach (glob('*.xml') as $file) {
        $d = new DOMDocument();
        $d->load($file);
        $x = new DOMXPath($d);
        $x->registerNamespace("php", "http://php.net/xpath");
        $x->registerPHPFunctions('xpathregexmatch');
        $matches = $x->query('//*[php:function("xpathregexmatch",text(),"/house|home/")]');
        if($matches->length){
                foreach ($matches as $node) {
                        echo $file . ':' . $node->getNodePath() . PHP_EOL;
                }
        }
}

Solution 2:

In PHP: glob the XML files, select all nodes with XPath, run preg_match_all on each node's text and, if it matches, get the node's xpath with getNodePath() and output it:

$pattern = '/home|house|guide/iu';

foreach (glob('data/*.xml') as $file)
{
    foreach (simplexml_load_file($file)->xpath('//*') as $node)
    {
        if (!preg_match_all($pattern, $node, $matches)) continue;

        printf(
            "\"%s\" in %s, xpath: %s\n", implode('", "', $matches[0]),
            basename($file), dom_import_simplexml($node)->getNodePath()
        );
    }
}

Example output:

"Guide" in iana-charsets-2013-03-05.xml, xpath: /*/*[7]/*[158]/*[4]
"Guide" in iana-charsets-2013-03-05.xml, xpath: /*/*[7]/*[224]/*[2]
"Guide" in iana-charsets-2013-03-05.xml, xpath: /*/*[7]/*[224]/*[4]
"guide" in rdf-dmoz.xml, xpath: /*/*[4]/d:Description
"guide" in rdf-dmoz.xml, xpath: /*/*[5]/d:Description

Nice question btw.

Solution 3:

PHP SimpleXML:

// Load from a file; simplexml_load_string() expects XML markup, not a path.
$xml = simplexml_load_file("file1.xml");
foreach ($xml->cars->car[2] as $car) {
    // do sth with $car
}

For more, be more specific with your question, please.
