Get Xpath From Search Result Of A Specific Regex Pattern In A Bunch Of Xml Files
I have many XML files, and I need to search them for a string (more precisely, a not-too-complicated regex). With the results I want to get the XPath of the node in wh
Solution 1:
Search:
//*[text()[contains(., 'home') or contains(., 'house')]]
In PHP:
Use DOMDocument & DOMXPath, and then just call DOMNode::getNodePath() on the resulting matches.
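As a rough sketch of that approach, the snippet below runs the contains() search on a small inline document and collects the path of each match. The sample XML and the variable names are made up for illustration; with real files you would call `$doc->load($file)` instead of `loadXML()`.

```php
<?php
// Hypothetical sample document; not taken from the question.
$xml = <<<XML
<catalog>
  <item><title>My home</title></item>
  <item><title>A boat</title></item>
  <item><title>Big house</title><note>none</note></item>
</catalog>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Select elements that have a direct text node containing either word.
$query = "//*[text()[contains(., 'home') or contains(., 'house')]]";

$paths = [];
foreach ($xpath->query($query) as $node) {
    // getNodePath() returns an XPath expression locating this node.
    $paths[] = $node->getNodePath();
}

echo implode(PHP_EOL, $paths), PHP_EOL;
```

Note that contains() is case-sensitive, so "Home" would not match here.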
If you actually need a regex instead of the contains() matches above, note that PHP's DOMDocument only supports XPath 1.0 functions. You can, however, extend DOMXPath with a user-defined function via DOMXPath::registerPhpFunctions().
Whipping up something quick without too much error handling:
// Callback for XPath: returns true if any passed text node matches $regex.
function xpathregexmatch($nodelist, $regex) {
    foreach ($nodelist as $node) {
        if ($node instanceof DOMText && preg_match($regex, $node->nodeValue)) return true;
    }
    return false;
}
foreach (glob('*.xml') as $file) {
    $d = new DOMDocument();
    $d->load($file);
    $x = new DOMXPath($d);
    // The php: prefix must be bound to this namespace before calling php:function().
    $x->registerNamespace("php", "http://php.net/xpath");
    $x->registerPhpFunctions('xpathregexmatch');
    $matches = $x->query('//*[php:function("xpathregexmatch",text(),"/house|home/")]');
    if ($matches->length) {
        foreach ($matches as $node) {
            echo $file . ':' . $node->getNodePath() . PHP_EOL;
        }
    }
}
Solution 2:
In PHP: glob the XML files, xpath all nodes, preg_match_all their text and, if it matches, get each node's XPath with getNodePath() and output it:
$pattern = '/home|house|guide/iu';
foreach (glob('data/*.xml') as $file)
{
foreach (simplexml_load_file($file)->xpath('//*') as $node)
{
if (!preg_match_all($pattern, (string) $node, $matches)) continue;
printf(
"\"%s\" in %s, xpath: %s\n", implode('", "', $matches[0]),
basename($file), dom_import_simplexml($node)->getNodePath()
);
}
}
Result (exemplary):
"Guide" in iana-charsets-2013-03-05.xml, xpath: /*/*[7]/*[158]/*[4]
"Guide" in iana-charsets-2013-03-05.xml, xpath: /*/*[7]/*[224]/*[2]
"Guide" in iana-charsets-2013-03-05.xml, xpath: /*/*[7]/*[224]/*[4]
"guide" in rdf-dmoz.xml, xpath: /*/*[4]/d:Description
"guide" in rdf-dmoz.xml, xpath: /*/*[5]/d:Description
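The reported paths are ordinary XPath expressions, so they can be fed back into a query to fetch the same node later. Here is a minimal sketch of that round trip, assuming a small namespace-free document invented for illustration. Paths containing prefixes such as `d:Description` may additionally require the prefix to be known to the DOMXPath object.

```php
<?php
// Hypothetical sample document standing in for one of the XML files.
$doc = new DOMDocument();
$doc->loadXML('<root><a>guide</a><a>other</a><b><a>Guide book</a></b></root>');
$xpath = new DOMXPath($doc);

$found = [];
foreach ($xpath->query('//*') as $node) {
    // Test only the element's own text nodes, not the text of descendants.
    $own = '';
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMText) {
            $own .= $child->nodeValue;
        }
    }
    if (preg_match('/guide/i', $own)) {
        $found[] = $node->getNodePath();
    }
}

// Query each stored path again to get back the node it was generated from.
foreach ($found as $path) {
    $again = $xpath->query($path);
    echo $path, ' => ', $again->item(0)->textContent, PHP_EOL;
}
```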
Nice question btw.
Solution 3:
PHP SimpleXML:
// Load from a file; simplexml_load_string() would expect the XML text itself.
$xml = simplexml_load_file("file1.xml");
foreach ($xml->cars->car[2] as $car) {
    // do sth with $car
}
For more, be more specific with your question, please.