Spark: How To Transform To Data Frame Data From Multiple Nested Xml Files With Attributes
How can the values below be transformed from multiple XML files into a Spark data frame: the Id0 attribute from Level_0 and the Date/Value fields from Level_4? Required output: +----------------+-------------+--
Solution 1:
You can use Level_0 as the rowTag, and explode the relevant arrays/structs:
import pyspark.sql.functions as F

# Read with Level_0 as the row tag; a glob such as 'files/*.xml' loads
# multiple files the same way. spark-xml prefixes XML attributes with
# '_', so the Id0 attribute becomes the column '_Id0'.
df = spark.read.format('xml').options(rowTag="Level_0").load('line_removed.xml')

df2 = df.select(
    '_Id0',
    F.explode_outer('Level_1.Level_2.Level_3.Level_4').alias('Level_4')
).select(
    '_Id0',
    'Level_4.*'  # expand the Level_4 struct into its Date and Value columns
)
df2.show()
+---------------+----------+-----+
| _Id0| Date|Value|
+---------------+----------+-----+
|Id0_value_file1|2021-01-01| 4_1|
|Id0_value_file1|2021-01-02| 4_2|
+---------------+----------+-----+
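To see why the explode path above produces one row per Level_4 element, here is a minimal stdlib-only sketch of the same flattening with xml.etree.ElementTree. The sample XML is hypothetical: it assumes the nesting implied by the answer (Level_0 carrying an Id0 attribute, with Level_4 elements holding Date and Value), not the actual files from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the assumed structure of one input file.
sample = """
<Level_0 Id0="Id0_value_file1">
  <Level_1>
    <Level_2>
      <Level_3>
        <Level_4><Date>2021-01-01</Date><Value>4_1</Value></Level_4>
        <Level_4><Date>2021-01-02</Date><Value>4_2</Value></Level_4>
      </Level_3>
    </Level_2>
  </Level_1>
</Level_0>
"""

root = ET.fromstring(sample)

# One output row per Level_4 element, each paired with the root's Id0
# attribute -- the same shape as the exploded data frame above.
rows = [
    (root.get("Id0"), l4.findtext("Date"), l4.findtext("Value"))
    for l4 in root.iter("Level_4")
]
print(rows)
```

Running this over each file and concatenating the results would yield the same rows as the Spark version, just without the distributed execution.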