Showing posts with the label Apache Spark Sql

Pyspark 2.1: Importing Module With Udf's Breaks Hive Connectivity

I'm currently working with Spark 2.1 and have a main script that calls a helper module that con…

Spark - Set Null When Column Not Exist In Dataframe

I'm loading many versions of JSON files into a Spark DataFrame. Some of the files hold columns A,B…

Convert A Pandas Dataframe To A Pyspark Dataframe

I have a script with the below setup. I am using: 1) Spark dataframes to pull data in 2) Converting…

How To Use Scala Udf In Pyspark?

I want to be able to use a Scala function as a UDF in PySpark: package com.test object ScalaPySpark…

Error In Labelled Point Object Pyspark

I am writing a function which takes an RDD as input, splits the comma-separated values, then convert…

Apply Udf To Multiple Columns And Use Numpy Operations

I have a dataframe named result in pyspark and I want to apply a udf to create a new column as belo…

Pyspark: How To Deal With Null Values In Python User Defined Functions

I want to use some string similarity functions that are not native to PySpark, such as the jaro and …

Implementing A Recursive Algorithm In Pyspark To Find Pairings Within A Dataframe

I have a spark dataframe (prof_student_df) that lists student/professor pair for a timestamp. There…