Thursday, June 10, 2021

PySpark and Spark SQL in AWS Glue

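AWS Glue wraps both PySpark and Spark SQL, so a Glue DynamicFrame can be converted to an ordinary Spark DataFrame and queried with either one.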


PySpark Select columns


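# DataSource0 is the Glue DynamicFrame created earlier in the job script.
DataSource0.count()

DataSource0.printSchema()

# Convert the DynamicFrame to a Spark DataFrame to use the PySpark API.
df = DataSource0.toDF()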
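Once the DataFrame exists, selecting a column can go through either API. The following is a minimal sketch, not the original job code: the helper names and the view name `datasource0` are made up for illustration, and it assumes the `value` column used in the filters below. Inside a Glue job, `spark` would typically be `glueContext.spark_session`.

```python
def select_value_column(df):
    """Return a DataFrame holding only the `value` column (PySpark API)."""
    return df.select("value")


def select_value_with_sql(spark, df, view_name="datasource0"):
    """Do the same projection through Spark SQL.

    `view_name` is a hypothetical name; the DataFrame is registered as a
    temporary view so that SQL text can refer to it.
    """
    df.createOrReplaceTempView(view_name)
    return spark.sql(f"SELECT value FROM {view_name}")
```

Both helpers return a new DataFrame, so `.show()` or further filters can be chained onto the result.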

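Find the rows whose value column contains at least one letter or digit (\w would also match an underscore, so an explicit character class is used):

df.filter(df['value'].rlike(r'[a-zA-Z0-9]')).show()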

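Find the rows whose value column contains only letters, digits, whitespace, and colons. The pattern must be a negated character class in square brackets; the filter keeps the rows where no disallowed character is found:

df.filter(~df['value'].rlike(r'[^a-zA-Z\d\s:]')).show()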
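The intent of the two filters can be checked locally without a Spark cluster. This is a rough sketch using Python's `re` module, assuming the intended patterns are `[a-zA-Z0-9]` and the negated class `[^a-zA-Z\d\s:]`. Spark's `rlike` uses Java regular expressions and matches anywhere in the string, so `re.search` with the `re.ASCII` flag is used here as an approximation. The sample strings and function names are invented for illustration.

```python
import re

# re.ASCII restricts \d and \s to ASCII, which is closer to Java's defaults.
CONTAINS_ALNUM = re.compile(r"[a-zA-Z0-9]", re.ASCII)
DISALLOWED = re.compile(r"[^a-zA-Z\d\s:]", re.ASCII)


def contains_alnum(value):
    """True if the string has at least one ASCII letter or digit."""
    return CONTAINS_ALNUM.search(value) is not None


def only_allowed(value):
    """True if no character falls outside letters, digits, whitespace, ':'."""
    return DISALLOWED.search(value) is None


# Hypothetical sample values, not taken from any real dataset.
samples = ["abc123", "12:30 PM", "hello_world", "---", "a-b"]
results = {s: (contains_alnum(s), only_allowed(s)) for s in samples}

for value, (has_alnum, allowed_only) in results.items():
    print(f"{value!r}: contains_alnum={has_alnum}, only_allowed={allowed_only}")
```

Note that an empty string passes the second check, because it contains no disallowed character. In Spark, rows where value is null are dropped by both filters, since the condition evaluates to null rather than true.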


