
Thursday, 9 Farvardin 1397 (29 March 2018)

python - PySpark "cannot resolve '`countryName`' given input columns: [countryName, city, age]"




I'm reading a JSON file with PySpark as follows:



raw = sc.textFile(path)
dataset_df = sqlContext.read.json(raw)


To select only specific keys from the JSON file (if they are present), I use:



dataset_df.select('countryName', 'city', 'age')
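For the "if the key is present" part, here is a minimal sketch of one way to filter the requested names against the columns that actually exist before calling `select`. The helper name `present_columns` and the `zipCode` key are hypothetical, made up for illustration; the demonstration uses plain lists so it runs without a Spark session.

```python
# Hypothetical helper (not part of PySpark): keep only the requested
# names that actually appear in the list of available columns.
def present_columns(available, wanted):
    available_set = set(available)
    # Preserve the order in which the names were requested.
    return [name for name in wanted if name in available_set]


# Plain-list demonstration. With a real DataFrame, `available` would be
# dataset_df.columns, and the result could be passed on as
# dataset_df.select(*present_columns(dataset_df.columns, wanted)).
columns = ["countryName", "city", "age"]
wanted = ["countryName", "zipCode", "age"]  # "zipCode" is an assumed missing key
selected = present_columns(columns, wanted)
print(selected)
```

Note that this only compares names by exact string equality, so it would silently drop a column whose name differs from the requested one in some way that is not visible when printed.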


However, I get the following error from running the line above:




"cannot resolve '`countryName`' given input columns: [countryName, city, age]\n"




I get a similar error when I remove countryName from the list of keys to read from the JSON file. I have tested other keys from the same file: for some of them the code above runs without issues, but for specific columns I get the error shown above.
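One check that might help narrow this down, offered as a sketch and not a confirmed diagnosis: since the name in the error looks identical to one of the listed input columns, it can be worth testing whether the two strings differ in something invisible, such as surrounding whitespace or letter case. The helper `near_matches` below is hypothetical, and the sample column list with a trailing space is an invented illustration, not data from the actual file.

```python
# Hypothetical diagnostic helper: for every requested name that is NOT an
# exact match, list the available names that become equal to it once
# surrounding whitespace is stripped and case is ignored.
def near_matches(available, wanted):
    def normalize(name):
        return name.strip().casefold()

    report = {}
    for name in wanted:
        if name in available:
            continue  # exact match, nothing to explain
        report[name] = [col for col in available
                        if normalize(col) == normalize(name)]
    return report


# Invented sample: the first column carries a trailing space. With a real
# DataFrame, `available` would be dataset_df.columns.
available = ["countryName ", "city", "age"]
report = near_matches(available, ["countryName", "city", "age"])
for requested, candidates in report.items():
    # repr() makes whitespace and escape sequences visible.
    print(repr(requested), "->", [repr(c) for c in candidates])
```

An empty report would suggest the names match exactly and the cause lies elsewhere; this check covers only whitespace and case differences.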



Does anyone know what could be the reason behind this?



Thanks in advance.



