I'm reading a JSON file with PySpark as follows:
raw = sc.textFile(path)
dataset_df = sqlContext.read.json(raw)
To select only specific keys from the JSON file (when they are present), I use:
dataset_df.select('countryName', 'city', 'age')
However, I get the following error from running the line above:
"cannot resolve 'countryName' given input columns: [countryName', 'city', "age"]\n"
I get a similar error when I remove countryName from the list of keys to read from the JSON file. I have tested other keys from the file: for some of them the code above runs without issues, but for specific columns I get the error shown above.
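To narrow this down, here is a minimal sketch of how the requested keys could be compared against the column names exactly as they are stored. The helper name split_by_presence and the sample column list are hypothetical; the trailing space and stray quote in the sample are invented to illustrate near-miss names and are not taken from my actual data.

```python
def split_by_presence(wanted, available):
    """Return (present, missing) lists of the wanted names, compared exactly."""
    available_set = set(available)
    present = [name for name in wanted if name in available_set]
    missing = [name for name in wanted if name not in available_set]
    return present, missing


# Hypothetical stand-in for dataset_df.columns; the odd characters are
# assumptions made up for this illustration.
columns = ["countryName ", "city", "age'"]

# repr() makes trailing spaces, stray quotes, and other hidden characters visible.
for name in columns:
    print(repr(name))

present, missing = split_by_presence(["countryName", "city", "age"], columns)
print("present:", present)
print("missing:", missing)
```

With the real DataFrame, the same helper could be called with dataset_df.columns as the second argument, followed by dataset_df.select(*present) so that only keys that actually exist are requested.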
Does anyone know what could be the reason behind this?
Thanks in advance.