I am currently trying to insert roughly a billion rows of data into a MySQL table. I am pulling my data from a directory of .JSON files, where each file contains ~200K rows; there are 5K files in total.
What I am currently doing is going through each file and creating a tuple for each row I want to insert. I put these tuples in a list, and after I get through the whole JSON file, I insert the list of rows into MySQL. This is faster than inserting the rows one at a time, but it is still going to take me over 3 days, and I don't have the time to spare.
I initially created lists that contained 200,000,000 rows each (which were fast to generate), but they took too long to insert into MySQL. That is why I am now only inserting every 200,000 rows. Does anyone have any advice on how to speed this up?
import glob
import json
import os

path = "*path to my file*"  # directory containing the .JSON files

for filename in glob.glob(os.path.join(path, '*.JSON')):
    myList = []
    with open(filename) as json_data:
        j = json.load(json_data)
        for i in j["rows"]:
            name = i["values"][0][0]
            age = i["values"][0][1]
            gender = i["values"][0][2]
            # None lets MySQL fill in the auto-increment id column
            data = (None, name, age, gender)
            myList.append(data)
    cursor = conn.cursor()  # conn is an existing MySQL connection
    q = """INSERT INTO nordic_data VALUES (%s, %s, %s, %s)"""
    cursor.executemany(q, myList)
    conn.commit()
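For reference, the indexing i["values"][0][0], [0][1], [0][2] in the loop implies each file parses into something like the structure below. This exact shape (and the name/age/gender order) is an assumption inferred from the loop, not confirmed by the actual files:

# Assumed shape of one parsed .JSON file, inferred from the loop above;
# the field order inside "values" is a guess based on the indexing used.
example = {
    "rows": [
        {"values": [["Alice", 34, "F"]]},
        {"values": [["Bjorn", 41, "M"]]},
    ]
}

# With this shape, i["values"][0] is the three-element row that becomes
# the (None, name, age, gender) tuple passed to executemany.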