Import of large CSV datasets in Java  
Author: ps_chowdary
Posted: 2003-8-12 0:10:00

Like many other enterprise applications, ours needs to export and
import large amounts of data in CSV format. As many people have
suggested in the newsgroups, we could use native tools such as
SQL*Loader for performance reasons.

But in our application the PK is an internally generated sequence
number, so we do not have the luxury of using these tools directly.

For example

Student table
student_id PK Number(38,0)
student_tag UK Varchar2(40)
..

Course table
course_id PK Number(38,0)
course_tag UK Varchar2(4)
..

Student_Course table
student_id
course_id
..

To import the student table, we may have only student_tag and the
other columns, so we need to generate student_id ourselves (either
with seq_student.nextval, or by maintaining the sequence number
internally and refreshing the count at the end of the import).
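
The "maintain the sequence number internally" option could look roughly like the sketch below. The class name LocalIdAllocator and its methods are hypothetical, and it assumes the ids fit in a Java long (Number(38,0) can hold larger values) and that no other process inserts students while the import runs.

```java
/** Hypothetical helper: hands out student_id values locally during an import. */
final class LocalIdAllocator {
    private final long first; // first id issued by this import run
    private long next;        // next id to hand out

    /** @param lastUsedId highest id already in use, read once before the import starts */
    LocalIdAllocator(long lastUsedId) {
        this.first = lastUsedId + 1;
        this.next = first;
    }

    /** Returns a new id without a database round trip. */
    long nextId() {
        return next++;
    }

    /** Highest id issued so far; the database sequence would be refreshed past this value. */
    long lastIssued() {
        return next - 1;
    }

    /** Number of ids issued in this run. */
    long issuedCount() {
        return next - first;
    }
}
```

At the end of the import, lastIssued() gives the value the database sequence would have to be moved past; how that refresh is done is left open here.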

To import the student_course table, we need to find the student_id
and course_id values corresponding to the student_tag and course_tag
given in the student_course CSV file. This is where the performance
problem lies.
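
To make the lookup step concrete, here is a minimal sketch that resolves tags from an in-memory map instead of querying the database for each row. This is one possible approach, not something we have measured. TagIdLookup and its methods are hypothetical names, and the sketch assumes all (tag, id) pairs of a table fit in memory and are loaded once before the import.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory lookup from a unique tag to its generated id. */
final class TagIdLookup {
    private final Map<String, Long> idByTag = new HashMap<>();

    /** Registers one (tag, id) pair, e.g. read once from the student or course table. */
    void put(String tag, long id) {
        idByTag.put(tag, id);
    }

    /** Returns the id for a tag, or throws if the tag was never loaded. */
    long idFor(String tag) {
        Long id = idByTag.get(tag);
        if (id == null) {
            throw new IllegalArgumentException("Unknown tag: " + tag);
        }
        return id;
    }

    /** Resolves one student_course row to {student_id, course_id}. */
    static long[] resolveRow(TagIdLookup students, TagIdLookup courses,
                             String studentTag, String courseTag) {
        return new long[] { students.idFor(studentTag), courses.idFor(courseTag) };
    }
}
```

With two such lookups (one for students, one for courses), each row costs two hash lookups rather than two queries. Whether the maps fit in memory depends on the table sizes.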

Looking up the IDs for the tags takes a long time when we process the
data in batches of 100 or less. One idea would be to first fill in all
the IDs in a tmp_student_course.csv file (a copy of student_course.csv
but without the TAGs) and then import it using SQL*Loader.
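
The tmp_student_course.csv idea could be sketched as follows. The class StudentCourseCsvRewriter is hypothetical, and the sketch assumes each line has the form student_tag,course_tag followed by optional further columns, with no quoted fields or embedded commas, and that the two tag-to-id maps were loaded beforehand.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.Map;

/** Hypothetical rewriter: turns tag-based student_course rows into id-based rows. */
final class StudentCourseCsvRewriter {
    /** Copies rows from in to out, replacing both tags by ids; returns the rows written. */
    static int rewrite(Reader in, Writer out,
                       Map<String, Long> studentIds,
                       Map<String, Long> courseIds) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        int rows = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] cols = line.split(",", 3); // student_tag, course_tag, rest of line
            if (cols.length < 2) {
                throw new IOException("Malformed row: " + line);
            }
            Long studentId = studentIds.get(cols[0].trim());
            Long courseId = courseIds.get(cols[1].trim());
            if (studentId == null || courseId == null) {
                throw new IOException("Unknown tag in row: " + line);
            }
            out.write(studentId + "," + courseId);
            if (cols.length == 3) {
                out.write("," + cols[2]); // remaining columns are copied unchanged
            }
            out.write('\n');
            rows++;
        }
        out.flush();
        return rows;
    }
}
```

The output file then contains only ids and could be handed to the bulk loader. A real CSV with quoting would need a proper parser instead of split.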

Can anyone suggest alternative design ideas or third-party APIs to
accomplish this? Any pointers will be greatly appreciated.

thanks,
Srinivas